How to Use MySQL Natural Language Full-Text Search

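MySQL - Natural Language Full-Text Search

Contents
  • Natural Language Full-Text Search
  • Stop Words in Search
  • Natural Language Full-Text Search Using a Client Program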


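Before diving into natural language full-text search, let us look at the background. The keywords people type into a search box do not always match the results they expect exactly. Search engines are therefore designed to improve search relevance, narrowing the gap between the search query and the results, and they display results in descending order of relevance to the search keywords.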

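Similarly, in a relational database such as MySQL, full-text search is a technique for retrieving result sets that may not match the search keywords exactly. Full-text search supports three search modes −

  • Natural language mode

  • Query expansion mode

  • Boolean mode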

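Natural Language Full-Text Search

A natural language full-text search runs a regular full-text search with the IN NATURAL LANGUAGE MODE modifier. In this mode, the results are displayed in order of their relevance to the search string. This is the default mode of full-text search, so it is also used when no modifier is given.

Because this is a full-text search, a FULLTEXT index must be defined on the text-based columns being searched (columns of the CHAR, VARCHAR, or TEXT data types). A FULLTEXT index is a special type of index used to find keywords inside text values, rather than comparing the keyword against the whole column value.

Syntax

Following is the basic syntax to perform a natural language full-text search −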

SELECT * FROM table_name 
WHERE MATCH(column_name(s)) 
AGAINST ('keyword_name' IN NATURAL LANGUAGE MODE);

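Example

Let us understand how to perform a natural language full-text search on a database table with the following example.

First, we create a table named ARTICLES that holds the title and description of each article. As shown below, a FULLTEXT index is defined on the text columns ARTICLE_TITLE and DESCRIPTION −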

CREATE TABLE ARTICLES (
   ID INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
   ARTICLE_TITLE VARCHAR(100),
   DESCRIPTION TEXT,
   FULLTEXT (ARTICLE_TITLE, DESCRIPTION)
) ENGINE = InnoDB;

Now, let us insert the article details (title and description) into this table using the following query −

INSERT INTO ARTICLES (ARTICLE_TITLE, DESCRIPTION) VALUES 
('MySQL Tutorial', 'MySQL is a relational database system that uses SQL to structure data stored'),
('Java Tutorial', 'Java is an object-oriented and platform-independent programming language'),
('Hadoop Tutorial', 'Hadoop is framework that is used to process large sets of data'),
('Big Data Tutorial', 'Big Data refers to data that has wider variety of data sets in larger numbers'),
('JDBC Tutorial', 'JDBC is a Java based technology used for database connectivity');

The table now contains the following rows −

ID | ARTICLE_TITLE | DESCRIPTION
1 | MySQL Tutorial | MySQL is a relational database system that uses SQL to structure data stored
2 | Java Tutorial | Java is an object-oriented and platform-independent programming language
3 | Hadoop Tutorial | Hadoop is framework that is used to process large sets of data
4 | Big Data Tutorial | Big Data refers to data that has wider variety of data sets in larger numbers
5 | JDBC Tutorial | JDBC is a Java based technology used for database connectivity

Using the natural language mode of full-text search, look for data-related articles with the search string 'data set' −

SELECT * FROM ARTICLES 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) 
AGAINST ('data set' IN NATURAL LANGUAGE MODE);

Output

The query returns the following rows −

ID | ARTICLE_TITLE | DESCRIPTION
4 | Big Data Tutorial | Big Data refers to data that has wider variety of data sets in larger numbers
1 | MySQL Tutorial | MySQL is a relational database system that uses SQL to structure data stored
3 | Hadoop Tutorial | Hadoop is framework that is used to process large sets of data

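As shown above, three of the articles in the table are returned for the search string 'data set', ordered by relevance. Note that the search string is not matched as an exact phrase: the 'MySQL Tutorial' record does not contain 'data set', but it is still retrieved because its description contains the word 'data'.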
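To see why exactly these three rows come back, the matching step can be imitated in a few lines of Python. This is only a rough sketch under simplifying assumptions: the `tokenize` and `matching_ids` helpers are hypothetical, the word splitting is far cruder than MySQL's parser, and no ranking is attempted. It shows that a row qualifies when it contains any whole word from the search string.

```python
import re

# Sample rows copied from the ARTICLES table above: (ID, title, description).
ARTICLES = [
    (1, "MySQL Tutorial", "MySQL is a relational database system that uses SQL to structure data stored"),
    (2, "Java Tutorial", "Java is an object-oriented and platform-independent programming language"),
    (3, "Hadoop Tutorial", "Hadoop is framework that is used to process large sets of data"),
    (4, "Big Data Tutorial", "Big Data refers to data that has wider variety of data sets in larger numbers"),
    (5, "JDBC Tutorial", "JDBC is a Java based technology used for database connectivity"),
]


def tokenize(text):
    """Lowercase the text and split it into whole words (a simplification)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def matching_ids(search):
    """Return the IDs of rows that contain at least one whole word of the search string."""
    wanted = set(tokenize(search))
    hits = []
    for row_id, title, description in ARTICLES:
        words = set(tokenize(title + " " + description))
        if wanted & words:  # any shared word is enough for a match
            hits.append(row_id)
    return hits


print(matching_ids("data set"))  # [1, 3, 4]
print(matching_ids("set"))       # []
```

In this sketch only the word 'data' produces hits. The word 'set' matches nothing because the rows contain 'sets', and whole words are compared without stemming; 'database' in the JDBC row is likewise a different word from 'data'.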

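Stop Words in Search

Natural language full-text search ranks rows with a variation of the tf-idf algorithm, where 'tf' stands for term frequency and 'idf' for inverse document frequency. The ranking considers how often a word occurs in a single document and how many documents contain that word. Some words are ignored by the search, however. Words shorter than a minimum length are not indexed: by default, InnoDB ignores words shorter than 3 characters and MyISAM ignores words shorter than 4 characters. In addition, very common words such as the, a, an, and are, known as stop words, are excluded from the search.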
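The scoring idea can be made concrete with a small Python sketch. It uses the textbook tf-idf formula (term frequency multiplied by the base-10 logarithm of total documents over matching documents); MySQL's own ranking formula differs in detail, so the numbers are illustrative only. The `tokenize`, `tf_idf_scores`, and `rank` helpers are hypothetical names, and each document is simply the title and description of a sample row joined together.

```python
import math
import re

# Title and description of each sample row, keyed by ID.
ARTICLES = {
    1: "MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored",
    2: "Java Tutorial Java is an object-oriented and platform-independent programming language",
    3: "Hadoop Tutorial Hadoop is framework that is used to process large sets of data",
    4: "Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers",
    5: "JDBC Tutorial JDBC is a Java based technology used for database connectivity",
}


def tokenize(text):
    """Lowercase the text and split it into whole words (a simplification)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tf_idf_scores(search):
    """Score each matching document as the sum of tf * idf over the search words."""
    docs = {doc_id: tokenize(text) for doc_id, text in ARTICLES.items()}
    scores = {}
    for term in set(tokenize(search)):
        matching = [doc_id for doc_id, words in docs.items() if term in words]
        if not matching:
            continue  # a word that occurs nowhere contributes nothing
        idf = math.log10(len(docs) / len(matching))
        for doc_id in matching:
            tf = docs[doc_id].count(term)
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return scores


def rank(search):
    """Return matching IDs, highest score first; ties are broken by ID."""
    scores = tf_idf_scores(search)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(rank("data set"))  # [4, 1, 3]
```

Row 4 comes first because 'data' occurs four times in its title and description, while rows 1 and 3 contain it once each. This is consistent with the order of the query output shown earlier, although the actual scores computed by the server are not reproduced here.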

Example

In the following example, we run a simple natural language full-text search on the ARTICLES table created above. Let us see how stop words affect a full-text search by searching for two search strings, 'Big Tutorial' and 'is Tutorial'.

Searching for 'Big Tutorial':

The following query runs a full-text search for 'Big Tutorial' in natural language mode −

SELECT ARTICLE_TITLE, DESCRIPTION FROM ARTICLES 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)
AGAINST ('Big Tutorial' IN NATURAL LANGUAGE MODE);

Output:

The query returns the following rows −

ARTICLE_TITLE | DESCRIPTION
Big Data Tutorial | Big Data refers to data that has wider variety of data sets in larger numbers
MySQL Tutorial | MySQL is a relational database system that uses SQL to structure data stored
Java Tutorial | Java is an object-oriented and platform-independent programming language
Hadoop Tutorial | Hadoop is framework that is used to process large sets of data
JDBC Tutorial | JDBC is a Java based technology used for database connectivity

Searching for 'is Tutorial':

The following query runs a full-text search for 'is Tutorial' in natural language mode −

SELECT ARTICLE_TITLE, DESCRIPTION FROM ARTICLES
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)
AGAINST ('is Tutorial' IN NATURAL LANGUAGE MODE);

Output:

The query returns the following rows −

ARTICLE_TITLE | DESCRIPTION
MySQL Tutorial | MySQL is a relational database system that uses SQL to structure data stored
Java Tutorial | Java is an object-oriented and platform-independent programming language
Hadoop Tutorial | Hadoop is framework that is used to process large sets of data
Big Data Tutorial | Big Data refers to data that has wider variety of data sets in larger numbers
JDBC Tutorial | JDBC is a Java based technology used for database connectivity

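As the examples show, the word 'Tutorial' occurs in every record of the table, so all records are retrieved in both cases. The relevance order, however, is determined by the other word in the search string.

In the first case, the word 'Big' occurs in 'Big Data Tutorial', so that record is ranked first. In the second case, the records appear in the same order as in the original table, because 'is' is a stop word and is therefore ignored.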
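The effect of stop words on the two search strings can be sketched in Python as follows. The stop word set here is a small illustrative subset rather than the server's actual list, the minimum length of 3 is assumed to mirror the InnoDB default mentioned earlier, and `effective_terms` is a hypothetical helper name.

```python
import re

# Illustrative subset of a stop word list; the server's actual list is longer.
STOPWORDS = {"a", "an", "are", "is", "the"}

# Assumed to mirror the default minimum word length used by InnoDB.
MIN_TOKEN_SIZE = 3


def effective_terms(search):
    """Return the words of a search string that survive length and stop word filtering."""
    words = re.findall(r"[a-z0-9_]+", search.lower())
    return [w for w in words if len(w) >= MIN_TOKEN_SIZE and w not in STOPWORDS]


print(effective_terms("Big Tutorial"))  # ['big', 'tutorial']
print(effective_terms("is Tutorial"))   # ['tutorial']
```

Only 'tutorial' is left of the second search string, so nothing remains that could rank one record above another.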

Natural Language Full-Text Search Using a Client Program

We can also perform a natural language full-text search on a MySQL database from a client program.

Syntax

To perform a natural language full-text search from a PHP program, we execute the following SELECT statement with the mysqli function query(), as shown below −

$sql = "SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)";
$mysqli->query($sql);

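To perform a natural language full-text search from a JavaScript program, we execute the following SELECT statement with the query() function of the mysql2 library, as shown below −

const sql = `SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)`;
con.query(sql);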

To perform a natural language full-text search from a Java program, we execute the SELECT statement with the JDBC function executeQuery(), as shown below −

String sql = "SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)";
statement.executeQuery(sql);

To perform a natural language full-text search from a Python program, we execute the SELECT statement with the execute() function of MySQL Connector/Python, as shown below −

natural_language_search_query = "SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)"
cursorObj.execute(natural_language_search_query)
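The snippets above embed the search string directly in the SQL text. When the search string comes from user input, one option is to pass it as a query parameter instead. The following is a minimal sketch, assuming MySQL Connector/Python and its %s placeholder style; `build_search_query` is a hypothetical helper, and no database connection is opened here.

```python
def build_search_query(search):
    """Return (sql, params) for a natural language full-text search on ARTICLES."""
    # The %s placeholder is filled in by the connector, which handles quoting.
    sql = (
        "SELECT * FROM ARTICLES "
        "WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) "
        "AGAINST (%s IN NATURAL LANGUAGE MODE)"
    )
    return sql, (search,)


sql, params = build_search_query("data set")
print(sql)
print(params)  # ('data set',)
```

With an open cursor such as cursorObj from the snippet above, the pair would be passed as cursorObj.execute(sql, params).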

Example

Following are the programs −

PHP program −

<?php
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$dbname = 'TUTORIALS';

// Connect to the MySQL server and select the TUTORIALS database
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}
// printf("Connected successfully.\n");

// Natural language full-text search for 'data set'
$s = "SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)";
if ($r = $mysqli->query($s)) {
    printf("Table Records: \n");
    while ($row = $r->fetch_assoc()) {
        // Array keys follow the column names declared in CREATE TABLE
        printf(" ID: %d, Title: %s, Descriptions: %s", $row["ID"], $row["ARTICLE_TITLE"], $row["DESCRIPTION"]);
        printf("\n");
    }
} else {
    printf('Failed');
}
$mysqli->close();

Output

The program produces the following output −

Table Records:
ID: 4, Title: Big Data Tutorial, Descriptions: Big Data refers to data that has wider variety of data sets in larger numbers
ID: 1, Title: MySQL Tutorial, Descriptions: MySQL is a relational database system that uses SQL to structure data stored
ID: 3, Title: Hadoop Tutorial, Descriptions: Hadoop is framework that is used to process large sets of data

NodeJS program −

var mysql = require("mysql2");
var con = mysql.createConnection({
  host: "localhost",
  user: "root",
  password: "password",
}); //Connecting to MySQL

con.connect(function (err) {
  if (err) throw err;
  //   console.log("Connected successfully...!");
  //   console.log("--------------------------");
  var sql = "USE TUTORIALS";
  con.query(sql);

  // Run the natural language full-text search and print the matching rows
  sql = `SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)`;
  con.query(sql, function (err, result) {
    if (err) throw err;
    console.log(result);
  });
});    

Output

The program produces the following output −

[
  {
    id: 4,
    ARTICLE_TITLE: 'Big Data Tutorial',
    DESCRIPTION: 'Big Data refers to data that has wider variety of data sets in larger numbers'
  },
  {
    id: 1,
    ARTICLE_TITLE: 'MySQL Tutorial',
    DESCRIPTION: 'MySQL is a relational database system that uses SQL to structure data stored'
  },
  {
    id: 3,
    ARTICLE_TITLE: 'Hadoop Tutorial',
    DESCRIPTION: 'Hadoop is framework that is used to process large sets of data'
  }
]

Java program −

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NaturalLanguageSearch {
   public static void main(String[] args) {
      String url = "jdbc:mysql://localhost:3306/TUTORIALS";
      String username = "root";
      String password = "password";
      try {
         Class.forName("com.mysql.cj.jdbc.Driver");
         Connection connection = DriverManager.getConnection(url, username, password);
         Statement statement = connection.createStatement();
         System.out.println("Connected successfully...!");

         // Run the full-text search in natural language mode and print each matching row
         ResultSet resultSet = statement.executeQuery("SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)");
         while (resultSet.next()){
            System.out.println(resultSet.getString(1)+" "+resultSet.getString(2)+ " "+resultSet.getString(3));
         }
         connection.close();
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}          

Output

The program produces the following output −

Connected successfully...!
4 Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
1 MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
3 Hadoop Tutorial Hadoop is framework that is used to process large sets of data

Python program −

import mysql.connector
# Establishing the connection
connection = mysql.connector.connect(
   host='localhost',
   user='root',
   password='password',
   database='TUTORIALS'
)
# Creating a cursor object
cursorObj = connection.cursor()
natural_language_search_query = '''
SELECT * FROM ARTICLES
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) 
AGAINST ('data set' IN NATURAL LANGUAGE MODE)
'''
cursorObj.execute(natural_language_search_query)
# Fetching all the results
results = cursorObj.fetchall()
# Display the result
print("NATURAL LANGUAGE search results:")
for row in results:
   print(row)
cursorObj.close()
connection.close()            

Output

The program produces the following output −

NATURAL LANGUAGE search results:
(4, 'Big Data Tutorial', 'Big Data refers to data that has wider variety of data sets in larger numbers')
(1, 'MySQL Tutorial', 'MySQL is a relational database system that uses SQL to structure data stored')
(3, 'Hadoop Tutorial', 'Hadoop is framework that is used to process large sets of data')