Stopwords for InnoDB Search Indexes
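InnoDB has a relatively short default stopword list, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be" or "not to be" and expect a sensible result, rather than having all of those words ignored.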
The default InnoDB stopword list can be viewed by querying the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.01 sec)
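To define a custom stopword list for all InnoDB tables, create a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the innodb_ft_server_stopword_table option to a value of the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example creates and configures a new global stopword table for InnoDB.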
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.21 sec)

mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.12 sec)

mysql> CREATE TABLE opening_lines (
    -> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    -> opening_line TEXT(500),
    -> author VARCHAR(200),
    -> title VARCHAR(200)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.14 sec)

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
    -> ('Call me Ishmael.','Herman Melville','Moby-Dick'),
    -> ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
    -> ('I am an invisible man.','Ralph Ellison','Invisible Man'),
    -> ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
    -> ('It was love at first sight.','Joseph Heller','Catch-22'),
    -> ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
    -> ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
    -> ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.04 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SET GLOBAL innodb_ft_server_stopword_table = 'mysql/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.91 sec)
Records: 0  Duplicates: 0  Warnings: 1
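By default, words shorter than 3 characters or longer than 84 characters do not appear in an InnoDB full-text search index. The minimum and maximum word lengths are configurable through the innodb_ft_min_token_size and innodb_ft_max_token_size variables. This default behavior does not apply to the ngram parser plugin, whose token size is defined by the ngram_token_size option.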
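To see which limits are in effect on a given server, the variables named above can be inspected directly. This is a minimal sketch; the values returned depend on your configuration. Note that, as far as the MySQL documentation describes them, these are startup-only settings: changing them means editing the option file, restarting the server, and rebuilding the affected full-text indexes.

```sql
-- Show the minimum and maximum token sizes used by the default InnoDB parser.
-- The pattern matches innodb_ft_min_token_size and innodb_ft_max_token_size.
SHOW VARIABLES LIKE 'innodb_ft_%_token_size';

-- Show the token size used by the ngram parser plugin.
SHOW VARIABLES LIKE 'ngram_token_size';
```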
Verify that the specified stopword ('Ishmael') does not appear by querying the words in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE.
mysql> SET GLOBAL innodb_ft_aux_table='mysql/opening_lines';
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;
+-----------+
| word      |
+-----------+
| across    |
| all       |
| burn      |
| buy       |
| call      |
| comes     |
| dalloway  |
| first     |
| flowers   |
| happened  |
| herself   |
| invisible |
| less      |
| love      |
| man       |
| more      |
| mrs       |
| now       |
| now       |
| now       |
| pleasure  |
| said      |
| screaming |
| she       |
| sight     |
| sky       |
| the       |
| the       |
| this      |
| was       |
| was       |
| when      |
| where     |
| who       |
| would     |
+-----------+
35 rows in set (0.00 sec)
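The effect of the custom stopword can also be checked from the query side. The sketch below assumes the opening_lines table and its full-text index live in the mysql database, as the innodb_ft_aux_table setting above suggests. Because 'Ishmael' was excluded from the index, the first search should find no rows, while a search for an indexed word such as 'call' should still match.

```sql
-- 'Ishmael' is a stopword in the custom list, so it is absent from the index
-- and this search is expected to match nothing.
SELECT id, opening_line
FROM mysql.opening_lines
WHERE MATCH(opening_line) AGAINST('Ishmael' IN NATURAL LANGUAGE MODE);

-- 'call' is present in the index, so the same row remains reachable this way.
SELECT id, opening_line
FROM mysql.opening_lines
WHERE MATCH(opening_line) AGAINST('call' IN NATURAL LANGUAGE MODE);
```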
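To create stopword lists on a table-by-table basis, create additional stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table you want to use before you create the full-text index.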
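A per-table setup might look like the following sketch. The database name test, the tables my_user_stopwords and quotes, and the chosen stopwords are all hypothetical and only serve as illustration. The stopword table has the same single VARCHAR value column as before.

```sql
-- Hypothetical stopword table for one specific indexed table.
CREATE TABLE test.my_user_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO test.my_user_stopwords (value) VALUES ('sky'), ('love');

-- Select the stopword table for this session, in db_name/table_name form,
-- before the full-text index is built.
SET SESSION innodb_ft_user_stopword_table = 'test/my_user_stopwords';

-- Hypothetical table to be indexed with the list chosen above.
CREATE TABLE test.quotes (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  quote TEXT
) ENGINE = InnoDB;

CREATE FULLTEXT INDEX idx_quote ON test.quotes (quote);
```

When both options are set, innodb_ft_user_stopword_table takes precedence over innodb_ft_server_stopword_table, so resetting the session variable before indexing the next table lets each index use its own list.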