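The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file or the columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.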
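Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is utf8mb4_0900_ai_ci, whereas lookups are case-sensitive if the collation is utf8mb4_0900_as_cs or utf8mb4_bin.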
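To check whether a mismatch of the kind described above is possible, it can help to compare the server settings with the collation of the indexed columns. The following is a minimal sketch. The schema name test and the table name opening_lines are assumptions borrowed from the example later in this section; substitute your own names.

```sql
-- Server character set and collation used for stopword lookups
SELECT @@character_set_server, @@collation_server;

-- Character set and collation of each string column in the table
-- (assumes schema 'test' and table 'opening_lines'; adjust to your names)
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'test'
   AND TABLE_NAME = 'opening_lines'
   AND COLLATION_NAME IS NOT NULL;
```

If a column's collation differs from collation_server, stopword lookups for that column might produce the false hits or misses described above.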
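InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for “to be or not to be” and expect to get a sensible result, rather than having all those words ignored.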
To see the default InnoDB stopword list, query the Information Schema INNODB_FT_DEFAULT_STOPWORD table.
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a |
| about |
| an |
| are |
| as |
| at |
| be |
| by |
| com |
| de |
| en |
| for |
| from |
| how |
| i |
| in |
| is |
| it |
| la |
| of |
| on |
| or |
| that |
| the |
| this |
| to |
| was |
| what |
| when |
| where |
| who |
| will |
| with |
| und |
| the |
| www |
+-------+
36 rows in set (0.00 sec)
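To define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.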
-- Create a new stopword table
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)
-- Insert stopwords (for simplicity, a single stopword is used in this example)
mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)
-- Create the table
mysql> CREATE TABLE opening_lines (
         id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         opening_line TEXT(500),
         author VARCHAR(200),
         title VARCHAR(200)
       ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)
-- Insert data into the table
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
         ('Call me Ishmael.','Herman Melville','Moby-Dick'),
         ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
         ('I am an invisible man.','Ralph Ellison','Invisible Man'),
         ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
         ('It was love at first sight.','Joseph Heller','Catch-22'),
         ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
         ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
         ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
-- Set the innodb_ft_server_stopword_table option to the new stopword table
mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)
-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0 Duplicates: 0 Warnings: 1
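By default, words less than 3 characters in length or greater than 84 characters in length do not appear in an InnoDB full-text search index. Maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. The ngram token size is defined by the ngram_token_size option.
Verify that the specified stopword ('Ishmael') does not appear by querying the Information Schema INNODB_FT_INDEX_TABLE table.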
mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word |
+-----------+
| across |
| all |
| burn |
| buy |
| call |
| comes |
| dalloway |
| first |
| flowers |
| happened |
| herself |
| invisible |
| less |
| love |
| man |
+-----------+
15 rows in set (0.00 sec)
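The token-length limits mentioned above also affect which words reach the index, so it can be useful to inspect them when a word is unexpectedly missing. A minimal sketch, using only the variables named in this section:

```sql
-- Minimum and maximum token length for InnoDB full-text indexes
SHOW VARIABLES LIKE 'innodb_ft_m%_token_size';

-- Token size used by the ngram parser
SHOW VARIABLES LIKE 'ngram_token_size';
```

A word missing from INNODB_FT_INDEX_TABLE might have been excluded by these limits rather than by the stopword list.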
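To create stopword lists on a table-by-table basis, create other stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.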
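The per-table approach might look like the following sketch. The names my_user_stopwords, articles, and idx_body, the sample stopwords, and the database name test are all hypothetical. The sketch also assumes that innodb_ft_user_stopword_table can be set at session scope in your MySQL version.

```sql
-- Hypothetical per-table stopword table
-- (same structure as INNODB_FT_DEFAULT_STOPWORD: a single VARCHAR column named value)
CREATE TABLE my_user_stopwords(value VARCHAR(30)) ENGINE = INNODB;
INSERT INTO my_user_stopwords(value) VALUES ('lorem'), ('ipsum');

-- Hypothetical table to be indexed
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  body TEXT
) ENGINE = InnoDB;

-- Point the current session at the stopword table, then create the index
-- (assumes the current database is 'test')
SET SESSION innodb_ft_user_stopword_table = 'test/my_user_stopwords';
CREATE FULLTEXT INDEX idx_body ON articles(body);
```

The variable is set before CREATE FULLTEXT INDEX because, as stated above, the stopword table must be specified before the index is created.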
The stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.
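To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See Section 7.1.8, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.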
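After the restart, the check and rebuild might be sketched as follows. The table name my_myisam_table is hypothetical, and REPAIR TABLE ... QUICK is shown as one way to rebuild the FULLTEXT indexes of a MyISAM table, not necessarily the only one.

```sql
-- Show the stopword file currently in effect for MyISAM
SHOW VARIABLES LIKE 'ft_stopword_file';

-- After restarting the server with a new ft_stopword_file value,
-- rebuild the FULLTEXT indexes of each affected MyISAM table.
-- 'my_myisam_table' is a hypothetical table name.
REPAIR TABLE my_myisam_table QUICK;
```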
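The stopword list is free-form, separating stopwords with any nonalphanumeric character such as newline, space, or comma. Exceptions are the underscore character (_) and a single apostrophe ('), which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 12.3.2, “Server Character Set and Collation”.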
The following list shows the default stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.
a's able about above according
accordingly across actually after afterwards
again against ain't all allow
allows almost alone along already
also although always am among
amongst an and another any
anybody anyhow anyone anything anyway
anyways anywhere apart appear appreciate
appropriate are aren't around as
aside ask asking associated at
available away awfully be became
because become becomes becoming been
before beforehand behind being believe
below beside besides best better
between beyond both brief but
by c'mon c's came can
can't cannot cant cause causes
certain certainly changes clearly co
com come comes concerning consequently
consider considering contain containing contains
corresponding could couldn't course currently
definitely described despite did didn't
different do does doesn't doing
don't done down downwards during
each edu eg eight either
else elsewhere enough entirely especially
et etc even ever every
everybody everyone everything everywhere ex
exactly example except far few
fifth first five followed following
follows for former formerly forth
four from further furthermore get
gets getting given gives go
goes going gone got gotten
greetings had hadn't happens hardly
has hasn't have haven't having
he he's hello help hence
her here here's hereafter hereby
herein hereupon hers herself hi
him himself his hither hopefully
how howbeit however i'd i'll
i'm i've ie if ignored
immediate in inasmuch inc indeed
indicate indicated indicates inner insofar
instead into inward is isn't
it it'd it'll it's its
itself just keep keeps kept
know known knows last lately
later latter latterly least less
lest let let's like liked
likely little look looking looks
ltd mainly many may maybe
me mean meanwhile merely might
more moreover most mostly much
must my myself name namely
nd near nearly necessary need
needs neither never nevertheless new
next nine no nobody non
none noone nor normally not
nothing novel now nowhere obviously
of off often oh ok
okay old on once one
ones only onto or other
others otherwise ought our ours
ourselves out outside over overall
own particular particularly per perhaps
placed please plus possible presumably
probably provides que quite qv
rather rd re really reasonably
regarding regardless regards relatively respectively
right said same saw say
saying says second secondly see
seeing seem seemed seeming seems
seen self selves sensible sent
serious seriously seven several shall
she should shouldn't since six
so some somebody somehow someone
something sometime sometimes somewhat somewhere
soon sorry specified specify specifying
still sub such sup sure
t's take taken tell tends
th than thank thanks thanx
that that's thats the their
theirs them themselves then thence
there there's thereafter thereby therefore
therein theres thereupon these they
they'd they'll they're they've think
third this thorough thoroughly those
though three through throughout thru
thus to together too took
toward towards tried tries truly
try trying twice two un
under unfortunately unless unlikely until
unto up upon us use
used useful uses using usually
value various very via viz
vs want wants was wasn't
way we we'd we'll we're
we've welcome well went were
weren't what what's whatever when
whence whenever where where's whereafter
whereas whereby wherein whereupon wherever
whether which while whither who
who's whoever whole whom whose
why will willing wish with
within without won't wonder would
wouldn't yes yet you you'd
you'll you're you've your yours
yourself yourselves zero