MySQL 8.4 Reference Manual

14.9.4 Full-Text Stopwords

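The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file, or the columns used for full-text indexing or searches, have a character set or collation different from character_set_server or collation_server.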

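Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is utf8mb4_0900_ai_ci, whereas lookups are case-sensitive if the collation is utf8mb4_0900_as_cs or utf8mb4_bin.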
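Because stopword lookups follow the server settings, it can help to compare them with the settings of the indexed columns before creating an index. The following is a minimal sketch; db_name and tbl_name are placeholders, not names from this section.

```sql
-- Show the server character set and collation used for stopword lookups
SELECT @@GLOBAL.character_set_server, @@GLOBAL.collation_server;

-- Show the character set and collation of each column in a table.
-- 'db_name' and 'tbl_name' are placeholders for your own schema and table.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'db_name' AND TABLE_NAME = 'tbl_name';
```

If the two results differ for a column that you intend to index, stopword lookups on that column might produce the false hits or misses described above.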

Stopwords for InnoDB Search Indexes

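InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be or not to be" and expect to get a sensible result, rather than having all those words ignored.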

To see the default InnoDB stopword list, query the Information Schema INNODB_FT_DEFAULT_STOPWORD table.

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.00 sec)

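To define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.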

-- Create a new stopword table

mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)

-- Insert stopwords (for simplicity, a single stopword is used in this example)

mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)

-- Create the table

mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

-- Insert data into the table

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

-- Set the innodb_ft_server_stopword_table option to the new stopword table

mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0  Duplicates: 0  Warnings: 1

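Verify that the specified stopword ('Ishmael') does not appear by querying the Information Schema INNODB_FT_INDEX_TABLE table.

Note

By default, words less than 3 characters in length or greater than 84 characters in length do not appear in an InnoDB full-text search index. Maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. ngram token size is defined by the ngram_token_size option.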

mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word      |
+-----------+
| across    |
| all       |
| burn      |
| buy       |
| call      |
| comes     |
| dalloway  |
| first     |
| flowers   |
| happened  |
| herself   |
| invisible |
| less      |
| love      |
| man       |
+-----------+
15 rows in set (0.00 sec)
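Because the listing above is limited to 15 rows, a more direct check is to look for the stopword itself. The following sketch reuses the opening_lines table from the example and assumes innodb_ft_aux_table is still set to 'test/opening_lines'. Since 'Ishmael' was excluded from the index, both statements are expected to return no rows; the output is not shown here.

```sql
-- Look for the stopword among the indexed tokens
-- (tokens appear in lowercase in the listing above)
SELECT word
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
WHERE word = 'ishmael';

-- Search the indexed column for the stopword
SELECT id, opening_line
FROM opening_lines
WHERE MATCH(opening_line) AGAINST('Ishmael');
```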

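To create stopword lists on a table-by-table basis, create other stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.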
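The per-table approach might look like the following sketch. The my_stopwords_closing and closing_lines tables, the idx_closing index, and the stopword 'end' are hypothetical names chosen for illustration. The sketch assumes the test schema used earlier and that innodb_ft_user_stopword_table can be set for the current session.

```sql
-- Hypothetical stopword table for a single table's index;
-- it must have a single VARCHAR column named value
CREATE TABLE my_stopwords_closing(value VARCHAR(30)) ENGINE = INNODB;
INSERT INTO my_stopwords_closing(value) VALUES ('end');

-- Hypothetical table that is to use the custom list
CREATE TABLE closing_lines (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  closing_line TEXT
) ENGINE=InnoDB;

-- Name the stopword table in db_name/table_name form
SET innodb_ft_user_stopword_table = 'test/my_stopwords_closing';

-- Create the full-text index only after the variable is set
CREATE FULLTEXT INDEX idx_closing ON closing_lines(closing_line);
```

The order matters: the stopword table is specified before the index is created, as described above.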

Stopwords for MyISAM Search Indexes

The stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.

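To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See Section 7.1.8, "Server System Variables".) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.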
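After the restart, the check and the rebuild step might be sketched as follows. tbl_name is a placeholder for each MyISAM table that has a FULLTEXT index, and REPAIR TABLE with the QUICK option is shown here as one way to rebuild such indexes.

```sql
-- Confirm which stopword file the server is using after the restart
SHOW VARIABLES LIKE 'ft_stopword_file';

-- Rebuild the FULLTEXT indexes of an affected MyISAM table.
-- tbl_name is a placeholder; repeat for each affected table.
REPAIR TABLE tbl_name QUICK;
```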

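The stopword list is free-form: stopwords are separated by any nonalphanumeric character, such as a newline, space, or comma. The exceptions are the underscore character (_) and a single apostrophe ('), which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 12.3.2, "Server Character Set and Collation".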

The following list shows the default stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.

a's           able          about         above         according
accordingly   across        actually      after         afterwards
again         against       ain't         all           allow
allows        almost        alone         along         already
also          although      always        am            among
amongst       an            and           another       any
anybody       anyhow        anyone        anything      anyway
anyways       anywhere      apart         appear        appreciate
appropriate   are           aren't        around        as
aside         ask           asking        associated    at
available     away          awfully       be            became
because       become        becomes       becoming      been
before        beforehand    behind        being         believe
below         beside        besides       best          better
between       beyond        both          brief         but
by            c'mon         c's           came          can
can't         cannot        cant          cause         causes
certain       certainly     changes       clearly       co
com           come          comes         concerning    consequently
consider      considering   contain       containing    contains
corresponding could         couldn't      course        currently
definitely    described     despite       did           didn't
different     do            does          doesn't       doing
don't         done          down          downwards     during
each          edu           eg            eight         either
else          elsewhere     enough        entirely      especially
et            etc           even          ever          every
everybody     everyone      everything    everywhere    ex
exactly       example       except        far           few
fifth         first         five          followed      following
follows       for           former        formerly      forth
four          from          further       furthermore   get
gets          getting       given         gives         go
goes          going         gone          got           gotten
greetings     had           hadn't        happens       hardly
has           hasn't        have          haven't       having
he            he's          hello         help          hence
her           here          here's        hereafter     hereby
herein        hereupon      hers          herself       hi
him           himself       his           hither        hopefully
how           howbeit       however       i'd           i'll
i'm           i've          ie            if            ignored
immediate     in            inasmuch      inc           indeed
indicate      indicated     indicates     inner         insofar
instead       into          inward        is            isn't
it            it'd          it'll         it's          its
itself        just          keep          keeps         kept
know          known         knows         last          lately
later         latter        latterly      least         less
lest          let           let's         like          liked
likely        little        look          looking       looks
ltd           mainly        many          may           maybe
me            mean          meanwhile     merely        might
more          moreover      most          mostly        much
must          my            myself        name          namely
nd            near          nearly        necessary     need
needs         neither       never         nevertheless  new
next          nine          no            nobody        non
none          noone         nor           normally      not
nothing       novel         now           nowhere       obviously
of            off           often         oh            ok
okay          old           on            once          one
ones          only          onto          or            other
others        otherwise     ought         our           ours
ourselves     out           outside       over          overall
own           particular    particularly  per           perhaps
placed        please        plus          possible      presumably
probably      provides      que           quite         qv
rather        rd            re            really        reasonably
regarding     regardless    regards       relatively    respectively
right         said          same          saw           say
saying        says          second        secondly      see
seeing        seem          seemed        seeming       seems
seen          self          selves        sensible      sent
serious       seriously     seven         several       shall
she           should        shouldn't     since         six
so            some          somebody      somehow       someone
something     sometime      sometimes     somewhat      somewhere
soon          sorry         specified     specify       specifying
still         sub           such          sup           sure
t's           take          taken         tell          tends
th            than          thank         thanks        thanx
that          that's        thats         the           their
theirs        them          themselves    then          thence
there         there's       thereafter    thereby       therefore
therein       theres        thereupon     these         they
they'd        they'll       they're       they've       think
third         this          thorough      thoroughly    those
though        three         through       throughout    thru
thus          to            together      too           took
toward        towards       tried         tries         truly
try           trying        twice         two           un
under         unfortunately unless        unlikely      until
unto          up            upon          us            use
used          useful        uses          using         usually
value         various       very          via           viz
vs            want          wants         was           wasn't
way           we            we'd          we'll         we're
we've         welcome       well          went          were
weren't       what          what's        whatever      when
whence        whenever      where         where's       whereafter
whereas       whereby       wherein       whereupon     wherever
whether       which         while         whither       who
who's         whoever       whole         whom          whose
why           will          willing       wish          with
within        without       won't         wonder        would
wouldn't      yes           yet           you           you'd
you'll        you're        you've        your          yours
yourself      yourselves    zero