User:Chen-Pan Liao/Fisher's exact test

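Fisher's exact test (Chinese: 費雪正確概率檢定, also 費雪精確檢定) is a statistical hypothesis test used to assess the significance of the association in a contingency table. It was devised by Ronald Aylmer Fisher in 1935.^[1]^[2]^[3] In practice it is mostly applied when sample sizes are small, but it is valid for all sample sizes. It belongs to the class of exact tests: its p-value is computed directly from the distribution implied by the null hypothesis, rather than from an approximation to a reference distribution that becomes accurate only as the sample size grows.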

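Fisher is said to have devised the test after Muriel Bristol claimed she could tell whether the tea or the milk had been poured into a cup first. He also applied the test in the lady tasting tea experiment.^[4]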

Purpose and scope

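The test is useful for categorical data that result from classifying objects in two different ways; it is used to examine whether the association (contingency) between the two classifications is significant. In Fisher's original example, one classification is how the cup of tea was actually prepared (milk first or tea first), and the other is how Muriel Bristol judged it to have been prepared. The test asks whether the two classifications are associated, that is, whether the taster really can tell which was poured first. As in the lady tasting tea experiment, the test is most often applied to a 2 × 2 contingency table (discussed below). The resulting p-value is conditional on the margins of the table being fixed: the taster knows that four of the eight cups had milk poured first and will therefore pick exactly four. Under the null hypothesis of independence, this makes the count in any one cell of the table follow a hypergeometric distribution.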
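As a minimal sketch of the hypergeometric reasoning above, the following Python snippet computes how likely a taster who is purely guessing would be to identify a given number of the four milk-first cups. The function name and its parameters are illustrative and do not come from the source.

```python
from math import comb


def tea_tasting_probability(correct, cups_per_kind=4):
    """P(exactly `correct` milk-first cups are identified) under pure guessing.

    Assumes the taster must label exactly `cups_per_kind` cups as milk-first
    out of 2 * cups_per_kind cups, so the count is hypergeometric.
    """
    total = 2 * cups_per_kind
    return (comb(cups_per_kind, correct)
            * comb(cups_per_kind, cups_per_kind - correct)
            / comb(total, cups_per_kind))


# Probability of getting all four milk-first cups right by chance: 1/70.
p_all_correct = tea_tasting_probability(4)
```

Because the margins are fixed, only the five outcomes 0 to 4 are possible, and their probabilities sum to one.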

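With large samples, a chi-squared test or a G-test is generally used, since the test statistic is then approximately chi-squared distributed. The large-sample approximation is inadequate when the sample is small or when the counts are spread very unevenly across the cells. The usual rule of thumb is that the chi-squared approximation should not be used when the expected count in any cell is below 5, or below 10 when there is only one degree of freedom, although this rule is now regarded as overly conservative.^[5] In fact, for small, sparse, or unbalanced data, the approximate and exact p-values can differ greatly and may lead to opposite conclusions.^[6]^[7] By contrast, Fisher's exact test is, as its name says, exact as long as the experimental procedure keeps the row and column totals fixed, and it can therefore be used regardless of the sample characteristics. It becomes hard to compute for large samples or well-balanced tables, but fortunately these are exactly the conditions under which the chi-squared test is appropriate.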
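The rule-of-thumb check on expected counts can be sketched as follows. The table used here is hypothetical and the threshold of 5 is simply the conventional value mentioned above; neither is a recommendation from the source.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table, chosen only for illustration.
observed = [[3, 1], [1, 3]]
expected = expected_counts(observed)

# Flag the table if any expected count falls below the conventional threshold.
smallest_expected = min(min(row) for row in expected)
chi_square_doubtful = smallest_expected < 5
```

When the flag is set, an exact test is the safer choice for such a table.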

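For a 2 × 2 contingency table the test can be carried out by hand. The method extends to general m × n contingency tables,^[8] but the computation is then laborious, and statistical software is normally used instead (some packages use a Monte Carlo method to approximate the p-value).^[9]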
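One way such a Monte Carlo approximation might work is sketched below: column labels are shuffled so that both sets of margins stay fixed, and the p-value is estimated as the share of simulated tables that are no more probable than the observed one. This is an illustrative sketch, not the algorithm of any particular package; the function names, the default number of draws, and the example table are all assumptions.

```python
import random
from math import lgamma


def log_table_probability(table):
    """Log-probability of an r x c table given its margins
    (multivariate hypergeometric distribution)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)

    def log_fact(k):
        return lgamma(k + 1)

    return (sum(log_fact(r) for r in rows) + sum(log_fact(c) for c in cols)
            - log_fact(n) - sum(log_fact(x) for row in table for x in row))


def monte_carlo_fisher(table, draws=10000, seed=0):
    """Estimate the p-value by permuting column labels (margins stay fixed)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row label, column label) pair per observation, in matching order.
    row_labels = [i for i, row in enumerate(table) for x in row for _ in range(x)]
    col_labels = [j for row in table for j, x in enumerate(row) for _ in range(x)]
    observed = log_table_probability(table)
    hits = 0
    for _ in range(draws):
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Small tolerance guards against floating-point ties.
        if log_table_probability(sim) <= observed + 1e-7:
            hits += 1
    # Adding one to both counts keeps the estimate strictly positive.
    return (hits + 1) / (draws + 1)


# Hypothetical 2 x 3 table, for illustration only.
estimate = monte_carlo_fisher([[3, 0, 1], [0, 3, 1]])
```

The estimate varies with the seed and the number of draws, so it should be read as an approximation with sampling error rather than an exact p-value.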

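The test can also be used to quantify the overlap between two groups. For example, in enrichment analysis in statistical genetics, a set of genes (A) may be annotated for a given phenotype, and the user may want to test how strongly some gene set of interest (B) overlaps with A. The situation can be summarized in a 2 × 2 contingency table holding the number of genes in each of the following categories: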

Genes in both A and B
Genes in A only
Genes in B only
Genes in neither A nor B

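Under the null hypothesis, the genes in either set are drawn from a broader gene set; Fisher's exact test is then used to decide whether the overlap is significant.^[10]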
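The four counts can be assembled from plain sets, as in the sketch below. The gene identifiers, the background set, and the function name are hypothetical and serve only to show the bookkeeping.

```python
def overlap_table(set_a, set_b, universe):
    """Build the 2 x 2 table [[both, A only], [B only, neither]].

    Rows: gene in A (yes / no). Columns: gene in B (yes / no).
    Genes outside the background `universe` are ignored.
    """
    u = set(universe)
    a = set(set_a) & u
    b = set(set_b) & u
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(u - a - b)
    return [[both, only_a], [only_b, neither]]


# Hypothetical identifiers; a real analysis would use an annotated background set.
background = {f"g{i}" for i in range(1, 11)}
genes_a = {"g1", "g2", "g3", "g4"}
genes_b = {"g3", "g4", "g5"}
table = overlap_table(genes_a, genes_b, background)
```

The resulting table can then be passed to any implementation of Fisher's exact test; the choice of background set strongly affects the "neither" cell and hence the result.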

Example

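Consider a sample of teenagers divided into male and female on the one hand, and into those who are and are not currently studying for a statistics exam on the other. In the sample, more females than males are studying, and we want to test whether the observed difference in proportions is significant. The data are as follows: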

	Male	Female	Row total
Studying	1	9	10
Not studying	11	3	14
Column total	12	12	24

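The question we ask about these data is: given that 10 of the 24 teenagers are studying and that 12 of the 24 are female, and taking as the null hypothesis that males and females are equally likely to be studying, does the sex distribution of the 10 who are studying differ from that of those who are not? More specifically, if we were to choose 10 of the teenagers at random, what is the probability that 9 or more of them would be among the 12 females and only 1 or fewer among the 12 males?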

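Before carrying out the test, we introduce some notation. We denote the cell counts by the letters a, b, c and d, call the totals across rows and columns the marginal totals, and write n for the grand total. The table then reads: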

	Male	Female	Row total
Studying	a	b	a + b
Not studying	c	d	c + d
Column total	a + c	b + d	a + b + c + d = n

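Fisher showed that, conditional on the row and column totals being fixed, a follows a hypergeometric distribution with a + c draws from a population of a + b successes and c + d failures. The probability of obtaining such a set of values is given by: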

$$p={\frac {\displaystyle {{a+b} \choose {a}}\displaystyle {{c+d} \choose {c}}}{\displaystyle {{n} \choose {a+c}}}}={\frac {\displaystyle {{a+b} \choose {b}}\displaystyle {{c+d} \choose {d}}}{\displaystyle {{n} \choose {b+d}}}}={\frac {(a+b)!~(c+d)!~(a+c)!~(b+d)!}{a!~~b!~~c!~~d!~~n!}}$$

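where ${\tbinom {n}{k}}$ is the binomial coefficient and the symbol ! denotes the factorial. This can be seen as follows: if all the marginal totals (a + b, c + d, a + c and b + d) are known, only one degree of freedom remains, so that knowing a, for example, is enough to deduce the other values. Now ${\displaystyle p=p(a)}$ is the probability that exactly a of the a + c elements drawn at random without replacement from a larger set of n elements, a + b of which are positive, are themselves positive; this is precisely the definition of the hypergeometric distribution. For the data above,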

$$p={{\tbinom {10}{1}}{\tbinom {14}{11}}}/{\tbinom {24}{12}}={\tfrac {10!~14!~12!~12!}{1!~9!~11!~3!~24!}}\approx 0.001346076$$
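The point probability can be checked numerically with a short sketch such as the one below, which evaluates both the binomial-coefficient form and the factorial form of the formula. The function names are illustrative only.

```python
from math import comb, factorial


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2 x 2 table with fixed margins,
    using the binomial-coefficient form."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def table_probability_factorials(a, b, c, d):
    """The same probability written with factorials only."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator


# Observed table: 1 and 9 studying, 11 and 3 not studying.
p_observed = table_probability(1, 9, 11, 3)  # about 0.001346076
```

Both forms use exact integer arithmetic up to the final division, so they agree to floating-point precision for tables of this size.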

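The formula above gives the exact hypergeometric probability of observing this particular arrangement of the data, under the null hypothesis that males and females are equally likely to be studying and given the marginal totals. In other words, if we assume that the probability of studying is p for males and females alike, and that males and females enter the sample independently of whether they are studying, then the hypergeometric formula gives the conditional probability of observing the counts a, b, c, d in the four cells, conditional on the observed margins (that is, taking the row and column totals shown in the margins of the table as given). This remains true even if males and females enter the sample with different probabilities (for example, if the sex ratio in the population is not 1:1). The only requirement is that the two classifying characteristics, sex and whether or not the person is studying, are independent. For example, suppose that P and Q are the marginal proportions of males and females, and p and q the marginal proportions of those who are and are not studying, so that P + Q = 1 and p + q = 1, and that sex and studying are independent. The probabilities of the four combinations are then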

Male and studying: Pp
Female and studying: Qp
Male and not studying: Pq
Female and not studying: Qq
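A quick numerical check, offered as a sketch rather than a proof, illustrates that once the table is conditioned on its margins the nuisance parameters drop out. It assumes the four cells have probabilities Pp (male, studying), Qp (female, studying), Pq (male, not studying) and Qq (female, not studying), with Q = 1 - P and q = 1 - p; the function names are hypothetical.

```python
from math import factorial


def multinomial_probability(a, b, c, d, P, p):
    """Unconditional probability of the table under independent classifications."""
    Q, q = 1 - P, 1 - p
    n = a + b + c + d
    coefficient = factorial(n) // (factorial(a) * factorial(b)
                                   * factorial(c) * factorial(d))
    return coefficient * (P * p) ** a * (Q * p) ** b * (P * q) ** c * (Q * q) ** d


def conditional_probability(a, b, c, d, P, p):
    """Probability of the table given its row and column totals."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = 0.0
    # Enumerate every table that shares the observed margins.
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        total += multinomial_probability(
            k, row1 - k, col1 - k, n - row1 - col1 + k, P, p)
    return multinomial_probability(a, b, c, d, P, p) / total


# Two arbitrary choices of (P, p) give the same conditional probability.
p_first = conditional_probability(1, 9, 11, 3, P=0.3, p=0.7)
p_second = conditional_probability(1, 9, 11, 3, P=0.5, p=0.5)
```

The powers of P, Q, p and q depend only on the margins, so they cancel between numerator and denominator, which is why the two values coincide with the hypergeometric probability.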

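If we then compute the distribution of the cell counts conditional on the given margins, we obtain the formula above, in which neither p nor P appears. We can therefore compute the exact probability of any arrangement of the 24 teenagers into the four cells of the table. Fisher showed that, to assess significance, we need consider only the tables whose marginal totals are the same as in the observed table, and among these only the arrangements that are as extreme as, or more extreme than, the observed one. (Barnard's test relaxes this constraint on one set of marginal totals.) In this example there are 11 possible tables with the observed margins, and only one of them is more extreme in the same direction as the data: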

	Male	Female	Row total
Studying	0	10	10
Not studying	12	2	14
Column total	12	12	24

The probability of this table (under the same assumptions) is ${p={\tbinom {10}{0}}{\tbinom {14}{12}}}/{\tbinom {24}{12}}\approx 0.000033652$.

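If the null hypothesis is true, the p-value of the one-tailed test is the sum of the probabilities of the observed table and of all more extreme tables, approximately 0.001346076 + 0.000033652 = 0.001379728. In R this value can be obtained with fisher.test(rbind(c(1,9),c(11,3)),alternative="less")$p.value, and in Python with scipy.stats.fisher_exact(table=[[1,9],[11,3]], alternative="less"), which returns the sample odds ratio together with the p-value. The p-value can be interpreted as the total evidence that the observed data, or any more extreme table, provide regarding the null hypothesis (that males and females do not differ in their proportion of studiers). The smaller the p-value, the stronger the evidence for rejecting the null hypothesis; the data in this example therefore strongly suggest that males and females are not equally likely to be studying.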
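The one-tailed sum can also be reproduced with the standard library alone, as in the following sketch. The function name is illustrative, and the direction corresponds to the "less" alternative, that is, a top-left count at most as large as the one observed.

```python
from math import comb


def one_sided_p_value(a, b, c, d):
    """P(top-left cell <= a) with all margins fixed (alternative "less")."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)  # smallest count the margins allow
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(lowest, a + 1))
    return favourable / comb(n, col1)


# Observed table plus the single more extreme table (top-left count 0).
p_less = one_sided_p_value(1, 9, 11, 3)  # about 0.001379728
```

Summing integer counts before dividing avoids accumulating rounding error across the individual table probabilities.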

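For a two-tailed test, we must also consider tables that are equally extreme but in the opposite direction, that is, the part of the rejection region on the other side of the observed data. Unfortunately, there is no unique definition of which tables in the opposite direction count as "as extreme or more extreme". The fisher.test function in R computes the p-value by summing the probabilities of all tables whose probability is less than or equal to that of the observed table. Unlike with symmetric distributions, the two-tailed p-value is therefore not necessarily twice the one-tailed value, particularly in small samples.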
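The "sum every table no more probable than the observed one" rule can be sketched as follows. This is a simplified illustration of the rule described above, not a reimplementation of R's function: it compares exact integer weights, so it needs no floating-point tolerance, and the function name is an assumption.

```python
from math import comb


def two_sided_p_value(a, b, c, d):
    """Sum the probabilities of all tables (same margins) that are
    no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    # Integer weights; dividing by comb(n, col1) turns them into probabilities.
    weights = {k: comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(lowest, highest + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / comb(n, col1)


p_two_sided = two_sided_p_value(1, 9, 11, 3)
```

In the example table the two column totals are equal, so the distribution of a is symmetric and the result is exactly twice the one-tailed value; with unequal margins this generally does not hold.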

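As noted above, most modern statistical software can compute the significance of Fisher's exact test, but the computation may fail for very large samples, for example when the factorials overflow. In that case one can fall back on the chi-squared approximation, or evaluate the probabilities with the gamma function or the log-gamma function; accurate computation of hypergeometric and binomial probabilities nevertheless remains an active research area.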
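A minimal sketch of the log-gamma approach is shown below: the table probability is computed on the log scale, so very large counts do not overflow. The helper names are illustrative, and the large table at the end is a made-up example.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Logarithm of the binomial coefficient, via the log-gamma function."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_probability_2x2(a, b, c, d):
    """Log of the hypergeometric probability of a 2 x 2 table."""
    n = a + b + c + d
    return log_comb(a + b, a) + log_comb(c + d, c) - log_comb(n, a + c)


# Small table: exponentiating recovers the ordinary probability.
p_small = exp(log_probability_2x2(1, 9, 11, 3))

# Hypothetical large table: the log-probability stays finite.
log_p_large = log_probability_2x2(5000, 5000, 5000, 5000)
```

For tail sums over many tables, the log-probabilities would normally be combined with a log-sum-exp step rather than exponentiated one by one.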

Controversies

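Although Fisher's test gives exact p-values, some authors have argued that it is conservative, that is, it rejects less often than the nominal significance level allows and therefore has lower power.^[11]^[12]^[13] The problem arises from the combination of a discrete statistic with a fixed significance level.^[14]^[15] More precisely, the p-value of Fisher's test is the sum of the null probabilities of all tables that are as extreme as, or more extreme than, the observed one; because the set of possible tables is discrete, there may be no table whose p-value equals the nominal level exactly. If α_e is the largest p-value below 5% that can actually occur for some table, the test effectively operates at level α_e. For small sample sizes, α_e can be considerably lower than 5%.^[11]^[12]^[13] While this effect occurs for any discrete statistic, it has been argued that the problem is compounded by the fact that Fisher's test conditions on the margins.^[16] To avoid the problem, many authors discourage the use of fixed significance levels when dealing with discrete problems.^[14]^[15]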
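The effective level α_e can be illustrated by listing the one-tailed p-values that are attainable for a given set of margins. The sketch below uses the margins of the teenager example (first row total 10, first column total 12, n = 24); the function names are hypothetical.

```python
from math import comb


def attainable_p_values(row1, col1, n):
    """One-tailed ("less") p-values for every table with the given margins."""
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    denominator = comb(n, col1)
    p_values, cumulative = [], 0
    for k in range(lowest, highest + 1):
        cumulative += comb(row1, k) * comb(n - row1, col1 - k)
        p_values.append(cumulative / denominator)
    return p_values


def effective_level(row1, col1, n, alpha=0.05):
    """Largest attainable p-value not exceeding the nominal level alpha."""
    candidates = [p for p in attainable_p_values(row1, col1, n) if p <= alpha]
    return max(candidates) if candidates else 0.0


alpha_e = effective_level(10, 12, 24)
```

For these margins the attainable p-values jump from about 0.018 to about 0.107, so a test at the nominal 5% level in fact rejects with probability about 0.018 under the null hypothesis.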

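The decision to condition on the margins of the table is also controversial.^[17]^[18] The p-values of Fisher's test are derived from the distribution that treats both the row totals and the column totals as fixed. In this sense the test is exact only for the conditional distribution, not for the original table. In the original data the marginal totals may vary from experiment to experiment, in which case Fisher's test may not be appropriate. When the margins are not held fixed, other methods can give an exact p-value for a 2 × 2 table. Barnard's test, for example, allows random margins. However, some authors (including, later, Barnard himself) have criticized Barnard's test on the basis of this property.^[14]^[15]^[18] They argue that the marginal success total (a + b in the table above) is an almost ancillary statistic,^[15] containing almost no information about the property being tested.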

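Conditioning on the marginal success rate of a 2 × 2 table may ignore some information in the data about the unknown odds ratio.^[19] The argument that the marginal totals are (almost) ancillary implies that the appropriate likelihood function for inference about this odds ratio should be conditioned on the marginal success rate.^[19] Whether the ignored information matters for inference is still debated.^[19]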

Alternatives

Barnard's test can be used in place of Fisher's test^[20] and has higher power, particularly for 2 × 2 tables.^[21] Boschloo's test is another exact test that is also more powerful than Fisher's test.^[22]

For stratified categorical data, a method that accounts for the sampling strata, such as the Cochran–Mantel–Haenszel (CMH) test, must be used instead of Fisher's test.

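A p-value derived from the likelihood-ratio test can be based on the conditional distribution of the odds ratio given the marginal success rate.^[19] This p-value is inferentially consistent with classical tests for normally distributed data, and with likelihood ratios and support intervals based on this conditional likelihood function. It can be computed in R.^[23]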

References

[1] Fisher, R. A. On the Interpretation of χ² from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society. 1922-01, 85 (1). doi:10.2307/2340521.
[2] Fisher, Ronald Aylmer, Sir. Statistical methods for research workers. 14th ed., rev. and enl. Darien, Conn.: Hafner Pub. Co. 1970. ISBN 0-05-002170-2. OCLC 135627.
[3] Agresti, Alan. A Survey of Exact Inference for Contingency Tables. Statistical Science. 1992-02-01, 7 (1). ISSN 0883-4237. doi:10.1214/ss/1177011454.
[4] Newman, James R. Mathematics of a Lady Tasting Tea. The world of mathematics. Mineola, N.Y.: Dover Publications. 2000-. ISBN 978-0-486-41153-8. OCLC 43555029.
[5] Larntz, Kinley. Small-Sample Comparisons of Exact Levels for Chi-Squared Goodness-of-Fit Statistics. Journal of the American Statistical Association. 1978-06, 73 (362). ISSN 0162-1459. doi:10.1080/01621459.1978.10481567.
[6] Mehta, Cyrus R.; Patel, Nitin R.; Tsiatis, Anastasios A. Exact Significance Testing to Establish Treatment Equivalence with Ordered Categorical Data. Biometrics. 1984-09, 40 (3). doi:10.2307/2530927.
[7] Patel, Nitin R.; SPSS Inc. SPSS exact tests 6.1 for Windows. Chicago, Ill.: SPSS Inc. 1995. ISBN 0-13-450891-2. OCLC 34436454.
[8] Mehta, Cyrus R.; Patel, Nitin R. A Network Algorithm for Performing Fisher's Exact Test in r × c Contingency Tables. Journal of the American Statistical Association. 1983-06, 78 (382). doi:10.2307/2288652.
[9] Mehta, Cyrus R.; Patel, Nitin R. ALGORITHM 643: FEXACT: a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables. ACM Transactions on Mathematical Software. 1986-06, 12 (2). ISSN 0098-3500. doi:10.1145/6497.214326.
[10] Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D. Large-scale gene function analysis with the PANTHER classification system. Nature Protocols. 2013-08, 8 (8). ISSN 1754-2189. PMC 6519453. PMID 23868073. doi:10.1038/nprot.2013.092.
[11] Liddell, Douglas. Practical Tests of 2 × 2 Contingency Tables. The Statistician. 1976-12, 25 (4). doi:10.2307/2988087.
[12] Berkson, Joseph. In dispraise of the exact test. Journal of Statistical Planning and Inference. 1978-01, 2 (1). doi:10.1016/0378-3758(78)90019-8.
[13] D'Agostino, Ralph B.; Chase, Warren; Belanger, Albert. The Appropriateness of Some Common Procedures for Testing the Equality of Two Independent Binomial Populations. The American Statistician. 1988-08, 42 (3). doi:10.2307/2685002.
[14] Yates, F. Test of Significance for 2 × 2 Contingency Tables. Journal of the Royal Statistical Society. Series A (General). 1984, 147 (3). doi:10.2307/2981577.
[15] Little, Roderick J. A. Testing the Equality of Two Independent Binomial Proportions. The American Statistician. 1989-11, 43 (4). doi:10.2307/2685390.
[16] Mehta, Cyrus R.; Senchaudhuri, Pralay. Conditional versus unconditional exact tests for comparing two binomials (PDF). 4 September 2003 [20 November 2009].
[17] Barnard, G. A. A New Test for 2 × 2 Tables. Nature. 1945-08, 156 (3954). ISSN 0028-0836. doi:10.1038/156177a0.
[18] Fisher, R. A. A New Test for 2 × 2 Tables. Nature. 1945-09, 156 (3961). ISSN 0028-0836. doi:10.1038/156388a0.
[19] Choi, Leena; Blume, Jeffrey D.; Dupont, William D. Olivier, Jake (ed.). Elucidating the Foundations of Statistical Inference with 2 x 2 Tables. PLOS ONE. 2015-04-07, 10 (4). ISSN 1932-6203. PMC 4388855. PMID 25849515. doi:10.1371/journal.pone.0121263.
[20] Lydersen, Stian; Fagerland, Morten W.; Laake, Petter. Recommended tests for association in 2×2 tables. Statistics in Medicine. 2009-03-30, 28 (7). doi:10.1002/sim.3531.
[21] Berger R.L. Power comparison of exact unconditional tests for comparing two binomial proportions. Institute of Statistics Mimeo Series No. 2266. 1994: 1–19.
[22] Boschloo, R. D. Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica Neerlandica. 1970-03, 24 (1). ISSN 0039-0402. doi:10.1111/j.1467-9574.1970.tb00104.x.
[23] Choi, Leena. ProfileLikelihood: profile likelihood for a parameter in commonly used statistical models. R package version 1.1. 2011.
