# Naive Bayes classifier

## The naive Bayes probability model

Abstractly, the probability model for a classifier is a conditional model

${\displaystyle p(C\vert F_{1},\dots ,F_{n})\,}$

over a dependent class variable ${\displaystyle C}$ with a small number of outcomes (classes), conditional on several feature variables ${\displaystyle F_{1}}$ through ${\displaystyle F_{n}}$. Using Bayes' theorem, this can be written

${\displaystyle p(C\vert F_{1},\dots ,F_{n})={\frac {p(C)\ p(F_{1},\dots ,F_{n}\vert C)}{p(F_{1},\dots ,F_{n})}}.\,}$

In plain English, the equation reads

${\displaystyle {\mbox{posterior}}={\frac {{\mbox{prior}}\times {\mbox{likelihood}}}{\mbox{evidence}}}.\,}$

In practice, only the numerator of the fraction matters, because the denominator does not depend on ${\displaystyle C}$ and the values of the features ${\displaystyle F_{i}}$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model

${\displaystyle p(C,F_{1},\dots ,F_{n})\,}$

which can be rewritten as follows, using repeated applications of the definition of conditional probability (the chain rule):

${\displaystyle p(C,F_{1},\dots ,F_{n})\,}$
${\displaystyle \varpropto p(C)\ p(F_{1},\dots ,F_{n}\vert C)}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2},\dots ,F_{n}\vert C,F_{1})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3},\dots ,F_{n}\vert C,F_{1},F_{2})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3}\vert C,F_{1},F_{2})\ p(F_{4},\dots ,F_{n}\vert C,F_{1},F_{2},F_{3})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3}\vert C,F_{1},F_{2})\ \dots p(F_{n}\vert C,F_{1},F_{2},F_{3},\dots ,F_{n-1}).}$

Now the "naive" conditional independence assumption comes into play: assume that each feature ${\displaystyle F_{i}}$ is conditionally independent of every other feature ${\displaystyle F_{j}}$ for ${\displaystyle j\neq i}$, given the class ${\displaystyle C}$. This means that

${\displaystyle p(F_{i}\vert C,F_{j})=p(F_{i}\vert C)\,}$

for ${\displaystyle i\neq j}$, so the joint model can be expressed as

${\displaystyle {\begin{aligned}p(C\vert F_{1},\dots ,F_{n})&\varpropto p(C,F_{1},\dots ,F_{n})\\&\varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C)\ p(F_{3}\vert C)\ \cdots \,\\&\varpropto p(C)\prod _{i=1}^{n}p(F_{i}\vert C).\,\end{aligned}}}$

This means that, under the independence assumption above, the conditional distribution over the class variable ${\displaystyle C}$ is

${\displaystyle p(C\vert F_{1},\dots ,F_{n})={\frac {1}{Z}}p(C)\prod _{i=1}^{n}p(F_{i}\vert C)}$

where ${\displaystyle Z}$ (the evidence) is a scaling factor that depends only on ${\displaystyle F_{1},\dots ,F_{n}}$, i.e. a constant once the values of the feature variables are known.

### Constructing a classifier from the probability model

The naive Bayes classifier combines this probability model with a decision rule. One common rule is to pick the most probable hypothesis; this is known as the maximum a posteriori (MAP) decision rule. The corresponding classifier is the function defined as

${\displaystyle \mathrm {classify} (f_{1},\dots ,f_{n})={\underset {c}{\operatorname {argmax} }}\ p(C=c)\displaystyle \prod _{i=1}^{n}p(F_{i}=f_{i}\vert C=c).}$
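The MAP rule can be sketched in a few lines; `priors` and `likelihoods` are hypothetical lookup tables holding the estimated ${\displaystyle p(C=c)}$ and ${\displaystyle p(F_{i}=f_{i}\vert C=c)}$, not part of any particular library:

```python
import math

def classify(priors, likelihoods, features):
    """Return the class c maximizing p(C=c) * prod_i p(F_i = f_i | C=c).

    priors:      {class: p(C=c)}
    likelihoods: {class: [{feature_value: p(F_i = v | C=c)}, ...]},
                 one dict per feature position i
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Sum logarithms rather than multiplying raw probabilities,
        # which avoids floating-point underflow with many features.
        score = math.log(prior)
        for i, value in enumerate(features):
            score += math.log(likelihoods[c][i][value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

For continuous features the inner table lookup would be replaced by evaluating an estimated density, as in the sex-classification example below.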

## Examples

### Sex classification

Problem: predict whether a given person is male or female based on the measured features height, weight, and foot size.

#### Training

An example training set is given below.

| Sex | Height (feet) | Weight (lbs) | Foot size (inches) |
| --- | --- | --- | --- |
| male | 6 | 180 | 12 |
| male | 5.92 (5'11") | 190 | 11 |
| male | 5.58 (5'7") | 170 | 12 |
| male | 5.92 (5'11") | 165 | 10 |
| female | 5 | 100 | 6 |
| female | 5.5 (5'6") | 150 | 8 |
| female | 5.42 (5'5") | 130 | 7 |
| female | 5.75 (5'9") | 150 | 9 |

#### Testing

Given a sample with measured height, weight, and foot size, the posterior probability of each class is

${\displaystyle posterior(male)={\frac {P(male)\,p(height|male)\,p(weight|male)\,p(footsize|male)}{evidence}}}$

${\displaystyle posterior(female)={\frac {P(female)\,p(height|female)\,p(weight|female)\,p(footsize|female)}{evidence}}}$

where the evidence (the normalizing constant) is

${\displaystyle evidence=P(male)\,p(height|male)\,p(weight|male)\,p(footsize|male)+P(female)\,p(height|female)\,p(weight|female)\,p(footsize|female)}$

The evidence may be ignored when comparing the two classes, since it is a positive constant common to both posteriors. Assuming equiprobable classes, the priors are

${\displaystyle P(male)=0.5}$

${\displaystyle p({\mbox{height}}|{\mbox{male}})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\exp \left({\frac {-(6-\mu )^{2}}{2\sigma ^{2}}}\right)\approx 1.5789}$, where ${\displaystyle \mu =5.855}$ and ${\displaystyle \sigma ^{2}=3.5033\times 10^{-2}}$ are the parameters of the normal distribution estimated from the training set. Note that a value greater than 1 is acceptable here: it is a probability density rather than a probability, because height is a continuous variable.

${\displaystyle p(weight|male)=5.9881\times 10^{-6}}$
${\displaystyle p(footsize|male)=1.3112\times 10^{-3}}$
${\displaystyle {\mbox{posterior numerator}}(male)=6.1984\times 10^{-9}}$
${\displaystyle P(female)=0.5}$
${\displaystyle p(height|female)=2.2346\times 10^{-1}}$
${\displaystyle p(weight|female)=1.6789\times 10^{-2}}$
${\displaystyle p(footsize|female)=2.8669\times 10^{-1}}$
${\displaystyle {\mbox{posterior numerator}}(female)=5.3778\times 10^{-4}}$

Since the posterior numerator is larger for the female class, the sample is predicted to be female.
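These figures can be reproduced end to end. The sketch below assumes the sample being classified has height 6 ft, weight 130 lbs, and foot size 8 in; the value 6 appears explicitly in the height density above, and the other two values are consistent with the quoted densities:

```python
import math
from statistics import mean, variance

def gaussian_pdf(x, mu, var):
    """Normal density, used because all three features are continuous."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Training rows: (height/ft, weight/lb, foot size/in), taken from the table above.
train = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
priors = {"male": 0.5, "female": 0.5}
sample = (6.00, 130, 8)  # assumed test sample

numerators = {}
for sex, rows in train.items():
    num = priors[sex]
    for i, x in enumerate(sample):
        column = [row[i] for row in rows]
        num *= gaussian_pdf(x, mean(column), variance(column))  # n-1 variance
    numerators[sex] = num

# numerators["male"] ≈ 6.1984e-09 and numerators["female"] ≈ 5.3778e-04,
# so the sample is classified as female.
```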

### Document classification

Here is a worked example of naive Bayes applied to the document-classification problem: consider classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes that can be modelled as sets of words, where the (independent) probability that the i-th word of a given document occurs in a document from class C can be written as

${\displaystyle p(w_{i}\vert C)\,}$

(For this treatment, we simplify matters further by assuming that words are randomly distributed within a document: that is, a word's probability does not depend on the length of the document, on its position relative to other words, or on other document context.)

Then the probability that a given document ${\displaystyle D}$ contains all of the words ${\displaystyle w_{i}}$, given a class ${\displaystyle C}$, is

${\displaystyle p(D\vert C)=\prod _{i}p(w_{i}\vert C)\,}$

The question we want to answer is: "what is the probability that a given document ${\displaystyle D}$ belongs to a given class ${\displaystyle C}$?" In other words, what is ${\displaystyle p(C\vert D)}$? By the definition of conditional probability,

${\displaystyle p(D\vert C)={p(D\cap C) \over p(C)}}$

and

${\displaystyle p(C\vert D)={p(D\cap C) \over p(D)}}$

Bayes' theorem combines these into a statement of the posterior probability in terms of likelihood and prior:

${\displaystyle p(C\vert D)={p(C) \over p(D)}\,p(D\vert C)}$

Assume for the moment that there are only two mutually exclusive classes, ${\displaystyle S}$ and ${\displaystyle \neg S}$ (e.g. spam and not spam), such that every document is in one or the other, with

${\displaystyle p(D\vert S)=\prod _{i}p(w_{i}\vert S)\,}$

and

${\displaystyle p(D\vert \neg S)=\prod _{i}p(w_{i}\vert \neg S)\,}$

Using the Bayesian result above, we can write:

${\displaystyle p(S\vert D)={p(S) \over p(D)}\,\prod _{i}p(w_{i}\vert S)}$
${\displaystyle p(\neg S\vert D)={p(\neg S) \over p(D)}\,\prod _{i}p(w_{i}\vert \neg S)}$

Dividing one by the other gives the likelihood ratio:

${\displaystyle {p(S\vert D) \over p(\neg S\vert D)}={p(S)\,\prod _{i}p(w_{i}\vert S) \over p(\neg S)\,\prod _{i}p(w_{i}\vert \neg S)}}$

which can be re-factored as:

${\displaystyle {p(S\vert D) \over p(\neg S\vert D)}={p(S) \over p(\neg S)}\,\prod _{i}{p(w_{i}\vert S) \over p(w_{i}\vert \neg S)}}$

The ratio ${\displaystyle p(S\vert D)/p(\neg S\vert D)}$ can be conveniently expressed as a log-likelihood ratio:

${\displaystyle \ln {p(S\vert D) \over p(\neg S\vert D)}=\ln {p(S) \over p(\neg S)}+\sum _{i}\ln {p(w_{i}\vert S) \over p(w_{i}\vert \neg S)}}$

Finally, the document can be classified as spam if ${\displaystyle \ln {p(S\vert D) \over p(\neg S\vert D)}>0}$ (i.e. ${\displaystyle p(S\vert D)>p(\neg S\vert D)}$), and as non-spam otherwise.
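The running log-odds sum lends itself to a very small implementation. The word probabilities below are invented values for a toy spam model, not estimates from any real corpus:

```python
import math

p_spam, p_not_spam = 0.5, 0.5  # priors p(S) and p(¬S), assumed equal
# Hypothetical per-word likelihoods p(w|S) and p(w|¬S).
p_word_spam = {"viagra": 0.01, "free": 0.02, "meeting": 0.0005}
p_word_not_spam = {"viagra": 0.0001, "free": 0.005, "meeting": 0.01}

def log_odds(words):
    """ln(p(S|D) / p(¬S|D)) for a document given as a bag of words."""
    total = math.log(p_spam / p_not_spam)
    for w in words:
        total += math.log(p_word_spam[w] / p_word_not_spam[w])
    return total

# Positive log odds mean p(S|D) > p(¬S|D): classify as spam.
print(log_odds(["viagra", "free"]) > 0)  # spam-leaning words
print(log_odds(["meeting"]) > 0)         # ham-leaning word
```

In practice the per-word probabilities would be estimated from labelled training documents, usually with smoothing so that no ratio is ever zero or undefined.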

(Such log-likelihood ratios are a common technique in statistics. In the case of two mutually exclusive alternatives, such as this spam example, the conversion of a log-likelihood ratio to a probability takes the form of a sigmoid (S-shaped) curve.)
