# Naive Bayes Classifier

## The Naive Bayes Probability Model

Abstractly, naive Bayes is a conditional probability model: given an instance to be classified, described by feature variables $F_{1},\dots ,F_{n}$, the model assigns probabilities to a class variable $C$:

${\displaystyle p(C\vert F_{1},\dots ,F_{n})\,}$

By Bayes' theorem, this conditional probability can be decomposed as

${\displaystyle p(C\vert F_{1},\dots ,F_{n})={\frac {p(C)\ p(F_{1},\dots ,F_{n}\vert C)}{p(F_{1},\dots ,F_{n})}}.\,}$

In plain terms:

${\displaystyle {\mbox{posterior}}={\frac {{\mbox{prior}}\times {\mbox{likelihood}}}{\mbox{evidence}}}.\,}$

In practice only the numerator matters, because the denominator does not depend on $C$ and the feature values are given, making it effectively constant. The numerator is equivalent to the joint probability model

${\displaystyle p(C,F_{1},\dots ,F_{n})\,}$

Repeated application of the chain rule for conditional probability expands this joint probability as

${\displaystyle p(C,F_{1},\dots ,F_{n})\,}$
${\displaystyle \varpropto p(C)\ p(F_{1},\dots ,F_{n}\vert C)}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2},\dots ,F_{n}\vert C,F_{1})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3},\dots ,F_{n}\vert C,F_{1},F_{2})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3}\vert C,F_{1},F_{2})\ p(F_{4},\dots ,F_{n}\vert C,F_{1},F_{2},F_{3})}$
${\displaystyle \varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C,F_{1})\ p(F_{3}\vert C,F_{1},F_{2})\ \dots p(F_{n}\vert C,F_{1},F_{2},F_{3},\dots ,F_{n-1}).}$

Now the "naive" conditional independence assumption comes into play: assume that each feature $F_{i}$ is conditionally independent of every other feature $F_{j}$, $j\neq i$, given the class $C$; that is,

${\displaystyle p(F_{i}\vert C,F_{j})=p(F_{i}\vert C)\,}$ for $i\neq j$.

Under this assumption, the conditional distribution factorizes:

${\displaystyle {\begin{aligned}p(C\vert F_{1},\dots ,F_{n})&\varpropto p(C,F_{1},\dots ,F_{n})\\&\varpropto p(C)\ p(F_{1}\vert C)\ p(F_{2}\vert C)\ p(F_{3}\vert C)\ \cdots \,\\&\varpropto p(C)\prod _{i=1}^{n}p(F_{i}\vert C).\,\end{aligned}}}$

Thus the posterior over the class variable can be written as

${\displaystyle p(C\vert F_{1},\dots ,F_{n})={\frac {1}{Z}}p(C)\prod _{i=1}^{n}p(F_{i}\vert C)}$

where the evidence $Z=p(F_{1},\dots ,F_{n})$ is a scaling factor that depends only on the features, i.e. a constant once the feature values are known.

### Constructing a Classifier from the Probability Model

The naive Bayes classifier combines this probability model with the maximum a posteriori (MAP) decision rule: pick the most probable class. The resulting classifier is the function

${\displaystyle \mathrm {classify} (f_{1},\dots ,f_{n})={\underset {c}{\operatorname {argmax} }}\ p(C=c)\displaystyle \prod _{i=1}^{n}p(F_{i}=f_{i}\vert C=c).}$
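The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the class names, priors, and likelihood tables in the toy example are invented for demonstration:

```python
import math

def classify(features, priors, likelihoods):
    """MAP decision rule: argmax over c of p(C=c) * prod_i p(F_i=f_i | C=c).
    Computed in log space to avoid floating-point underflow when many
    per-feature probabilities are multiplied together."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for f, likelihood in zip(features, likelihoods[c]):
            score += math.log(likelihood(f))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy example (made up): one binary feature, two equally likely classes.
priors = {"a": 0.5, "b": 0.5}
likelihoods = {
    "a": [lambda f: 0.9 if f == 1 else 0.1],  # p(F1 | C=a)
    "b": [lambda f: 0.2 if f == 1 else 0.8],  # p(F1 | C=b)
}
```

With these tables, `classify([1], priors, likelihoods)` picks class `"a"`, since $0.5\times 0.9>0.5\times 0.2$.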

## Examples

### Sex Classification

Problem: classify whether a given person is male or female based on the measured features height, weight, and foot size.

#### Training

The training set below gives height (feet), weight (lbs), and foot size (inches) for four males and four females:

| Sex | Height (feet) | Weight (lbs) | Foot size (inches) |
| --- | --- | --- | --- |
| male | 6 | 180 | 12 |
| male | 5.92 (5'11") | 190 | 11 |
| male | 5.58 (5'7") | 170 | 12 |
| male | 5.92 (5'11") | 165 | 10 |
| female | 5 | 100 | 6 |
| female | 5.5 (5'6") | 150 | 8 |
| female | 5.42 (5'5") | 130 | 7 |
| female | 5.75 (5'9") | 150 | 9 |

#### Testing

The sample to be classified:

| Sex | Height (feet) | Weight (lbs) | Foot size (inches) |
| --- | --- | --- | --- |
| sample | 6 | 130 | 8 |

To decide whether the sample is male or female, we compare the two posteriors:

${\displaystyle posterior(male)={\frac {P(male)\,p(height|male)\,p(weight|male)\,p(footsize|male)}{evidence}}}$

${\displaystyle posterior(female)={\frac {P(female)\,p(height|female)\,p(weight|female)\,p(footsize|female)}{evidence}}}$

The evidence (the normalizing constant) is the same in both cases:

${\displaystyle evidence=P(male)\,p(height|male)\,p(weight|male)\,p(footsize|male)+P(female)\,p(height|female)\,p(weight|female)\,p(footsize|female)}$

Since the evidence is a positive constant shared by both classes, it can be ignored for the comparison; only the two numerators need to be computed.

${\displaystyle P(male)=0.5}$

${\displaystyle p({\mbox{height}}|{\mbox{male}})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\exp \left({\frac {-(6-\mu )^{2}}{2\sigma ^{2}}}\right)\approx 1.5789}$, where ${\displaystyle \mu =5.855}$ and ${\displaystyle \sigma ^{2}=3.5033\cdot 10^{-2}}$ are the parameters of the normal distribution estimated from the training set. Note that a value greater than 1 is acceptable here: this is a probability density rather than a probability, because height is a continuous variable.

${\displaystyle p(weight|male)=5.9881\cdot 10^{-6}}$
${\displaystyle p(footsize|male)=1.3112\cdot 10^{-3}}$
${\displaystyle posterior\ numerator(male)=6.1984\cdot 10^{-9}}$
${\displaystyle P(female)=0.5}$
${\displaystyle p(height|female)=2.2346\cdot 10^{-1}}$
${\displaystyle p(weight|female)=1.6789\cdot 10^{-2}}$
${\displaystyle p(footsize|female)=2.8669\cdot 10^{-1}}$
${\displaystyle posterior\ numerator(female)=5.3778\cdot 10^{-4}}$

Since the posterior numerator for female is larger, the classifier predicts that the sample is female.
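The whole computation can be reproduced with a short script. This is a sketch (the variable names are mine); it estimates per-feature Gaussian parameters from the training table using the unbiased sample variance, which matches the figures quoted above:

```python
import math

def gaussian_pdf(x, mu, var):
    """Normal probability density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Training data from the table above: (height, weight, foot size).
males = [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)]
females = [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)]

def fit(rows):
    """Per-feature sample mean and unbiased sample variance (divisor n-1)."""
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / (n - 1)
        stats.append((mu, var))
    return stats

def posterior_numerator(sample, stats, prior):
    """prior * product of per-feature Gaussian densities."""
    num = prior
    for x, (mu, var) in zip(sample, stats):
        num *= gaussian_pdf(x, mu, var)
    return num

sample = (6, 130, 8)
num_male = posterior_numerator(sample, fit(males), 0.5)
num_female = posterior_numerator(sample, fit(females), 0.5)
```

Running this gives `num_male` $\approx 6.198\cdot 10^{-9}$ and `num_female` $\approx 5.378\cdot 10^{-4}$, matching the figures above, so the sample is classified as female.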

### Document Classification

Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Assume that the probability that the $i$-th word of a given document occurs in a document from class $C$ can be written as

${\displaystyle p(w_{i}\vert C)\,}$

(With this treatment, we simplify the task further by assuming that words are randomly distributed in the document: that is, a word's probability is independent of the document's length, of its position within the document, and of the other words in the document.)

The probability that a given document $D$ contains all of the words $w_{i}$, given a class $C$, is then

${\displaystyle p(D\vert C)=\prod _{i}p(w_{i}\vert C)\,}$

The question we want to answer is: what is the probability that a given document $D$ belongs to a given class $C$? By the definition of conditional probability,

${\displaystyle p(D\vert C)={p(D\cap C) \over p(C)}}$
${\displaystyle p(C\vert D)={p(D\cap C) \over p(D)}}$

so Bayes' theorem expresses the posterior in terms of the likelihood:

${\displaystyle p(C\vert D)={p(C) \over p(D)}\,p(D\vert C)}$

Assume now that there are only two mutually exclusive classes, spam $S$ and not-spam $\neg S$, so that every document is in one or the other. Then

${\displaystyle p(D\vert S)=\prod _{i}p(w_{i}\vert S)\,}$
${\displaystyle p(D\vert \neg S)=\prod _{i}p(w_{i}\vert \neg S)\,}$

${\displaystyle p(S\vert D)={p(S) \over p(D)}\,\prod _{i}p(w_{i}\vert S)}$
${\displaystyle p(\neg S\vert D)={p(\neg S) \over p(D)}\,\prod _{i}p(w_{i}\vert \neg S)}$

${\displaystyle {p(S\vert D) \over p(\neg S\vert D)}={p(S)\,\prod _{i}p(w_{i}\vert S) \over p(\neg S)\,\prod _{i}p(w_{i}\vert \neg S)}}$

${\displaystyle {p(S\vert D) \over p(\neg S\vert D)}={p(S) \over p(\neg S)}\,\prod _{i}{p(w_{i}\vert S) \over p(w_{i}\vert \neg S)}}$

Taking the logarithm of both sides turns the product into a sum:

${\displaystyle \ln {p(S\vert D) \over p(\neg S\vert D)}=\ln {p(S) \over p(\neg S)}+\sum _{i}\ln {p(w_{i}\vert S) \over p(w_{i}\vert \neg S)}}$

(This log-likelihood-ratio technique is common in statistics. In the case of two mutually exclusive alternatives, such as this spam example, converting the log-likelihood ratio into a probability takes the form of a sigmoid curve.) The document is classified as spam when the log-likelihood ratio is greater than zero, i.e. when ${\displaystyle p(S\vert D)>p(\neg S\vert D)}$.
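The spam decision can be sketched in log space, assuming the per-word probability tables $p(w\vert S)$ and $p(w\vert \neg S)$ have already been estimated from training data. The word probabilities below are made up purely for illustration:

```python
import math

def log_ratio(words, p_spam, p_word_spam, p_word_ham):
    """ln p(S|D)/p(¬S|D) = ln p(S)/p(¬S) + sum_i ln p(w_i|S)/p(w_i|¬S).
    The document is classified as spam when the result is positive."""
    llr = math.log(p_spam / (1 - p_spam))
    for w in words:
        llr += math.log(p_word_spam[w] / p_word_ham[w])
    return llr

# Illustrative (made-up) per-word probabilities:
p_word_spam = {"viagra": 0.30, "meeting": 0.01, "free": 0.20}
p_word_ham  = {"viagra": 0.001, "meeting": 0.10, "free": 0.05}

llr = log_ratio(["free", "viagra"], 0.5, p_word_spam, p_word_ham)
is_spam = llr > 0
```

If desired, the log-likelihood ratio can be mapped back to a probability via the sigmoid: $p(S\vert D)=1/(1+e^{-llr})$.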
