# Naive Bayes Classifier

## The Naive Bayes Probabilistic Model

The probability model for a classifier is a conditional model

$p(C \vert F_1,\dots,F_n)\,$

over a dependent class variable $C$ conditioned on feature variables $F_1$ through $F_n$. Using Bayes' theorem, this is

$p(C \vert F_1,\dots,F_n) = \frac{p(C) \ p(F_1,\dots,F_n\vert C)}{p(F_1,\dots,F_n)}. \,$

In plain terms:

$\mbox{posterior} = \frac{\mbox{prior} \times \mbox{likelihood}}{\mbox{evidence}}. \,$

In practice only the numerator of this fraction matters, because the denominator does not depend on $C$ and the values of the features $F_i$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model $p(C, F_1, \dots, F_n)$, which can be rewritten with repeated applications of the chain rule of conditional probability:

$p(C \vert F_1, \dots, F_n)\,$
$\varpropto p(C) \ p(F_1,\dots,F_n\vert C)$
$\varpropto p(C) \ p(F_1\vert C) \ p(F_2,\dots,F_n\vert C, F_1)$
$\varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3,\dots,F_n\vert C, F_1, F_2)$
$\varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3\vert C, F_1, F_2) \ p(F_4,\dots,F_n\vert C, F_1, F_2, F_3)$
$\varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3\vert C, F_1, F_2) \ \dots p(F_n\vert C, F_1, F_2, F_3,\dots,F_{n-1}).$

Now the "naive" conditional independence assumption comes into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $i \neq j$, given the class $C$:

$p(F_i \vert C, F_j) = p(F_i \vert C)\,$

so the joint model can be expressed as

\begin{align} p(C \vert F_1, \dots, F_n) & \varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C) \ p(F_3\vert C) \ \cdots\, \\& \varpropto p(C) \prod_{i=1}^n p(F_i \vert C).\,\end{align}

This means that under the above independence assumptions, the conditional distribution over the class variable $C$ can be expressed as

$p(C \vert F_1,\dots,F_n) = \frac{1}{Z} p(C) \prod_{i=1}^n p(F_i \vert C)$

where the evidence $Z = p(F_1,\dots,F_n)$ is a scaling factor that depends only on the feature variables, i.e. a constant once their values are known.
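As a quick numerical sketch of this factored form, the snippet below evaluates $\frac{1}{Z} p(C) \prod_i p(F_i \vert C)$ for two classes and two binary features; every probability value in it is made up purely for illustration, not estimated from data.

```python
# Sketch of the factored posterior p(C|F1..Fn) = (1/Z) p(C) * prod_i p(Fi|C).
# All probability values below are hypothetical, chosen only for illustration.

def naive_bayes_posterior(priors, cond, features):
    """priors: {class: p(C)}; cond: {class: [table p(F_i = value | C)]}."""
    # Unnormalized posterior for each class: p(C) * prod_i p(F_i = f_i | C)
    joint = {}
    for c, p_c in priors.items():
        p = p_c
        for i, f in enumerate(features):
            p *= cond[c][i][f]
        joint[c] = p
    # Z (the evidence) normalizes the posteriors so they sum to one.
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

priors = {"spam": 0.4, "ham": 0.6}
cond = {  # per-feature tables p(F_i = value | C) for two binary features
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "ham":  [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
post = naive_bayes_posterior(priors, cond, [1, 0])  # observe F1=1, F2=0
```

Here the unnormalized scores are $0.4 \cdot 0.8 \cdot 0.7 = 0.224$ for spam and $0.6 \cdot 0.1 \cdot 0.4 = 0.024$ for ham, so $Z = 0.248$ and the spam posterior is about 0.90.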

## Characteristics of Bayesian Classifiers

1. They require knowledge of the prior probabilities.
2. They revise those priors in light of the information obtained from the observed features.
3. Their classification decisions carry a nonzero error rate.

## Constructing a Classifier from the Probability Model

The naive Bayes classifier combines this probability model with a decision rule. A common choice is to pick the hypothesis that is most probable, known as the maximum a posteriori (MAP) decision rule. The corresponding classifier is the function

$\mathrm{classify}(f_1,\dots,f_n) = \underset{c}{\operatorname{argmax}} \ p(C=c) \displaystyle\prod_{i=1}^n p(F_i=f_i\vert C=c).$

$L_{ij} = {p(f\vert c_i)\over p(f\vert c_j)}$ is called the likelihood ratio; it depends on the feature vector $f$ to be classified.
$Q_{ij} = {p(c_j)\over p(c_i)}$ is called the decision threshold; it depends only on the prior probabilities of the two classes. If $L_{ij}(x) > Q_{ij}$ for every $j = 1,2,\dots,c$ with $j \neq i$, then $x$ is assigned to class $c_i$. This classifier is called the maximum likelihood ratio Bayes classifier.
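For two classes the likelihood-ratio test agrees with the MAP rule; the check below uses a hypothetical prior and hypothetical likelihood values for a single observation $x$ (none of these numbers come from real data).

```python
# Two-class check that the likelihood-ratio rule L_ij > Q_ij picks
# the same class as the MAP rule argmax_c p(c) p(x|c).
# The priors and likelihoods are hypothetical illustration values.

def map_class(priors, likelihoods):
    # MAP rule: choose the class maximizing p(c) * p(x|c).
    return max(priors, key=lambda c: priors[c] * likelihoods[c])

def ratio_prefers(priors, likelihoods, i, j):
    # L_ij = p(x|c_i)/p(x|c_j) versus the threshold Q_ij = p(c_j)/p(c_i).
    L = likelihoods[i] / likelihoods[j]
    Q = priors[j] / priors[i]
    return L > Q

priors = {"c1": 0.3, "c2": 0.7}          # p(c_i)
likelihoods = {"c1": 0.9, "c2": 0.2}     # p(x | c_i) for the observed x
# MAP scores: c1 -> 0.27, c2 -> 0.14, so both rules choose c1.
```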

## Examples

### Sex Classification

#### Training

| Sex | Height (feet) | Weight (lbs) | Foot size (inches) |
| --- | --- | --- | --- |
| male | 6 | 180 | 12 |
| male | 5.92 (5'11") | 190 | 11 |
| male | 5.58 (5'7") | 170 | 12 |
| male | 5.92 (5'11") | 165 | 10 |
| female | 5 | 100 | 6 |
| female | 5.5 (5'6") | 150 | 8 |
| female | 5.42 (5'5") | 130 | 7 |
| female | 5.75 (5'9") | 150 | 9 |

#### Testing

The sample below is to be classified as male or female by comparing the two posterior probabilities:

| Sex | Height (feet) | Weight (lbs) | Foot size (inches) |
| --- | --- | --- | --- |
| sample | 6 | 130 | 8 |

$posterior (male) = \frac{P(male) \, p(height | male) \, p(weight | male) \, p(foot size | male)}{evidence}$

$posterior (female) = \frac{P(female) \, p(height | female) \, p(weight | female) \, p(foot size | female)}{evidence}$

$evidence = P(male) \, p(height | male) \, p(weight | male) \, p(foot size | male) + P(female) \, p(height | female) \, p(weight | female) \, p(foot size | female)$

$P(male) = 0.5$

$p(\mbox{height} | \mbox{male}) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left(\frac{-(6-\mu)^2}{2\sigma^2}\right) \approx 1.5789$, where $\mu = 5.855$ and $\sigma^2 = 3.5033 \times 10^{-2}$ are the parameters of the normal distribution estimated from the training set. Note that a value greater than 1 is fine here: it is a probability density rather than a probability, because height is a continuous variable.

$p(weight | male) = 5.9881 \times 10^{-6}$
$p(foot size | male) = 1.3112 \times 10^{-3}$
$posterior numerator (male) = 6.1984 \times 10^{-9}$
$P(female) = 0.5$
$p(height | female) = 2.2346 \times 10^{-1}$
$p(weight | female) = 1.6789 \times 10^{-2}$
$p(foot size | female) = 2.8669 \times 10^{-1}$
$posterior numerator (female) = 5.3778 \times 10^{-4}$

Since the posterior numerator for female is larger, the sample is classified as female.
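The whole calculation can be reproduced by fitting a per-feature normal distribution to each class in the training table and multiplying the resulting densities for the test sample; the sketch below uses only the numbers given in this section.

```python
import math

# Training data from the table above: (height, weight, foot size) per person.
males = [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)]
females = [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)]

def fit(rows):
    # Per-feature sample mean and unbiased variance (divide by n - 1).
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / (len(c) - 1)
                 for c, m in zip(cols, means)]
    return means, variances

def normal_pdf(x, mean, var):
    # Gaussian density, the same formula used for p(height | male) above.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_numerator(sample, rows, prior=0.5):
    # prior * product of per-feature Gaussian densities at the sample.
    means, variances = fit(rows)
    p = prior
    for x, m, v in zip(sample, means, variances):
        p *= normal_pdf(x, m, v)
    return p

sample = (6, 130, 8)
num_male = posterior_numerator(sample, males)      # ~6.1984e-09
num_female = posterior_numerator(sample, females)  # ~5.3778e-04
```

Since `num_female` exceeds `num_male` by roughly five orders of magnitude, the sample is classified as female.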

### Document Classification

Consider the problem of classifying documents by content, for example into spam and non-spam e-mail. The probability that the $i$-th word of a given document occurs in a document from class $C$ is written

$p(w_i \vert C)\,$

(This treatment simplifies things further by assuming that words occur at random positions in the document: a word's probability is taken to be independent of the document's length, of its position relative to other words, and of the rest of the document's content.)

Then the probability that a given document $D$ contains all of its words $w_i$, given a class $C$, is

$p(D\vert C)=\prod_i p(w_i \vert C)\,$

The question to answer is: what is the probability that a given document $D$ belongs to class $C$? By the definition of conditional probability,

$p(D\vert C)={p(D\cap C)\over p(C)}$
$p(C\vert D)={p(D\cap C)\over p(D)}$

and combining the two yields Bayes' theorem:

$p(C\vert D)={p(C)\over p(D)}\,p(D\vert C)$

Assume for the moment that a document is either spam ($S$) or not spam ($\neg S$). Then

$p(D\vert S)=\prod_i p(w_i \vert S)\,$
$p(D\vert\neg S)=\prod_i p(w_i\vert\neg S)\,$

and, applying Bayes' theorem to each class,

$p(S\vert D)={p(S)\over p(D)}\,\prod_i p(w_i \vert S)$
$p(\neg S\vert D)={p(\neg S)\over p(D)}\,\prod_i p(w_i \vert\neg S)$

Dividing one by the other eliminates the evidence $p(D)$:

${p(S\vert D)\over p(\neg S\vert D)}={p(S)\,\prod_i p(w_i \vert S)\over p(\neg S)\,\prod_i p(w_i \vert\neg S)}$

which factors as

${p(S\vert D)\over p(\neg S\vert D)}={p(S)\over p(\neg S)}\,\prod_i {p(w_i \vert S)\over p(w_i \vert\neg S)}$

Taking logarithms turns the product into a sum:

$\ln{p(S\vert D)\over p(\neg S\vert D)}=\ln{p(S)\over p(\neg S)}+\sum_i \ln{p(w_i\vert S)\over p(w_i\vert\neg S)}$

The document is classified as spam when this log-odds value is positive, i.e. when $p(S\vert D) > p(\neg S\vert D)$.

(This log-likelihood-ratio technique is common in statistics. For two mutually exclusive alternatives, as in this spam example, converting the log-likelihood ratio back to a probability takes the form of a sigmoid curve.)
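A minimal sketch of this log-odds computation; the per-word probabilities below are made up for illustration, not estimated from a real corpus.

```python
import math

# Hypothetical word probabilities p(w|S) (spam) and p(w|¬S) (not spam).
p_word_spam = {"offer": 0.30, "free": 0.40, "meeting": 0.05}
p_word_ham = {"offer": 0.05, "free": 0.10, "meeting": 0.30}
p_spam = 0.5  # prior p(S); p(¬S) = 1 - p_spam

def log_odds(words):
    # ln(p(S|D)/p(¬S|D)) = ln(p(S)/p(¬S)) + sum_i ln(p(w_i|S)/p(w_i|¬S))
    total = math.log(p_spam / (1 - p_spam))
    for w in words:
        total += math.log(p_word_spam[w] / p_word_ham[w])
    return total

def p_spam_given_doc(words):
    # The sigmoid maps the log-odds back to the probability p(S|D).
    return 1 / (1 + math.exp(-log_odds(words)))

# A document containing "free" and "offer" gets positive log-odds (spam);
# one containing only "meeting" gets negative log-odds (not spam).
```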


## External Links

• IMSL Numerical Libraries Collections of math and statistical algorithms available in C/C++, Fortran, Java and C#/.NET. Data mining routines in the IMSL Libraries include a Naive Bayes classifier.
• Orange, a free data mining software suite, module orngBayes
• Winnow content recommendation Open source Naive Bayes text classifier works with very small training and unbalanced training sets. High performance, C, any Unix.
• Naive Bayes implementation in Visual Basic (includes executable and source code).
• An interactive Microsoft Excel spreadsheet Naive Bayes implementation using VBA (requires enabled macros) with viewable source code.
• jBNC - Bayesian Network Classifier Toolbox
• POPFile Perl-based email proxy system classifies email into user-defined "buckets", including spam.
• Statistical Pattern Recognition Toolbox for Matlab.
• sux0r An Open Source Content management system with a focus on Naive Bayesian categorization and probabilistic content.
• ifile - the first freely available (Naive) Bayesian mail/spam filter
• NClassifier - NClassifier is a .NET library that supports text classification and text summarization. It is a port of the Nick Lothian's popular Java text classification engine, Classifier4J.
• Classifier4J - Classifier4J is a Java library designed to do text classification. It comes with an implementation of a Bayesian classifier, and now has some other features, including a text summary facility.
• MALLET - A Java package for document classification and other natural language processing tasks.
• nBayes - nBayes is an open source .NET library written in C#
• Apache Mahout - Machine learning package offered by Apache Open Source
• Weka - Popular open source machine learning package, written in Java
• C# implementation of Naive Bayes used for documents categorization that includes files processing, stop words filtering and stemming. Same site offers comparison to other algorithms.
• OpenPR-NB - A C++ implementation of the Naive Bayes Classifier. It supports both multinomial and multivariate Bernoulli event models. Parameters are learned by maximum likelihood estimation with Laplace smoothing.
