# Sequential minimal optimization

Sequential minimal optimization (SMO) is an optimization algorithm for training support vector machines, with worst-case time complexity $O(n^{3})$.

## Problem definition

The SMO algorithm solves the optimization problem that arises when training a support vector machine. Consider a binary classification problem with dataset ${\displaystyle (\mathbf {x_{1}} ,y_{1}),\ldots ,(\mathbf {x_{n}} ,y_{n})}$, where ${\displaystyle \mathbf {x_{i}} }$ is an input vector and ${\displaystyle y_{i}\in \{-1,1\}}$ is its class label, restricted to these two values. Training a soft-margin support vector machine is equivalent to maximizing the following quadratic program:

${\displaystyle W=\max _{\alpha }\sum _{i=1}^{n}\alpha _{i}-{\frac {1}{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}y_{i}y_{j}K(x_{i},x_{j})\alpha _{i}\alpha _{j},}$

subject to the constraints

${\displaystyle 0\leq \alpha _{i}\leq C,\quad {\mbox{ for }}i=1,2,\ldots ,n,}$
${\displaystyle \sum _{i=1}^{n}y_{i}\alpha _{i}=0,}$
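As a concrete illustration, the dual objective above can be evaluated directly from a kernel matrix. The following sketch assumes a linear kernel; the toy data and variable names are illustrative, not from the original:

```python
import numpy as np

def dual_objective(alpha, y, K):
    """W(alpha) = sum_i alpha_i - 1/2 * sum_ij y_i y_j K_ij alpha_i alpha_j."""
    u = alpha * y                      # elementwise y_i * alpha_i
    return alpha.sum() - 0.5 * (u @ K @ u)

# Toy example: two points, linear kernel K_ij = x_i . x_j
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
K = X @ X.T
alpha = np.array([0.5, 0.5])           # feasible: 0 <= alpha_i <= C and sum_i y_i alpha_i = 0
W = dual_objective(alpha, y, K)        # = 1.0 - 0.5 * 0.5 = 0.75
```

Note that `alpha` must stay inside the box $[0, C]$ and on the hyperplane $\sum_i y_i\alpha_i = 0$ for the value to be meaningful for the constrained problem.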

## Algorithm

SMO is an iterative algorithm for solving this optimization problem. Since the objective is a concave quadratic, general-purpose optimizers would solve the quadratic program by gradient methods that optimize one variable at a time. However, because of the constraint ${\displaystyle \sum _{i=1}^{n}y_{i}\alpha _{i}=0}$, updating a single ${\displaystyle \alpha _{i}\,}$ from ${\displaystyle \alpha _{i}^{old}}$ to ${\displaystyle \alpha _{i}^{new}}$ immediately violates the constraint. To overcome this difficulty, SMO updates two variables at a time.

### Derivation

Index the two multipliers being updated as 1 and 2, and define the kernel values, the decision function, and the residual term $v_i$:

${\displaystyle K_{ij}=K(\mathbf {x_{i}} ,\mathbf {x_{j}} ),f(\mathbf {x_{i}} )=\sum _{j=1}^{n}y_{j}\alpha _{j}K_{ij}+b,}$
${\displaystyle v_{i}=f(\mathbf {x_{i}} )-\sum _{j=1}^{2}y_{j}\alpha _{j}K_{ij}-b}$

Restricted to the two multipliers being updated, the objective is

${\displaystyle {\begin{array}{lcl}W(\alpha _{1},\alpha _{2})&=&\sum _{i=1}^{n}\alpha _{i}-{\frac {1}{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}y_{i}y_{j}K(x_{i},x_{j})\alpha _{i}\alpha _{j}\\&=&\alpha _{1}+\alpha _{2}-{\frac {1}{2}}K_{11}\alpha _{1}^{2}-{\frac {1}{2}}K_{22}\alpha _{2}^{2}-y_{1}y_{2}K_{12}\alpha _{1}\alpha _{2}\\&&-y_{1}\alpha _{1}v_{1}-y_{2}\alpha _{2}v_{2}+{\text{constant}}\,\end{array}}}$
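The substitution used next follows from the equality constraint: with $\alpha_3, \ldots, \alpha_n$ held fixed,

${\displaystyle y_{1}\alpha _{1}+y_{2}\alpha _{2}=-\sum _{i=3}^{n}y_{i}\alpha _{i}={\text{const.}}}$

Multiplying through by $y_1$, and writing $s=y_{1}y_{2}$ for the label product and $\gamma$ for the resulting constant, gives

${\displaystyle \alpha _{1}+s\alpha _{2}=\gamma \quad \Longrightarrow \quad \alpha _{1}=\gamma -s\alpha _{2}.}$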

Eliminating $\alpha_1$ via the equality constraint (with $s=y_{1}y_{2}$ and $\gamma =\alpha _{1}+s\alpha _{2}$, a constant during the update, so $\alpha _{1}=\gamma -s\alpha _{2}$), the objective becomes a function of $\alpha_2$ alone:

${\displaystyle {\begin{array}{lcl}W(\alpha _{2})&=&\gamma -s\alpha _{2}+\alpha _{2}-{\frac {1}{2}}K_{11}(\gamma -s\alpha _{2})^{2}-{\frac {1}{2}}K_{22}\alpha _{2}^{2}\\&&-sK_{12}(\gamma -s\alpha _{2})\alpha _{2}-y_{1}(\gamma -s\alpha _{2})v_{1}-y_{2}\alpha _{2}v_{2}+{\text{constant}}\end{array}}}$

Taking the partial derivative with respect to ${\displaystyle \alpha _{2}\,}$ and setting it to zero to find the maximum gives

${\displaystyle {\begin{array}{lcl}{\frac {\partial W(\alpha _{2})}{\partial \alpha _{2}}}&=&-s+1+sK_{11}\gamma -K_{11}\alpha _{2}-K_{22}\alpha _{2}+2K_{12}\alpha _{2}-sK_{12}\gamma \\&&+y_{2}v_{1}-y_{2}v_{2}=0\end{array}}}$

Solving for $\alpha_2$ yields

${\displaystyle \alpha _{2}^{new}={\frac {y_{2}(y_{2}-y_{1}+y_{1}\gamma (K_{11}-K_{12})+v_{1}-v_{2})}{K_{11}+K_{22}-2K_{12}}}}$

Defining the prediction error ${\displaystyle E_{i}=f(\mathbf {x_{i}} )-y_{i}}$ and writing ${\displaystyle K=K_{11}+K_{22}-2K_{12}}$, the update simplifies to

${\displaystyle \alpha _{2}^{new}=\alpha _{2}^{old}+{\frac {y_{2}(E_{1}-E_{2})}{K}}}$
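With the prediction errors $E_i = f(\mathbf{x_i}) - y_i$ and denominator $K_{11}+K_{22}-2K_{12}$, the unclipped update can be sketched as follows; the function name and the guard for a non-positive denominator are assumptions, not part of the original derivation:

```python
def unclipped_alpha2(alpha2_old, y2, E1, E2, K11, K22, K12):
    """Unclipped SMO step: alpha2_old + y2 * (E1 - E2) / eta,
    where eta = K11 + K22 - 2*K12 is the curvature of W along the update line."""
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0.0:
        # Degenerate direction (e.g. identical inputs); a full implementation
        # must handle this case separately instead of dividing by eta.
        raise ValueError("eta <= 0: W is not strictly concave along this line")
    return alpha2_old + y2 * (E1 - E2) / eta

a2 = unclipped_alpha2(alpha2_old=0.5, y2=-1.0, E1=0.2, E2=-0.1,
                      K11=1.0, K22=1.0, K12=0.0)   # eta = 2, step = -0.15
```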

### Algorithm outline

SMO is an iterative optimization algorithm. In each iteration it selects two multipliers to update, computes their error terms, and from these computes ${\displaystyle \alpha _{2}^{new}}$ and ${\displaystyle \alpha _{1}^{new}}$. Finally, the bias ${\displaystyle b}$ is recomputed from the definition of the SVM. The error terms can be adjusted incrementally from the changes in ${\displaystyle \alpha _{1}^{new}}$, ${\displaystyle \alpha _{2}^{new}}$, and ${\displaystyle b}$, without recomputing them from scratch each time. The algorithm is as follows:

1. Initialize the multipliers ${\displaystyle \alpha _{i}\,}$ with random values and compute the bias ${\displaystyle b}$
2. Initialize the error terms ${\displaystyle E_{i}\,}$
3. Select two vectors as the pair to be updated
4. Set ${\displaystyle \alpha _{2}^{new}=\alpha _{2}^{old}+{\frac {y_{2}(E_{1}-E_{2})}{K}}}$
5. If ${\displaystyle \alpha _{2}^{new}>V}$, set ${\displaystyle \alpha _{2}^{new}=V}$
6. If ${\displaystyle \alpha _{2}^{new}<U}$, set ${\displaystyle \alpha _{2}^{new}=U}$
7. Set ${\displaystyle \alpha _{1}^{new}=\alpha _{1}^{old}+y_{1}y_{2}(\alpha _{2}^{old}-\alpha _{2}^{new})}$
8. Update ${\displaystyle E_{i}\,}$ and ${\displaystyle b}$ using the new ${\displaystyle \alpha _{1}^{new}}$ and ${\displaystyle \alpha _{2}^{new}}$
9. If the termination condition is met, stop; otherwise go to step 3


where the clipping bounds ${\displaystyle U}$ and ${\displaystyle V}$ are

${\displaystyle U={\begin{cases}\max {\left\{0,\alpha _{2}^{old}-\alpha _{1}^{old}\right\}}&y_{1}y_{2}=-1\\\max {\left\{0,\alpha _{1}^{old}+\alpha _{2}^{old}-C\right\}}&y_{1}y_{2}=1\end{cases}},V={\begin{cases}\min {\left\{C,C+\alpha _{2}^{old}-\alpha _{1}^{old}\right\}}&y_{1}y_{2}=-1\\\min {\left\{C,\alpha _{2}^{old}+\alpha _{1}^{old}\right\}}&y_{1}y_{2}=1\end{cases}}}$
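The clipping step above can be sketched directly; function and variable names here are illustrative assumptions:

```python
def clip_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Box bounds [U, V] for alpha2 that keep (alpha1, alpha2) feasible."""
    if y1 * y2 == -1:                    # labels differ: alpha2 - alpha1 is fixed
        U = max(0.0, alpha2_old - alpha1_old)
        V = min(C, C + alpha2_old - alpha1_old)
    else:                                # labels equal: alpha1 + alpha2 is fixed
        U = max(0.0, alpha1_old + alpha2_old - C)
        V = min(C, alpha1_old + alpha2_old)
    return U, V

def clip(alpha2_new, U, V):
    """Project the unclipped update back into [U, V]."""
    return max(U, min(V, alpha2_new))

U, V = clip_bounds(alpha1_old=0.3, alpha2_old=0.6, y1=1.0, y2=-1.0, C=1.0)
a2 = clip(1.2, U, V)                     # 1.2 exceeds V, so it is clipped to V
```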

### Choosing the multipliers to optimize

At the optimum, every multiplier satisfies the Karush–Kuhn–Tucker (KKT) conditions:

${\displaystyle y_{i}f(\mathbf {x} _{i}){\begin{cases}>1&\alpha _{i}=0\\=1&0<\alpha _{i}<C\\<1&\alpha _{i}=C\end{cases}}}$

SMO therefore first selects a multiplier that violates these conditions; the second multiplier is then commonly chosen to maximize the step size $|E_{1}-E_{2}|$.
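A sketch of the KKT violation check; the numerical tolerance `tol` is an assumption added for floating-point robustness and is not specified in the text:

```python
def kkt_violated(alpha_i, yf_i, C, tol=1e-3):
    """Return True if (alpha_i, y_i * f(x_i)) violates the KKT conditions."""
    if alpha_i < tol:                  # alpha_i == 0   requires  y_i f(x_i) >= 1
        return yf_i < 1.0 - tol
    if alpha_i > C - tol:              # alpha_i == C   requires  y_i f(x_i) <= 1
        return yf_i > 1.0 + tol
    return abs(yf_i - 1.0) > tol       # 0 < alpha_i < C requires y_i f(x_i) == 1

violations = [kkt_violated(0.0, 0.5, C=1.0),   # margin violated at alpha = 0
              kkt_violated(0.5, 1.0, C=1.0)]   # interior point on the margin: OK
```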

### Termination

SMO can terminate when the KKT conditions hold for all vectors, or when the growth rate of the objective ${\displaystyle W(\alpha )\,}$ falls below a threshold, i.e.

${\displaystyle {\frac {W(\alpha ^{t+1})-W(\alpha ^{t})}{W(\alpha ^{t})}}<T,}$

where ${\displaystyle T}$ is a given tolerance.[5]

## References

1. Chang, Chih-Chung; Lin, Chih-Jen. LIBSVM: a Library for Support Vector Machines. 2001.
3. Rifkin, Ryan. Everything Old is New Again: a Fresh Look at Historical Approaches in Machine Learning. Ph.D. thesis. 2002: 18.
4. Tsang, I. W.; Kwok, J. T.; Cheung, P. M. Core Vector Machines: Fast SVM Training on Very Large Data Sets. Journal of Machine Learning Research. 2005, (6): 363–392.