# Cosine similarity

## Definition

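The cosine of the angle $\theta$ between two nonzero vectors follows from the Euclidean dot product formula:

${\displaystyle \mathbf {a} \cdot \mathbf {b} =\left\|\mathbf {a} \right\|\left\|\mathbf {b} \right\|\cos \theta }$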

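Given two vectors of attributes, ${\displaystyle A}$ and ${\displaystyle B}$, the cosine similarity ${\displaystyle \cos(\theta )}$ is expressed using their dot product and magnitudes as

${\displaystyle {\text{similarity}}=\cos(\theta )={A\cdot B \over \|A\|\|B\|}={\frac {\sum \limits _{i=1}^{n}{A_{i}\times B_{i}}}{{\sqrt {\sum \limits _{i=1}^{n}{(A_{i})^{2}}}}\times {\sqrt {\sum \limits _{i=1}^{n}{(B_{i})^{2}}}}}}}$

where ${\displaystyle A_{i}}$ and ${\displaystyle B_{i}}$ are the components of vectors ${\displaystyle A}$ and ${\displaystyle B}$, respectively.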
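A minimal pure-Python sketch of the formula above, written only to make the sum-over-components computation concrete. The function name `cosine_similarity` is hypothetical (not taken from any particular library), and raising an error for a zero vector is an assumption of this sketch: the formula divides by ${\displaystyle \|A\|\|B\|}$ and is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (A . B) / (||A|| ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    # Numerator: sum over i of A_i * B_i
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: sqrt(sum of A_i^2) * sqrt(sum of B_i^2)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as an error, since the formula divides by zero
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ≈ 1.0: same direction
print(cosine_similarity([1, 0], [0, 1]))        # 0.0: orthogonal
print(cosine_similarity([1, 1], [-1, -1]))      # ≈ -1.0: opposite directions
```

The result depends only on the directions of the vectors, so scaling either vector by a positive factor leaves it unchanged.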

### Angular similarity

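The term "cosine similarity" is sometimes used to refer to a different coefficient, although the most common usage is the one defined above. Using the same calculation of similarity, the normalized angle between the vectors can serve as a bounded similarity function in the range [0, 1], computed from the similarity defined above as follows: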

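When the vector components may be positive or negative, the angle lies in [0, π] and the angular similarity is

${\displaystyle 1-\left({\frac {\cos ^{-1}({\text{similarity}})}{\pi }}\right)}$

When the vector components are all nonnegative, the angle is at most π/2, and the angular similarity is instead

${\displaystyle 1-\left({\frac {2\cdot \cos ^{-1}({\text{similarity}})}{\pi }}\right)}$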
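The two angular-similarity formulas can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: `angular_similarity` and its `nonnegative` flag are hypothetical names chosen here to select between the two cases, and the clamp to [-1, 1] is an added safeguard because floating-point rounding can push the cosine slightly outside the domain of `math.acos`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (A . B) / (||A|| ||B||); assumes nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def angular_similarity(a: Sequence[float], b: Sequence[float],
                       nonnegative: bool = False) -> float:
    """1 - arccos(sim)/pi, or 1 - 2*arccos(sim)/pi when nonnegative=True."""
    sim = cosine_similarity(a, b)
    # Rounding can push sim just outside [-1, 1], where math.acos raises ValueError.
    sim = max(-1.0, min(1.0, sim))
    angle = math.acos(sim)  # radians, in [0, pi]
    if nonnegative:
        # Nonnegative components keep the angle in [0, pi/2], so scale by 2/pi.
        return 1.0 - 2.0 * angle / math.pi
    return 1.0 - angle / math.pi


print(angular_similarity([1, 0], [0, 1]))                    # 0.5
print(angular_similarity([1, 0], [0, 1], nonnegative=True))  # 0.0
print(angular_similarity([1, 1], [-1, -1]))                  # 0.0
```

Orthogonal vectors land in the middle of the range under the general formula but at 0 under the nonnegative formula, because with nonnegative components π/2 is already the largest possible angle.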

### Confusion with the "Tanimoto" coefficient

The Tanimoto coefficient, which is sometimes confused with cosine similarity, is defined as

${\displaystyle T(A,B)={A\cdot B \over \|A\|^{2}+\|B\|^{2}-A\cdot B}}$
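To show how the two coefficients differ, here is a small Python sketch; the helper names `dot`, `tanimoto`, and `cosine` are hypothetical. Unlike cosine similarity, the Tanimoto coefficient is sensitive to vector magnitude, and for binary (0/1) vectors it reduces to the ratio of shared ones to the size of the union, ${\displaystyle |A\cap B|/|A\cup B|}$.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """T(A, B) = A.B / (||A||^2 + ||B||^2 - A.B)."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    if denom == 0:
        # ||A||^2 + ||B||^2 - A.B is zero only when both vectors are zero
        raise ValueError("undefined when both vectors are zero")
    return ab / denom


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity A.B / (||A|| ||B||), for comparison."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Same direction, different lengths: cosine ignores magnitude, Tanimoto does not.
print(cosine([1, 2, 3], [2, 4, 6]), tanimoto([1, 2, 3], [2, 4, 6]))  # 1.0 0.666...
# Binary vectors: Tanimoto equals |A ∩ B| / |A ∪ B|, here 2 / 4.
print(cosine([1, 1, 0, 1], [1, 0, 1, 1]), tanimoto([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.666... 0.5
```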

### Ochiai coefficient

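For two sets ${\displaystyle A}$ and ${\displaystyle B}$, the Ochiai coefficient is

${\displaystyle K={\frac {n(A\cap B)}{\sqrt {n(A)\times n(B)}}}}$

where ${\displaystyle n(X)}$ is the number of elements in set ${\displaystyle X}$. If the sets are represented as bit vectors, the Ochiai coefficient is the same as the cosine similarity.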
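A minimal Python sketch of the Ochiai coefficient, together with a check that it matches the cosine similarity of 0/1 membership vectors: for such vectors the dot product counts ${\displaystyle n(A\cap B)}$ and each squared norm counts ${\displaystyle n(X)}$. The function names are hypothetical, and rejecting empty sets is an assumption of this sketch, since the formula would divide by zero.

```python
import math


def ochiai(a: set, b: set) -> float:
    """K = n(A ∩ B) / sqrt(n(A) * n(B)), where n(X) is the size of set X."""
    if not a or not b:
        raise ValueError("undefined when either set is empty")
    return len(a & b) / math.sqrt(len(a) * len(b))


def cosine_of_indicator_vectors(a: set, b: set) -> float:
    """Cosine similarity of the 0/1 vectors that mark membership in a and b."""
    universe = list(a | b)  # any fixed order works, as long as both vectors share it
    va = [1 if x in a else 0 for x in universe]
    vb = [1 if x in b else 0 for x in universe]
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / math.sqrt(sum(x * x for x in va) * sum(y * y for y in vb))


A = {"x", "y", "z"}
B = {"y", "z", "w", "v"}
print(ochiai(A, B))                       # 2 / sqrt(12) ≈ 0.577
print(cosine_of_indicator_vectors(A, B))  # same value
```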

## References

1. P.-N. Tan, M. Steinbach, and V. Kumar, *Introduction to Data Mining*, Addison-Wesley, 2005, ISBN 0-321-32136-7, chapter 8, p. 500.
2. A. Ochiai, "Zoogeographical studies on the soleoid fishes found in Japan and its neighboring regions. II", *Bull. Jap. Soc. Sci. Fish.*, vol. 22, no. 9, pp. 526-530, 1957.
3. J. J. Barkman, *Phytosociology and ecology of cryptogamic epiphytes, including a taxonomic survey and description of their vegetation units in Europe*, Assen: Van Gorcum, 1958, 628 pp.