# Cluster analysis

## Hierarchical clustering

### Agglomerative hierarchical clustering

[Figures: raw data; traditional representation]

## Partitional clustering

### K-means and derived algorithms

#### K-means clustering

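The K-means algorithm clusters objects around k points in space that serve as centers, assigning each object to the center closest to it.

• Choose the number of clusters, k.
• Generate k clusters arbitrarily and determine their centers, or generate k centers directly.
• Assign each point to its nearest cluster center.
• Recompute the center of each cluster.
• Repeat the assignment and recomputation steps until the convergence criterion is met (typically, the centers no longer change).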
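The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function name `kmeans` and the parameters `max_iter` and `seed` are assumptions made for the example. So are the choices of squared Euclidean distance, of k randomly selected input points as initial centers, and of keeping the old center when a cluster becomes empty.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (equal-length numeric sequences) into k groups.

    Returns (centers, labels), where labels[i] is the index of the
    center that points[i] was assigned to in the last assignment step.
    """
    rng = random.Random(seed)
    # Generate k centers directly: pick k distinct input points (an assumption).
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        # Recompute each center as the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center (an assumption).
                new_centers.append(centers[j])
        # Convergence: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
centers, labels = kmeans(points, k=2)
```

On well-separated data such as the six points above, the first three points end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialization.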

## Applications

### Other applications
