霍森-科佩尔曼算法

霍森–科佩尔曼算法基于联合-查找算法，用于标记占据-非占据网格团簇。^[1] 此算法最早由霍森和科佩尔曼在1976年的文章《Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm》中提出。^[2]

逾渗理论[编辑]

逾渗理论研究格点上团簇的行为和统计性质。设在一个正方格子中每个元胞占据概率为p、非占据概率为1–p。每一组相邻（共边）的占据元胞各自形成一个团簇。团簇的数目、尺寸分布是逾渗理论中的重要问题。

Figure (a)

Figure (b)

一个5x5格子。
(a)中占据概率为p = 6/25 = 0.24。
(b)中占据概率为p = 16/25 = 0.64。

霍森–科佩尔曼算法查找、标记团簇[编辑]

霍森-科佩尔曼算法的主要步骤是：扫描格点、查找占据元胞并将其用其所属团簇的序号标记。扫描格点时逐行扫描，一行结束后从下一行开头继续扫描。扫描每一个元胞时，若元胞被占据，则根据其相邻的已标记所属团簇的元胞将其标记，具体的操作要使用联合-查找算法。如果元胞没有任何相邻的、被标记的占据元胞，那么就用一个新的记号标记之。^[3]

联合-查找算法[编辑]

联合-查找算法是一种计算等价类的方法。定义函数 union(x,y) 将元素 x 和 y 划为等价类。因为等价关系有传递性，所有与x等价的元素与y也等价。因此，对于每个元素x，都存在一组元素与x等价。这些元素构成了x的等价类。在此基础上，定义函数find(x)以返回 x 所属的等价类的代表元。

伪代码[编辑]

扫描网格时，每扫到一个占据元胞，就要查看其相邻的所有元胞。若某相邻元胞被占据且已被扫过，那么就执行union操作，将被扫描的元胞以及相邻的元胞划入同一个等价类；如果被扫描元胞与两个不同标记的元胞相邻，则将两个团簇合并为一个。之后，执行find操作，找到等价类的代表元并以之标记等价类中的所有元胞。如果当前元胞没有被扫过的占据的相邻元胞，那么用未使用的标签标记之。

   Raster Scan and Labeling on the Grid
   largest_label = 0;
   for x in 0 to n_columns {
       for y in 0 to n_rows {
           if occupied[x,y] then
               left = occupied[x-1,y];
               above = occupied[x,y-1];
               if (left == 0) and (above == 0) then /* Neither a label above nor to the left. */
                   largest_label = largest_label + 1; /* Make a new, as-yet-unused cluster label. */
                   label[x,y] = largest_label;
               else if (left != 0) and (above == 0) then /* One neighbor, to the left. */
                   label[x,y] = find(left);
               else if (left == 0) and (above != 0) then /* One neighbor, above. */
                   label[x,y] = find(above);
               else /* Neighbors BOTH to the left and above. */
                   union(left,above); /* Link the left and above clusters. */
                   label[x,y] = find(left);
       }
   }

   Union
   void union(int x, int y)  {
       labels[find(x)] = find(y);
   }

   Find
   int find(int x)  {
       int y = x;
       while (labels[y] != y)
           y = labels[y];
       while (labels[x] != x)  {
           int z = labels[x];
           labels[x] = y;
           x = z;
       }
   return y;
   }

应用[编辑]

二值图像的分割和聚类^[4]
求节域面积和节点线路长度^[5]
图的连接性
电路导通模型

参见[编辑]

K-均值聚类算法
模糊聚类算法
高斯(期望最大化)聚类算法
聚类方法 ^[6]
C-均值聚类算法 ^[7]
连通分量标记

参考文献[编辑]

^ 存档副本 (PDF). [2019-05-03]. （原始内容 (PDF)存档于2021-03-08）.
^ Hoshen, J.; Kopelman, R. Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm. Phys. Rev. B. 15 October 1976, 14 (8): 3438–3445. doi:10.1103/PhysRevB.14.3438.
^ Fricke, Tobin. The Hoshen-Kopelman Algorithm for cluster identification. ocf.berkeley.edu. 2004-04-21 [2016-09-17]. （原始内容存档于2021-05-07）.
^ Journal of Applied Sciences. Scialert.net. [2016-09-17]. （原始内容 (PDF)存档于2017-10-07）.
^ Christian Joas. Introduction to the Hoshen-Kopelman algorithm and its application to nodal domain statistics (PDF). Webhome.weizmann.ac.il. [2016-09-17]. （原始内容 (PDF)存档于2016-09-15）.
^ Clustering (PDF). （原始内容 (PDF)存档于2021-04-21）.
^ Fuzzy c-means clustering. （原始内容存档于2020-10-17）.

[1] 存档副本 (PDF). [2019-05-03]. （原始内容 (PDF)存档于2021-03-08）.

[2] Hoshen, J.; Kopelman, R. Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm. Phys. Rev. B. 15 October 1976, 14 (8): 3438–3445. doi:10.1103/PhysRevB.14.3438.

[:0-3] Fricke, Tobin. The Hoshen-Kopelman Algorithm for cluster identification. ocf.berkeley.edu. 2004-04-21 [2016-09-17]. （原始内容存档于2021-05-07）.

[4] Journal of Applied Sciences. Scialert.net. [2016-09-17]. （原始内容 (PDF)存档于2017-10-07）.

[5] Christian Joas. Introduction to the Hoshen-Kopelman algorithm and its application to nodal domain statistics (PDF). Webhome.weizmann.ac.il. [2016-09-17]. （原始内容 (PDF)存档于2016-09-15）.

[6] Clustering (PDF). （原始内容 (PDF)存档于2021-04-21）.

[7] Fuzzy c-means clustering. （原始内容存档于2020-10-17）.

[1]

[2]

[3]

[4]

[5]

[6]

[7]