霍森-科佩爾曼算法

霍森–科佩爾曼算法基於聯合-查找算法，用於標記占據-非占據網格團簇。^[1] 此算法最早由霍森和科佩爾曼在1976年的文章《Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm》中提出。^[2]

逾滲理論[編輯]

逾滲理論研究格點上團簇的行為和統計性質。設在一個正方格子中每個元胞占據概率為p、非占據概率為1–p。每一組相鄰（共邊）的占據元胞各自形成一個團簇。團簇的數目、尺寸分布是逾滲理論中的重要問題。

Figure (a)

Figure (b)

一個5x5格子。
(a)中占據概率為p = 6/25 = 0.24。
(b)中占據概率為p = 16/25 = 0.64。

霍森–科佩爾曼算法查找、標記團簇[編輯]

霍森-科佩爾曼算法的主要步驟是：掃描格點、查找占據元胞並將其用其所屬團簇的序號標記。掃描格點時逐行掃描，一行結束後從下一行開頭繼續掃描。掃描每一個元胞時，若元胞被占據，則根據其相鄰的已標記所屬團簇的元胞將其標記，具體的操作要使用聯合-查找算法。如果元胞沒有任何相鄰的、被標記的占據元胞，那麼就用一個新的記號標記之。^[3]

聯合-查找算法[編輯]

聯合-查找算法是一種計算等價類的方法。定義函數 union(x,y) 將元素 x 和 y 劃為等價類。因為等價關係有傳遞性，所有與x等價的元素與y也等價。因此，對於每個元素x，都存在一組元素與x等價。這些元素構成了x的等價類。在此基礎上，定義函數find(x)以返回 x 所屬的等價類的代表元。

偽代碼[編輯]

掃描網格時，每掃到一個占據元胞，就要查看其相鄰的所有元胞。若某相鄰元胞被占據且已被掃過，那麼就執行union操作，將被掃描的元胞以及相鄰的元胞劃入同一個等價類；如果被掃描元胞與兩個不同標記的元胞相鄰，則將兩個團簇合併為一個。之後，執行find操作，找到等價類的代表元並以之標記等價類中的所有元胞。如果當前元胞沒有被掃過的占據的相鄰元胞，那麼用未使用的標籤標記之。

   Raster Scan and Labeling on the Grid
   largest_label = 0;
   for x in 0 to n_columns {
       for y in 0 to n_rows {
           if occupied[x,y] then
               left = occupied[x-1,y];
               above = occupied[x,y-1];
               if (left == 0) and (above == 0) then /* Neither a label above nor to the left. */
                   largest_label = largest_label + 1; /* Make a new, as-yet-unused cluster label. */
                   label[x,y] = largest_label;
               else if (left != 0) and (above == 0) then /* One neighbor, to the left. */
                   label[x,y] = find(left);
               else if (left == 0) and (above != 0) then /* One neighbor, above. */
                   label[x,y] = find(above);
               else /* Neighbors BOTH to the left and above. */
                   union(left,above); /* Link the left and above clusters. */
                   label[x,y] = find(left);
       }
   }

   Union
   void union(int x, int y)  {
       labels[find(x)] = find(y);
   }

   Find
   int find(int x)  {
       int y = x;
       while (labels[y] != y)
           y = labels[y];
       while (labels[x] != x)  {
           int z = labels[x];
           labels[x] = y;
           x = z;
       }
   return y;
   }

應用[編輯]

二值圖像的分割和聚類^[4]
求節域面積和節點線路長度^[5]
圖的連接性
電路導通模型

參見[編輯]

K-均值聚類算法
模糊聚類算法
高斯(期望最大化)聚類算法
聚類方法 ^[6]
C-均值聚類算法 ^[7]
連通分量標記

參考文獻[編輯]

^ 存档副本 (PDF). [2019-05-03]. （原始內容 (PDF)存檔於2021-03-08）.
^ Hoshen, J.; Kopelman, R. Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm. Phys. Rev. B. 15 October 1976, 14 (8): 3438–3445. doi:10.1103/PhysRevB.14.3438.
^ Fricke, Tobin. The Hoshen-Kopelman Algorithm for cluster identification. ocf.berkeley.edu. 2004-04-21 [2016-09-17]. （原始內容存檔於2021-05-07）.
^ Journal of Applied Sciences. Scialert.net. [2016-09-17]. （原始內容 (PDF)存檔於2017-10-07）.
^ Christian Joas. Introduction to the Hoshen-Kopelman algorithm and its application to nodal domain statistics (PDF). Webhome.weizmann.ac.il. [2016-09-17]. （原始內容 (PDF)存檔於2016-09-15）.
^ Clustering (PDF). （原始內容 (PDF)存檔於2021-04-21）.
^ Fuzzy c-means clustering. （原始內容存檔於2020-10-17）.

[1] 存档副本 (PDF). [2019-05-03]. （原始內容 (PDF)存檔於2021-03-08）.

[2] Hoshen, J.; Kopelman, R. Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm. Phys. Rev. B. 15 October 1976, 14 (8): 3438–3445. doi:10.1103/PhysRevB.14.3438.

[:0-3] Fricke, Tobin. The Hoshen-Kopelman Algorithm for cluster identification. ocf.berkeley.edu. 2004-04-21 [2016-09-17]. （原始內容存檔於2021-05-07）.

[4] Journal of Applied Sciences. Scialert.net. [2016-09-17]. （原始內容 (PDF)存檔於2017-10-07）.

[5] Christian Joas. Introduction to the Hoshen-Kopelman algorithm and its application to nodal domain statistics (PDF). Webhome.weizmann.ac.il. [2016-09-17]. （原始內容 (PDF)存檔於2016-09-15）.

[6] Clustering (PDF). （原始內容 (PDF)存檔於2021-04-21）.

[7] Fuzzy c-means clustering. （原始內容存檔於2020-10-17）.

[1]

[2]

[3]

[4]

[5]

[6]

[7]