Multiple-instance learning

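In machine learning, multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of individually labeled instances, the learner receives a set of labeled "bags", each containing many instances. In the simple case of binary classification, a bag is labeled negative if all the instances in it are negative, and positive if it contains at least one positive instance. Given a collection of labeled bags, the learner tries to either (1) induce a concept that will label individual instances correctly, or (2) learn how to label bags without inducing the concept.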
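The binary labeling rule described above can be written down directly. The following is a minimal sketch, assuming the instance labels are already known (in a real MIL problem they are hidden and only the bag label is observed); the function name `bag_label` is hypothetical.

```python
from typing import List


def bag_label(instance_labels: List[bool]) -> bool:
    """Label a bag under the binary rule stated above.

    The bag is positive if at least one instance is positive,
    and negative if every instance is negative.
    """
    return any(instance_labels)


# Illustrative bags with made-up instance labels.
negative_bag = [False, False, False]
positive_bag = [False, True, False]

print(bag_label(negative_bag))  # all instances negative, so the bag is negative
print(bag_label(positive_bag))  # one positive instance, so the bag is positive
```

Because the rule is a logical OR over instances, a single positive instance is enough to make the whole bag positive, which is why the learner cannot tell from a positive bag label alone which instance was responsible.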

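Take image classification as an example: given an image, we want to determine its target class from its visual content. For instance, the target class might be "beach" when the image contains both "sand" and "water". In MIL, the image is described as a bag $X=\{X_{1},\ldots,X_{N}\}$, where each $X_{i}$ is the feature vector (called an instance) extracted from the $i$-th region of the image, and $N$ is the number of regions (instances) into which the image is partitioned. The bag is labeled positive ("beach") if it contains both "sand" region instances and "water" region instances.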
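Note that the "beach" rule differs from the at-least-one-positive rule: here the bag is positive only when two different kinds of instance are both present. A minimal sketch follows, assuming each region has already been assigned a concept name such as "sand" or "water". That assignment is a simplification for illustration; in the setting described above each $X_{i}$ is a feature vector and the concepts would have to be learned. The function name `is_beach` is hypothetical.

```python
from typing import List, Set


def is_beach(region_concepts: List[str]) -> bool:
    """Label an image bag as "beach" only if it contains at least one
    "sand" region and at least one "water" region."""
    required: Set[str] = {"sand", "water"}
    return required.issubset(region_concepts)


# Each list stands for one image, with one (assumed) concept per region.
print(is_beach(["sky", "sand", "water", "sand"]))  # both concepts present
print(is_beach(["sky", "water", "boat"]))          # no sand region
print(is_beach(["sand", "rock"]))                  # no water region
```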

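The term multiple-instance learning was introduced by Dietterich, Lathrop & Lozano-Pérez (1997), although similar earlier work exists, such as the handwritten digit recognition study of Keeler, Rumelhart & Leow (1990). Recent reviews of the MIL literature include Amores (2013), which provides an extensive review and comparative study of the different paradigms, and Foulds & Frank (2010), which provides a thorough review of the different assumptions used by the different paradigms in the literature.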

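Examples of where MIL has been applied:

Molecule activity
Prediction of binding sites of calmodulin-binding proteins [1]
Prediction of function for alternatively spliced isoforms Li, Menon et al. (2014), Eksi et al. (2013)
Image classification Maron & Ratan (1998)
Text or document classification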

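Numerous researchers have worked on adapting classical classification techniques, such as support vector machines or boosting, to the multiple-instance learning setting.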
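To give a rough idea of what adapting a classical classifier to bags can look like, here is a minimal sketch. It is neither an SVM nor a boosting method, and it is not taken from any of the cited papers: it uses a single-threshold classifier (a decision stump) as a stand-in, and it pools instances into a bag score with `max`, so that a bag is predicted positive when at least one instance exceeds the threshold. Each instance is assumed to be a single scalar feature, every bag is assumed to be non-empty, and all names and data are hypothetical.

```python
from typing import List, Sequence

Bag = Sequence[float]  # assumption: each instance is one scalar feature


def bag_score(bag: Bag) -> float:
    """Pool instances with max: the bag is as positive as its most positive instance."""
    return max(bag)


def fit_threshold(bags: List[Bag], labels: List[bool]) -> float:
    """Pick the threshold on the pooled score that classifies the most training bags correctly."""
    scores = sorted({bag_score(b) for b in bags})
    # Candidate thresholds: below all scores, between neighbours, above all scores.
    candidates = (
        [scores[0] - 1.0]
        + [(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])]
        + [scores[-1] + 1.0]
    )

    def n_correct(threshold: float) -> int:
        return sum((bag_score(b) > threshold) == y for b, y in zip(bags, labels))

    return max(candidates, key=n_correct)


def predict(bag: Bag, threshold: float) -> bool:
    """Predict a bag label; only the bag label is used, never instance labels."""
    return bag_score(bag) > threshold


# Made-up training data: only bag-level labels are available.
train_bags = [[0.1, 0.2, 0.3], [0.2, 0.9], [0.0, 0.4], [0.8, 0.1, 0.7]]
train_labels = [False, True, False, True]

threshold = fit_threshold(train_bags, train_labels)
print(predict([0.1, 0.95], threshold))  # contains one high-scoring instance
print(predict([0.1, 0.2], threshold))   # no high-scoring instance
```

The key design choice is that training and prediction operate on bag-level scores only, which matches the setting where instance labels are never observed.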

See also

Multi-label classification

References

[1] Minhas, Fayyaz. Multiple instance learning of Calmodulin binding sites. Bioinformatics. 2012, 28 (18): i416-i422. doi:10.1093/bioinformatics/bts416. Retrieved 2015-07-10. Archived from the original on 2015-03-04.

Dietterich, Thomas G.; Lathrop, Richard H.; Lozano-Pérez, Tomás, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, 1997, 89 (1–2): 31–71, doi:10.1016/S0004-3702(96)00034-3.

Amores, Jaume, Multiple instance classification: Review, taxonomy and comparative study, Artificial Intelligence, 2013, 201: 81–105, doi:10.1016/j.artint.2013.06.003.

Foulds, James; Frank, Eibe, A Review of Multi-Instance Learning Assumptions, Knowledge Engineering Review, 2010, 25 (1): 1–25, doi:10.1017/S026988890999035X.

Keeler, James D.; Rumelhart, David E.; Leow, Wee-Kheng, Integrated segmentation and recognition of hand-printed numerals, Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems (NIPS 3), 1990: 557–563.

Li, H.D.; Menon, R.; et al., The emerging era of genomic data integration for analyzing splice isoform function, Trends in Genetics, 2014, PMID 24951248, doi:10.1016/j.tig.2014.05.005, pii S0168-9525(14)00085-7.

Eksi, R.; Li, H.D.; Menon, R.; et al., Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data, PLoS Comput Biol, 2013, 9 (11): e1003314, PMC 3820534, PMID 24244129, doi:10.1371/journal.pcbi.1003314.

Maron, O.; Ratan, A.L., Multiple-instance learning for natural scene classification, Proceedings of the Fifteenth International Conference on Machine Learning, 1998: 341–349.

Ray, Soumya; Page, David. Multiple instance regression (PDF). ICML. 2001. Retrieved 2015-07-10. Archived from the original (PDF) on 2022-02-13.
