Multiple instance learning

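In machine learning, multiple instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of individually labeled instances, the learner receives a set of labeled "bags", each of which contains many instances. In the simple case of binary classification, a bag is labeled negative if all the instances in it are negative. Conversely, a bag is labeled positive if it contains at least one positive instance. Given a collection of labeled bags, the learner tries either (1) to induce a concept that labels individual instances correctly, or (2) to learn how to label bags without inducing such a concept.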
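A minimal Python sketch of the standard binary rule described above, assuming instance labels are plain booleans. The names `bag_label` and `predict_bag` are hypothetical, and the `instance_score` callable stands in for whatever instance-level model has been learned. Scoring a bag by its highest-scoring instance is one common way to turn an instance classifier into a bag classifier under this rule, not the only one.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Standard MI rule: a bag is positive iff at least one instance is positive."""
    if len(instance_labels) == 0:
        raise ValueError("a bag must contain at least one instance")
    # any() is False only when every instance in the bag is negative.
    return any(instance_labels)


def predict_bag(
    bag: Sequence[T],
    instance_score: Callable[[T], float],
    threshold: float = 0.5,
) -> bool:
    """Predict a bag label from a (hypothetical) instance-level scorer.

    Under the standard rule a bag is as positive as its most positive
    instance, so the bag score is the maximum instance score.
    """
    if len(bag) == 0:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_score(x) for x in bag) >= threshold


# Usage: hand-labeled bags, and a toy scorer on 1-D instances.
print(bag_label([False, False, False]))  # False: every instance is negative
print(bag_label([False, True, False]))   # True: one positive instance is enough
print(predict_bag([0.1, 0.9, 0.2], lambda x: x))  # True: max score 0.9 >= 0.5
```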

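Take image classification as an example. Given an image, we want to determine its target class from its visual content. For instance, the target class might be "beach" when the image contains both "sand" and "water". In MIL, the image is described as a bag $X=\{X_{1},\ldots,X_{N}\}$, where each $X_{i}$ is the feature vector (called an instance) extracted from the corresponding $i$-th region of the image, and $N$ is the number of regions (instances) into which the image is partitioned. The bag is labeled positive ("beach") if it contains both a "sand" region instance and a "water" region instance.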
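The beach example uses a stricter rule than the single-positive-instance rule for binary bags: the bag must contain instances of two concepts at once. A minimal sketch, assuming each region has already been mapped to a concept name by some region classifier; the function name `is_positive_bag`, the `required` parameter and the string concept labels are hypothetical.

```python
from typing import FrozenSet, Iterable

# Hypothetical concept names for the "beach" class.
BEACH_CONCEPTS: FrozenSet[str] = frozenset({"sand", "water"})


def is_positive_bag(
    region_concepts: Iterable[str],
    required: FrozenSet[str] = BEACH_CONCEPTS,
) -> bool:
    """Positive iff every required concept occurs in at least one region."""
    present = set(region_concepts)
    # Subset test: each required concept must appear somewhere in the bag.
    return required <= present


print(is_positive_bag(["sky", "sand", "water"]))  # True
print(is_positive_bag(["sky", "sand", "sand"]))   # False: no "water" region
```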

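The term multiple instance learning was introduced by Dietterich, Lathrop & Lozano-Pérez (1997), but similar earlier work exists, for example Keeler, Rumelhart & Leow (1990) on handwritten digit recognition. Recent reviews of multiple instance learning include Amores (2013), which provides an extensive review and comparative study of the different paradigms, and Foulds & Frank (2010), which provides a thorough review of the different assumptions made by the various paradigms in the literature.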

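Some applications of multiple instance learning:

Molecular activity
Prediction of binding sites in calmodulin-binding proteins [1]
Prediction of the functions of alternatively spliced isoforms: Li, Menon et al. (2014); Eksi et al. (2013)
Image classification: Maron & Ratan (1998)
Text or document categorization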

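Numerous studies have worked on adapting traditional classification techniques, such as support vector machines or boosting, to the multiple instance setting.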
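As one hypothetical illustration of such an adaptation (a swapped-in naive baseline, not an SVM or boosting method from the MIL literature), the sketch below adapts a plain logistic-regression classifier: during training every instance inherits its bag's label, and at prediction time a bag is scored by its highest-scoring instance. The function names, learning rate, epoch count and toy data are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Bag = Sequence[Vector]
Model = Tuple[List[float], float]  # (weights, bias)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(model: Model, x: Vector) -> float:
    w, b = model
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_naive_mil(
    bags: Sequence[Bag], labels: Sequence[int], lr: float = 0.1, epochs: int = 200
) -> Model:
    """Logistic regression (plain SGD) on instances that inherit their bag's label."""
    # Every instance gets its bag's label, so negative instances inside
    # positive bags are mislabeled: that is what makes this baseline naive.
    data = [(x, y) for bag, y in zip(bags, labels) for x in bag]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            g = _score((w, b), x) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict_bag_label(model: Model, bag: Bag, threshold: float = 0.5) -> int:
    """A bag is positive if its highest-scoring instance passes the threshold."""
    return int(max(_score(model, x) for x in bag) >= threshold)


# Usage on toy 1-D data: each positive bag hides one large-valued instance.
bags = [
    [[0.0], [0.2]],   # negative bag
    [[0.1], [-0.1]],  # negative bag
    [[0.1], [3.0]],   # positive bag
    [[2.8], [0.0]],   # positive bag
]
labels = [0, 0, 1, 1]
model = fit_naive_mil(bags, labels)
print(predict_bag_label(model, [[0.0], [2.9]]))
print(predict_bag_label(model, [[0.0], [0.1]]))
```

The max rule at prediction time mirrors the standard assumption that one positive instance suffices, while the label-inheritance step during training is the main weakness that more careful MIL adaptations try to avoid.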

See also

Multi-label classification

References

[1] Minhas, Fayyaz, Multiple instance learning of Calmodulin binding sites, Bioinformatics, 2012, 28 (18): i416–i422, doi:10.1093/bioinformatics/bts416 (accessed 2015-07-10; archived from the original on 2015-03-04).

Dietterich, Thomas G.; Lathrop, Richard H.; Lozano-Pérez, Tomás, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, 1997, 89 (1–2): 31–71, doi:10.1016/S0004-3702(96)00034-3.

Amores, Jaume, Multiple instance classification: Review, taxonomy and comparative study, Artificial Intelligence, 2013, 201: 81–105, doi:10.1016/j.artint.2013.06.003.

Foulds, James; Frank, Eibe, A Review of Multi-Instance Learning Assumptions, Knowledge Engineering Review, 2010, 25 (1): 1–25, doi:10.1017/S026988890999035X.

Keeler, James D.; Rumelhart, David E.; Leow, Wee-Kheng, Integrated segmentation and recognition of hand-printed numerals, Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems (NIPS 3), 1990: 557–563.

Li, H.D.; Menon, R.; et al., The emerging era of genomic data integration for analyzing splice isoform function, Trends in Genetics, 2014, PMID 24951248, doi:10.1016/j.tig.2014.05.005, pii S0168-9525(14)00085-7.

Eksi, R.; Li, H.D.; Menon, R.; et al., Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data, PLoS Comput Biol, 2013 Nov, 9 (11): e1003314, PMC 3820534, PMID 24244129, doi:10.1371/journal.pcbi.1003314.

Maron, O.; Ratan, A.L., Multiple-instance learning for natural scene classification, Proceedings of the Fifteenth International Conference on Machine Learning, 1998: 341–349.

Ray, Soumya; Page, David, Multiple instance regression (PDF), ICML, 2001 (accessed 2015-07-10; archived from the original (PDF) on 2022-02-13).
