LEPOR

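LEPOR (Chinese: 莱葩) is a computational model for the automatic evaluation of translation quality. Its first version was published in 2012 at COLING, the International Conference on Computational Linguistics. The metric does not depend on any particular language and can readily be applied to many languages. Its design combines a fairly rich set of evaluation factors with tunable parameters, which are set to meet the needs of different domains and language pairs.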

Background

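Since the IBM research team proposed the automatic translation evaluation metric BLEU ^[1], automatic evaluation has been widely adopted: it is fast, inexpensive and convenient, and it has greatly advanced the development of machine translation systems. At the same time, many researchers have analyzed BLEU and found a number of shortcomings. Automatic metrics proposed later include METEOR ^[2], among others. Existing metrics nevertheless still have weaknesses. Some use an incomplete set of evaluation factors, which leads to inaccurate or biased scores; others rely on many linguistic features to improve evaluation quality, which makes the results hard to reproduce. LEPOR ^[3] was proposed on the basis of a study of these problems. It tries to overcome them by using a broad set of evaluation factors to improve accuracy, while reducing the use of linguistic features so that the evaluation is easy to reproduce. ^[4]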

Design

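The LEPOR metric has three main components: a length penalty, a position difference penalty, and precision and recall. LEPOR modifies the length penalty factor of BLEU so that both overly short and overly long output sentences are penalized. Building on the position penalty of earlier work ^[5], LEPOR adds n-gram word alignment, which takes the linguistic context of each word into account. Precision and recall are important indicators of the accuracy of the translation output and of its faithfulness to the original text.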
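As a rough illustration of how the three components could be combined, the following Python sketch multiplies a length penalty, a position difference penalty and a weighted harmonic mean of precision and recall. The exact formulas, the function names, the weights `alpha` and `beta`, and the greedy exact-match alignment (used here in place of LEPOR's context-aware n-gram alignment) are assumptions made for illustration; this is not the reference implementation.

```python
import math


def length_penalty(c, r):
    """Penalize a hypothesis of length c that is shorter or longer than a reference of length r."""
    if c == 0 or r == 0:
        return 0.0
    if c < r:
        return math.exp(1 - r / c)
    if c > r:
        return math.exp(1 - c / r)
    return 1.0


def align(hyp, ref):
    """Simplified alignment (assumption): link each hypothesis token to the
    unused identical reference token whose normalized position is closest."""
    used = set()
    pairs = []
    for i, tok in enumerate(hyp):
        candidates = [j for j, r_tok in enumerate(ref) if r_tok == tok and j not in used]
        if not candidates:
            continue  # unmatched tokens contribute no position difference
        pos_i = (i + 1) / len(hyp)
        j = min(candidates, key=lambda k: abs(pos_i - (k + 1) / len(ref)))
        used.add(j)
        pairs.append((i, j))
    return pairs


def position_penalty(hyp, ref, pairs):
    """exp(-NPD), where NPD averages normalized position differences over the hypothesis length."""
    npd = sum(abs((i + 1) / len(hyp) - (j + 1) / len(ref)) for i, j in pairs) / len(hyp)
    return math.exp(-npd)


def weighted_harmonic(precision, recall, alpha=1.0, beta=1.0):
    """Weighted harmonic mean of recall (weight alpha) and precision (weight beta)."""
    if precision == 0 or recall == 0:
        return 0.0
    return (alpha + beta) / (alpha / recall + beta / precision)


def lepor_sketch(hyp, ref, alpha=1.0, beta=1.0):
    """Sentence-level score in [0, 1] for tokenized hypothesis and reference lists."""
    if not hyp or not ref:
        return 0.0
    pairs = align(hyp, ref)
    precision = len(pairs) / len(hyp)
    recall = len(pairs) / len(ref)
    return (length_penalty(len(hyp), len(ref))
            * position_penalty(hyp, ref, pairs)
            * weighted_harmonic(precision, recall, alpha, beta))
```

Calling `lepor_sketch("the cat".split(), "the cat sat".split())` gives a score below 1 because the output is shorter than the reference, its words sit at different relative positions, and recall is incomplete. Raising `alpha` relative to `beta` would make recall count more than precision, which is one way the tunable parameters mentioned above could be adapted to a language pair.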

Performance

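LEPOR performed well in the metrics shared task of the annual machine translation workshop WMT (ACL-WMT). In ACL-WMT 2013 ^[6], LEPOR was the metric closest to human judgment for translation from English into other languages: measured by the Pearson correlation coefficient, it ranked first on the average correlation score over five language pairs (English-French, English-Spanish, English-Czech, English-German and English-Russian). For evaluation with English as the target language, another metric, METEOR, ranked first. A follow-up version of LEPOR that uses part-of-speech tagging information achieved better scores. ^[7]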
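For readers unfamiliar with the measure, the Pearson correlation coefficient between a metric's scores and human scores can be computed as in the minimal sketch below. The function name `pearson` and the sample numbers are hypothetical placeholders chosen for illustration; they are not WMT13 data or results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder scores for four hypothetical MT systems (not real results).
metric_scores = [0.20, 0.25, 0.31, 0.28]
human_scores = [0.40, 0.48, 0.61, 0.52]
print(pearson(metric_scores, human_scores))
```

A value close to 1 means the metric ranks and spaces the systems much as human judges do; averaging such values over several language pairs yields the kind of summary score described above.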

Notes

^ Papineni et al., (2002)
^ Banerjee and Lavie, (2005)
^ Han et al., (2013a)
^ Han et al., (2014)
^ Wong and Kit, (2008)
^ ACL-WMT (2013)
^ Han (2014)

References

Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002). "BLEU: a method for automatic evaluation of machine translation" in ACL-2002: 40th Annual meeting of the Association for Computational Linguistics, pp. 311–318.
Han, A.L.F., Wong, D.F., and Chao, L.S. (2012). "LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors" in Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pp. 441–450. Mumbai, India.
Han, A.L.F., Wong, D.F., Chao, L.S., He, L., Lu, Y., Xing, J., and Zeng, X. (2013a). "Language-independent Model for Machine Translation Evaluation with Reinforced Factors" in Proceedings of the Machine Translation Summit XIV (MT SUMMIT 2013), pp. 215–222. Nice, France. Publisher: International Association for Machine Translation.
Han, A.L.F., Wong, D.F., Chao, L.S., Lu, Y., He, L., Wang, Y., and Zhou, J. (2013b). "A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task" in Proceedings of the Eighth Workshop on Statistical Machine Translation, ACL-WMT13, pp. 414–421. Sofia, Bulgaria. Association for Computational Linguistics.
Han, A.L.F., Wong, D.F., Chao, L.S., He, L., and Lu, Y. (2014). "Unsupervised Quality Estimation Model for English to German Translation and Its Application in Extensive Supervised Evaluation" in The Scientific World Journal. Issue: Recent Advances in Information Technology. ISSN 1537-744X. Hindawi Publishing Corporation.
ACL-WMT. (2013). "ACL-WMT13 METRICS TASK".
Wong, B. T-M, and Kit, C. (2008). "Word choice and word position for automatic MT evaluation" in Workshop: MetricsMATR of the Association for Machine Translation in the Americas (AMTA), short paper, Waikiki, US.
Banerjee, S. and Lavie, A. (2005). "METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments" in Proceedings of Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the 43rd Annual Meeting of the Association of Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.
Han, Lifeng. (2014). "LEPOR: An Augmented Machine Translation Evaluation Metric". Thesis for Master of Science in Software Engineering. University of Macau, Macao.

Software links
