Morphemes and POS tags for n-gram based evaluation metrics

Maja Popović

In: Proceedings of the Sixth Workshop on Statistical Machine Translation. Workshop on Statistical Machine Translation (WMT-11), 6th, located at EMNLP, July 30-31, Edinburgh, United Kingdom, pages 104-107, Association for Computational Linguistics, 7/2011.


We propose the use of morphemes for automatic evaluation of machine translation output, and systematically investigate a set of F score and BLEU score based metrics calculated on words, morphemes and POS tags along with all corresponding combinations. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in five different European languages are used: English, Spanish, French, German and Czech. The results show that the F scores which take into account morphemes and POS tags are the most promising metrics.
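
As a rough illustration of the kind of metric described above, the following sketch computes an n-gram F score over an arbitrary token sequence, so the same function can be applied to words, morphemes or POS tags. The abstract does not give the exact formulas, so everything here is an assumption: the function names (ngram_counts, ngram_f_score, combined_score), the maximum n-gram order of 4, the arithmetic averaging of n-gram precision and recall, and the uniform averaging across token types are illustrative choices rather than the paper's definitions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f_score(hypothesis, reference, max_n=4):
    """N-gram F score between two token sequences (assumed formulation).

    Precision and recall are averaged arithmetically over n-gram orders
    1..max_n, then combined with the harmonic mean. Orders for which
    either sequence is too short to contain an n-gram are skipped.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        if not hyp or not ref:
            continue
        # Clipped overlap: each n-gram is matched at most as often as it
        # occurs in the other sequence.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def combined_score(hyp_views, ref_views, max_n=4):
    """Uniform average of F scores over several token types (assumption).

    hyp_views and ref_views map a label such as "word", "morpheme" or
    "pos" to the corresponding token sequence of the same sentence.
    """
    scores = [ngram_f_score(hyp_views[k], ref_views[k], max_n) for k in hyp_views]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hand-made toy data; the POS tags are hypothetical, not tagger output.
    hyp = {"word": ["the", "cats", "sleep"], "pos": ["DET", "NOUN", "VERB"]}
    ref = {"word": ["the", "cat", "sleeps"], "pos": ["DET", "NOUN", "VERB"]}
    print(combined_score(hyp, ref))
```

In this toy example the word sequences differ in their inflected forms while the POS sequences match exactly, which is the kind of case where scoring morphemes or POS tags in addition to full words can give partial credit.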



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence