DFKI-LT - Morphemes and POS tags for n-gram based evaluation metrics

Maja Popovic
Morphemes and POS tags for n-gram based evaluation metrics
Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 104-107, Edinburgh, United Kingdom, Association for Computational Linguistics, 7/2011
 
We propose the use of morphemes for automatic evaluation of machine translation output, and systematically investigate a set of F score and BLEU score based metrics calculated on words, morphemes and POS tags along with all corresponding combinations. Correlations between the new metrics and human judgments are calculated on the data of the third, fourth and fifth shared tasks of the Statistical Machine Translation Workshop. Machine translation outputs in five different European languages are used: English, Spanish, French, German and Czech. The results show that the F scores which take into account morphemes and POS tags are the most promising metrics.
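As a rough illustration of the kind of n-gram F score the abstract refers to, the sketch below averages clipped n-gram F1 over orders 1 to 4. It is not the paper's exact formulation: the function names, the uniform averaging over n-gram orders, and the handling of short sequences are all assumptions made for this example. The same function applies to any token sequence, so the inputs may be words, morphemes, or POS tags, provided the segmentation or tagging has been done beforehand.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f_score(hypothesis, reference, max_n=4):
    """Average n-gram F1 over orders 1..max_n.

    Uniform weighting of the orders is an assumption of this sketch,
    not something taken from the paper.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        if not hyp and not ref:
            continue  # both sequences are shorter than n: skip this order
        if not hyp or not ref:
            scores.append(0.0)  # only one side has n-grams of this order
            continue
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precision = matches / sum(hyp.values())
        recall = matches / sum(ref.values())
        f1 = 0.0 if matches == 0 else 2 * precision * recall / (precision + recall)
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical inputs: the same sentence pair as words and as POS tags.
hyp_words = ["the", "cat", "sat"]
ref_words = ["the", "cat", "slept"]
hyp_pos = ["DET", "NOUN", "VERB"]
ref_pos = ["DET", "NOUN", "VERB"]

word_score = ngram_f_score(hyp_words, ref_words)
pos_score = ngram_f_score(hyp_pos, ref_pos)
```

Here the POS-tag sequences match exactly while the word sequences differ in one token, so the POS-based score is higher than the word-based one. This is the kind of difference that combining word, morpheme and POS-tag scores is meant to capture.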
 