

Challenging the State-of-the-art Machine Translation Metrics from a Linguistic Perspective

Eleftherios Avramidis; Shushen Manakhimova; Vivien Macketanz; Sebastian Möller
In: Philipp Koehn; Barry Haddow; Tom Kocmi; Christof Monz (Eds.). Proceedings of the Eighth Conference on Machine Translation. Conference on Machine Translation (WMT-23), located at EMNLP 2023, December 6-7, Singapore, Pages 713-729, Association for Computational Linguistics, 2023.


We employ a linguistically motivated challenge set to evaluate the state-of-the-art machine translation metrics submitted to the Metrics Shared Task of the 8th Conference on Machine Translation. The challenge set includes about 21,000 items extracted from 155 machine translation systems for three language directions, covering more than 100 linguistically motivated phenomena organized in 14 categories. The metrics that perform best in our linguistically motivated analysis are Cometoid22-wmt23 (a trained metric based on distillation) for German-English and MetricX-23-c (based on a fine-tuned mT5 encoder-decoder language model) for English-German and English-Russian. Some of the most difficult phenomena are passive voice for German-English; named entities, terminology and measurement units for English-German; and focus particles, adverbial clauses and stripping for English-Russian.

