Comparative Quality Estimation: Automatic Sentence-Level Ranking of Multiple Machine Translation Outputs

Eleftherios Avramidis

In: Proceedings of the 24th International Conference on Computational Linguistics. International Conference on Computational Linguistics (COLING-12), December 8-15, Mumbai, India, pages 115-132, The COLING 2012 Organizing Committee, 12/2012.


A machine learning mechanism is trained on human annotations in order to perform preference ranking. The mechanism operates at the sentence level and ranks the alternative machine translations of each source sentence. Rankings are decomposed into pairwise comparisons, so that binary classifiers can be trained using black-box features derived from automatic linguistic analysis. To recompose the classifier's pairwise decisions into a ranking, this work introduces weighting each decision by its classification probability. This eliminates ranking ties and increases the correlation coefficient with the human rankings by up to 80%. This work also demonstrates several configurations of successful automatic ranking models; the best configuration achieves acceptable correlation with human judgments (tau = 0.30), which is higher than that of state-of-the-art reference-aware automatic MT evaluation metrics such as METEOR and Levenshtein distance.
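The decompose/recompose idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation. The function names are hypothetical, and the pairwise probabilities are hard-coded stand-ins for the output of a trained binary classifier. It contrasts hard voting, which can leave ties, with probability-weighted voting. It also scores the result with a Kendall tau-a, which is one common tau variant assumed here; the paper may use a different one.

```python
from itertools import combinations


def rank_to_pairs(ranks):
    """Decompose a ranking (1 = best, ties allowed) into pairwise training labels.

    Returns {(i, j): 1 if i is better than j, 0 if j is better}; tied pairs are skipped.
    """
    pairs = {}
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] < ranks[j]:
            pairs[(i, j)] = 1
        elif ranks[i] > ranks[j]:
            pairs[(i, j)] = 0
    return pairs


def recompose(n, prob_better, weighted=True):
    """Turn pairwise decisions back into a ranking of n systems.

    prob_better[(i, j)] (i < j) is the (hypothetical) classifier probability that i beats j.
    weighted=False: each pair gives one point to the predicted winner (p >= 0.5 favors i).
    weighted=True:  each system accumulates the probability mass of winning each pair.
    """
    scores = [0.0] * n
    for (i, j), p in prob_better.items():
        if weighted:
            scores[i] += p
            scores[j] += 1.0 - p
        elif p >= 0.5:
            scores[i] += 1
        else:
            scores[j] += 1
    # Higher score -> better rank; equal scores share the same (dense) rank.
    levels = sorted(set(scores), reverse=True)
    return [levels.index(s) + 1 for s in scores]


def kendall_tau_a(human, predicted):
    """Kendall tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    n = len(human)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (human[i] - human[j]) * (predicted[i] - predicted[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Usage: three candidate translations of one source sentence.
human_ranks = [1, 2, 3]
print(rank_to_pairs(human_ranks))  # {(0, 1): 1, (0, 2): 1, (1, 2): 1}

# Made-up classifier probabilities that form a preference cycle.
probs = {(0, 1): 0.9, (1, 2): 0.6, (0, 2): 0.4}
hard = recompose(3, probs, weighted=False)
soft = recompose(3, probs, weighted=True)
print(hard, kendall_tau_a(human_ranks, hard))  # [1, 1, 1] 0.0
print(soft, kendall_tau_a(human_ranks, soft))  # [1, 3, 2] tau ≈ 0.333
```

In this toy case, hard voting gives every system one point, so the recomposed ranking is a three-way tie. Weighting by probability breaks the tie because confident decisions (0.9) count for more than marginal ones (0.6 or 0.4).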



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence