DFKI-LT - Comparative Quality Estimation: Automatic Sentence-Level Ranking of Multiple Machine Translation Outputs
Proceedings of the 24th International Conference on Computational Linguistics
A machine learning mechanism is trained on human annotations to perform preference ranking. The mechanism operates at the sentence level and ranks the alternative machine translations of each source sentence. Rankings are decomposed into pairwise comparisons so that binary classifiers can be trained using black-box features from automatic linguistic analysis. To recompose the pairwise decisions of the classifier into a ranking, this work introduces weighting the decisions by their classification probabilities, which eliminates ranking ties and increases the coefficient of correlation with the human rankings up to 80%. The authors also demonstrate several configurations of successful automatic ranking models; the best configuration achieves acceptable correlation with human judgments (tau=0.30), which is higher than that of state-of-the-art reference-aware automatic MT evaluation metrics such as METEOR and Levenshtein distance.
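The decomposition and recomposition steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `decompose` and `recompose`, the convention that a lower rank is better, and the exact scoring rule (each translation accumulates the probability of winning each of its comparisons) are assumptions made for illustration, and the probabilities in the usage example are invented.

```python
from itertools import combinations


def decompose(ranks):
    """Turn one human ranking (lower rank = better) into pairwise labels.

    Returns {(i, j): 1 if translation i is ranked better than j, else 0}.
    Pairs that the annotator ranked equally are skipped (an assumption).
    """
    pairs = {}
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] != ranks[j]:
            pairs[(i, j)] = 1 if ranks[i] < ranks[j] else 0
    return pairs


def recompose(n, prob_first_wins, weighted=True):
    """Rebuild a ranking of n translations from pairwise classifier outputs.

    prob_first_wins[(i, j)] is the (hypothetical) classifier probability
    that translation i is better than translation j.
    """
    scores = [0.0] * n
    for (i, j), p in prob_first_wins.items():
        if weighted:
            # Soft vote: each side is credited with its win probability.
            scores[i] += p
            scores[j] += 1.0 - p
        else:
            # Hard vote: only the predicted winner gets a point.
            winner = i if p >= 0.5 else j
            scores[winner] += 1.0
    # Rank 1 is best; translations with equal scores share a rank.
    ordered = sorted(set(scores), reverse=True)
    return [ordered.index(s) + 1 for s in scores]


# Invented probabilities for three translations of one source sentence.
probs = {(0, 1): 0.9, (1, 2): 0.6, (0, 2): 0.4}
hard_ranks = recompose(3, probs, weighted=False)
soft_ranks = recompose(3, probs, weighted=True)
```

With these invented values the hard votes form a cycle, so every translation wins exactly once and all three share one rank. Summing probabilities instead gives each translation a distinct score, which is the tie-breaking effect the abstract attributes to probability weighting.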
Files: BibTeX, C12-1008.pdf