Quality Estimation for Machine Translation output using linguistic analysis and decoding features

Eleftherios Avramidis

In: Proceedings of the Seventh Workshop on Statistical Machine Translation. Workshop on Statistical Machine Translation (WMT-12), located at The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 7-8, Montreal, Canada. Association for Computational Linguistics, 6/2012.


We describe a submission to the WMT12 Quality Estimation task, including extensive Machine Learning experimentation. The data were augmented with features from linguistic analysis and with statistical features from the SMT search graph. Several Feature Selection algorithms were employed. The Quality Estimation problem was addressed both as a regression task and as a discretised classification task, but the latter did not generalise well to the unseen test set. The most successful regression methods achieved an RMSE of 0.86 and were trained on a feature set produced by Correlation-based Feature Selection. We also observed indications that RMSE is not always sufficient for measuring performance.
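
For readers unfamiliar with the metric, the following minimal Python sketch shows how RMSE is computed and gives one generic illustration of how it can fail to separate a useful predictor from an uninformative one. The abstract does not say which indications were observed, so this is not the paper's analysis; the scores, the two predictors, and the function name `rmse` are invented for illustration only.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long score lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical gold quality scores (assumption: not taken from the paper).
gold = [2.0, 2.5, 3.0, 3.5, 4.0]

# Predictor A: always outputs the same score, so it cannot rank sentences.
constant = [3.0] * len(gold)

# Predictor B: ranks every sentence correctly but is shifted by +0.8.
shifted = [score + 0.8 for score in gold]

rmse_constant = rmse(constant, gold)
rmse_shifted = rmse(shifted, gold)

print(f"constant predictor RMSE: {rmse_constant:.3f}")
print(f"shifted predictor RMSE:  {rmse_shifted:.3f}")
```

In this toy setting the constant predictor obtains the lower RMSE even though it carries no information about relative quality, which is one reason a rank-based measure is sometimes reported alongside RMSE.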



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence