Efforts on Machine Learning over Human-mediated Translation Edit Rate

Eleftherios Avramidis

In: Proceedings of the Ninth Workshop on Statistical Machine Translation. Workshop on Statistical Machine Translation (WMT-14), located at The 52nd Annual Meeting of the Association for Computational Linguistics, June 26-27, Baltimore, Maryland, USA, pages 302-306, Association for Computational Linguistics, 6/2014.


In this paper we describe experiments on predicting HTER, carried out as part of our submission to the Shared Task on Quality Estimation of the 9th Workshop on Statistical Machine Translation. We check whether better HTER prediction can be achieved by training four individual regression models, one for each edit type (deletions, insertions, substitutions, shifts); however, this yielded no improvements. Adding more data from other datasets, which were not minimally post-edited or were freely translated, did not yield improvements either. The best HTER prediction was achieved by adding deduplicated WMT13 data and additional features such as (a) rule-based language corrections (LanguageTool), (b) PCFG parsing statistics and counts of tree labels, (c) position statistics of parsing labels, and (d) position statistics of trigrams with low probability.
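The per-edit-type experiment rests on the fact that HTER is a sum: the counts of insertions, deletions, substitutions and shifts, divided by the length of the post-edited translation. The sketch below shows that decomposition and how four separately trained predictors could be recombined into one HTER estimate. It is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each model predicts a per-type rate normalized by post-edit length, the model callables and the `EditCounts` container are hypothetical, and clipping negative predictions to zero is our own choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# The four TER edit types, one regression target each.
EDIT_TYPES = ("insertions", "deletions", "substitutions", "shifts")


@dataclass
class EditCounts:
    """Hypothetical container for the edit counts of one sentence."""
    insertions: int
    deletions: int
    substitutions: int
    shifts: int
    postedit_length: int  # number of tokens in the post-edited translation


def hter(c: EditCounts) -> float:
    """HTER: total number of edits divided by the post-edit length."""
    total = c.insertions + c.deletions + c.substitutions + c.shifts
    return total / c.postedit_length


def per_type_rates(c: EditCounts) -> Dict[str, float]:
    """Split HTER into one rate per edit type (assumed regression targets).

    The four rates sum to hter(c) because they share the same denominator.
    """
    return {t: getattr(c, t) / c.postedit_length for t in EDIT_TYPES}


# A model is any callable mapping a feature vector to a predicted rate.
Model = Callable[[Sequence[float]], float]


def combined_prediction(models: Dict[str, Model], features: Sequence[float]) -> float:
    """Sum the four per-type predictions into a single HTER estimate.

    Negative predictions are clipped to 0 (assumption: an edit rate
    cannot be negative, so such outputs are treated as no edits).
    """
    return sum(max(0.0, models[t](features)) for t in EDIT_TYPES)


if __name__ == "__main__":
    counts = EditCounts(insertions=2, deletions=1, substitutions=3, shifts=1,
                        postedit_length=20)
    print(hter(counts))            # 7 edits over 20 tokens
    print(per_type_rates(counts))  # targets for the four models
```

Because the decomposition is exact, any gain from this setup would have to come from the per-type models being easier to learn than a single HTER regressor; the abstract reports that no such gain was observed.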


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence