Correlating decoding events with errors in Statistical Machine Translation

Eleftherios Avramidis, Maja Popovic

In: Rajeev Sangal, Jyoti Pawar, Dipti Misra Sharma (editors). Proceedings of the 11th International Conference on Natural Language Processing (ICON-2014), December 18-21, Goa, India. Natural Language Processing Association, India, 2014.


This work investigates situations in the decoding process of phrase-based SMT that cause particular errors in the translation output. A set of translations post-edited by professional translators is used to automatically identify errors based on edit distance. Binary classifiers predicting the sentence-level existence of an error are fitted with logistic regression, based on features from the decoding search graph. Models are fitted for three common error types and six language pairs. The statistically significant coefficients of the logistic function are used to analyze the parts of the decoding process that are related to each error type.
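
To make the edit-distance step of the abstract concrete, here is a minimal sketch of how sentence-level binary error labels might be derived by comparing a machine translation with its post-edited version. It is not the authors' tooling. The function names `edit_ops` and `error_labels`, the whitespace tokenisation, and the three label names are assumptions made for illustration; the label names are placeholders and need not match the error types studied in the paper.

```python
from typing import Dict, List, Tuple


def edit_ops(hyp: List[str], ref: List[str]) -> Tuple[int, int, int]:
    """Count (substitutions, deletions, insertions) needed to turn hyp into ref
    along one minimum-cost word-level Levenshtein alignment."""
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion from hyp
                dist[i][j - 1] + 1,         # insertion into hyp
            )

    # Walk back from the bottom-right cell and count each operation type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1  # post-editor removed a word the system produced
            i -= 1
        else:
            ins += 1   # post-editor added a word the system left out
            j -= 1
    return subs, dels, ins


def error_labels(mt_output: str, post_edit: str) -> Dict[str, int]:
    """Binary sentence-level labels (hypothetical names), one per error type."""
    subs, dels, ins = edit_ops(mt_output.split(), post_edit.split())
    return {
        "lexical": int(subs > 0),  # some word was replaced
        "extra": int(dels > 0),    # some word was removed
        "missing": int(ins > 0),   # some word was added
    }


if __name__ == "__main__":
    print(error_labels("the cat sat mat", "the cat sat on the mat"))
```

Labels of this kind could then serve as the dependent variable of one logistic regression model per error type, with features from the decoding search graph as predictors. Extracting those features and testing the coefficients for significance are outside the scope of this sketch.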



German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz