Sentence-level quality estimation by predicting HTER as a multi-component metric

Eleftherios Avramidis

In: Second Conference on Machine Translation. Workshop on Statistical Machine Translation (WMT-17), located at the 2017 Conference on Empirical Methods in Natural Language Processing, September 7-8, Copenhagen, Denmark. Association for Computational Linguistics, 9/2017.


This submission investigates alternative machine learning models for predicting the HTER score at the sentence level. Instead of predicting the HTER score directly, we suggest a model that jointly predicts the counts of the 4 distinct post-editing operations, which are then used to calculate the HTER score. This also makes it possible to correct invalid (e.g. negative) predicted values before the HTER score is calculated. Without any feature exploration, a multi-layer perceptron with 4 outputs yields small but significant improvements over the baseline.
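
The recombination step described in the abstract can be illustrated with a minimal sketch. It assumes the four post-editing operations are the usual TER edit types (insertions, deletions, substitutions, shifts) and that the score is their sum divided by the number of reference words. The function name `hter_from_operations`, the clipping of negative values to zero, and the example numbers are illustrative assumptions, not taken from the paper; how the denominator is obtained at prediction time is not stated here, so it is simply passed in.

```python
from typing import Sequence


def hter_from_operations(predicted_ops: Sequence[float], reference_length: int) -> float:
    """Combine four predicted edit counts into an HTER-style score.

    predicted_ops: predicted counts of insertions, deletions,
        substitutions and shifts (assumed order, hypothetical).
    reference_length: number of words in the reference (assumed known).
    """
    if len(predicted_ops) != 4:
        raise ValueError("expected four edit-operation counts")
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    # A regression model can output negative counts, which are invalid;
    # one simple correction (an assumption here) is to clip them to zero.
    corrected = [max(0.0, float(value)) for value in predicted_ops]
    return sum(corrected) / reference_length


# Hypothetical model output for one sentence; the deletion count is negative.
predicted = [1.2, -0.3, 2.1, 0.0]
score = hter_from_operations(predicted, reference_length=10)
print(round(score, 2))
```

Correcting each component before summing is what the multi-output formulation allows: with a single directly predicted score, an invalid component cannot be repaired separately.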


Further links

Sentence-level_quality_estimation_by_predicting_HTER_as_a_multi-component_metric.pdf (pdf, 134 KB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz