DFKI-LT - Learning from human judgments of machine translation output

Maja Popovic, Eleftherios Avramidis, Aljoscha Burchardt, Sabine Hunsicker, Sven Schmeier, Cindy Tscherwinka, David Vilar, Hans Uszkoreit
Proceedings of the MT Summit XIV, Nice, France, The European Association for Machine Translation, Allschwil / Switzerland, 2013
 
Human translators are the key to evaluating machine translation (MT) quality and to answering the still open question of when and how to use MT in professional translation workflows. Human judgments usually take the form of rankings of the outputs of different translation systems; more recently, post-edits of MT output have also come into focus. This paper describes the results of a detailed, large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing. Translation outputs from three domains and six translation directions, generated by five distinct translation systems, were analysed with the goal of gaining insights relevant to further improving MT quality and applicability.
 