DFKI-LT - Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output

Maja Popović
Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output
The Prague Bulletin of Mathematical Linguistics, volume 96, pages 59-68, Charles University, Prague, 10/2011
 
We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool detects five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, it requires the original full-form reference translation(s) and hypothesis, along with their corresponding base forms. Additional word-level information (e.g. POS tags) can also be supplied in order to obtain more detailed results. For each error class, the tool provides the raw count and the normalised score (error rate) at both the document level and the sentence level. It also outputs the original reference and hypothesis words labelled with the corresponding error class, in text and HTML formats.
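
As a rough illustration of the raw counts and normalised scores mentioned above, the following minimal sketch computes both from per-word error labels. It is not Hjerson's actual code or output format: the label strings, the function names `error_rates` and `document_rates`, and the choice to normalise by the number of labelled words are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical label names for the five word-level error classes.
ERROR_CLASSES = ("morphological", "reordering", "missing", "extra", "lexical")


def error_rates(labels):
    """Return {error_class: (raw_count, rate)} for one sequence of word labels.

    `labels` holds one label per word: "ok" or one of ERROR_CLASSES.
    The rate is the raw count divided by the number of labelled words
    (an assumption made for this sketch).
    """
    counts = Counter(label for label in labels if label in ERROR_CLASSES)
    total = len(labels)
    return {
        cls: (counts[cls], counts[cls] / total if total else 0.0)
        for cls in ERROR_CLASSES
    }


def document_rates(sentences):
    """Document-level scores: pool the labels of all sentences, then normalise."""
    pooled = [label for sentence in sentences for label in sentence]
    return error_rates(pooled)


# Two made-up labelled sentences.
sentences = [
    ["ok", "morphological", "ok", "lexical", "ok"],
    ["ok", "ok", "extra", "ok", "ok"],
]
sentence_scores = [error_rates(s) for s in sentences]
document_scores = document_rates(sentences)
```

Pooling the labels before normalising weights each word equally, so long sentences contribute more to the document-level score than short ones; averaging the sentence-level rates would be a different design choice.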
 