DFKI-LT - Appraise: An Open-Source Toolkit for Manual Phrase-Based Evaluation of Translations

Christian Federmann
Appraise: An Open-Source Toolkit for Manual Phrase-Based Evaluation of Translations
in: Daniel Tapias, Mike Rosner, Stelios Piperidis, Jan Odijk, Joseph Mariani, Bente Maegaard, Khalid Choukri, Nicoletta Calzolari (Conference Chair) (eds.):
Proceedings of the Seventh conference on International Language Resources and Evaluation, Valletta, Malta, European Language Resources Association (ELRA), 5/2010
 
We describe a focused effort to investigate the performance of phrase-based, human evaluation of machine translation output achieving a high annotator agreement. We define phrase-based evaluation and describe the implementation of Appraise, a toolkit that supports the manual evaluation of machine translation results. Phrase ranking can be done using either a fine-grained six-way scoring scheme that allows annotators to differentiate between "much better" and "slightly better", or a reduced subset of ranking choices. Afterwards we discuss kappa values for both scoring models from several experiments conducted with human annotators. Our results show that phrase-based evaluation can be used for fast evaluation obtaining significant agreement among annotators. The granularity of ranking choices should, however, not be too fine, as this seems to confuse annotators and thus reduces the overall agreement. The work reported in this paper confirms previous work in the field and illustrates that the usage of human evaluation in machine translation should be reconsidered. The Appraise toolkit is available as open-source and can be downloaded from the author's website.
 
Files: BibTeX, 197_Paper.pdf