

TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality

Vivien Macketanz; Renlong Ai; Aljoscha Burchardt; Hans Uszkoreit
In: Nicoletta Calzolari; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Koiti Hasida; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Hélène Mazo; Asuncion Moreno; Jan Odijk; Stelios Piperidis; Takenobu Tokunaga (Eds.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 7-12, Miyazaki, Japan. European Language Resources Association (ELRA), 2018.


In several areas of NLP evaluation, test suites have been used to analyze the strengths and weaknesses of systems. Today, Machine Translation (MT) quality is usually assessed by shallow automatic comparisons of MT outputs with reference corpora, resulting in a single score. The trend towards neural MT in particular has renewed people's interest in better and more analytical diagnostic methods for MT quality. In this paper we present TQ-AutoTest, a novel framework that supports a linguistic evaluation of (machine) translations using test suites. Our current test suites comprise about 5,000 handcrafted test items for the language pair German–English. The framework supports the creation of tests and the semi-automatic evaluation of the MT results using regular expressions. The expressions help to classify the results as correct, incorrect, or as requiring a manual check. The approach can easily be extended to other NLP tasks where test suites can be used, such as evaluating (one-shot) dialogue systems.
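The semi-automatic classification described above can be illustrated with a small sketch. The function names, rule format, and example patterns below are illustrative assumptions, not TQ-AutoTest's actual API: each test item is assumed to carry regular expressions that signal a correct or an incorrect translation, and outputs matched by neither set are flagged for manual inspection.

```python
import re

# Hypothetical sketch of regex-based MT-output classification, in the spirit
# of the framework described in the abstract. Pattern sets per test item are
# illustrative assumptions.
def classify(mt_output, pass_patterns, fail_patterns):
    """Label an MT output as 'correct', 'incorrect', or 'check manually'."""
    if any(re.search(p, mt_output) for p in fail_patterns):
        return "incorrect"
    if any(re.search(p, mt_output) for p in pass_patterns):
        return "correct"
    return "check manually"  # no rule fired: a human annotator decides

# Example test item: German negation "Er kommt nicht." -> English.
pass_patterns = [r"\bdoes not come\b", r"\bdoesn't come\b", r"\bis not coming\b"]
# Incorrect if the negation is dropped entirely (no "not" or "n't" anywhere):
fail_patterns = [r"^(?!.*(?:\bnot\b|n't)).*"]

print(classify("He doesn't come.", pass_patterns, fail_patterns))    # correct
print(classify("He comes.", pass_patterns, fail_patterns))           # incorrect
print(classify("He is not arriving.", pass_patterns, fail_patterns)) # check manually
```

Outputs matching no rule fall through to the manual queue, which is what makes the evaluation semi-automatic rather than fully automatic.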

