A Test Suite for the Evaluation of Portuguese-English Machine Translation

Mariana Avelino; Vivien Macketanz; Eleftherios Avramidis; Sebastian Möller

In: Vládia Pinheiro; Pablo Gamallo; Raquel Amaro; Carolina Scarton; Fernando Batista; Diego Silva; Catarina Magro; Hugo Pinto (eds.). 15th International Conference of Computational Processing of the Portuguese Language. Computational Processing of the Portuguese Language (PROPOR-2022), March 21-23, Fortaleza, Brazil, Pages 15-25, ISBN 978-3-030-98305-5, Springer International Publishing, 3/2022.


This paper describes the development of the first test suite for the language direction Portuguese-English. Designed for fine-grained linguistic analysis, the test suite comprises 330 test sentences for 66 linguistic phenomena and 14 linguistic categories. Using the test suite, eight MT systems were compared with quantitative and qualitative methods: DeepL, Google Sheets, Google Translator, Microsoft Translator, Reverso, Systran, Yandex and an internally built NMT system trained over 30 h on 2.5M sentences. Ambiguity, named entity & terminology and verb valency are the categories where MT systems struggle most. Negation, pronouns, subordination, verb tense/aspect/mood and false friends are the categories where MT systems perform best.


Further Links

Avelino_et_al_2022_-_Test_Suite_Evaluation_Portuguese-English_Machine_Translation_-_PROPOR22.pdf (pdf, 190 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence