Investigating Genre and Method Variation in Translation Using Text Classification

Marcos Zampieri, Ekaterina Lapshinova-Koltunski

In: Proceedings of the 18th International Conference on Text, Speech and Dialogue (TSD2015), September 14-17, Pilsen, Czech Republic. Lecture Notes in Artificial Intelligence, Springer, 2015.


In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments are carried out in two steps: we train classifiers to 1) discriminate between different genres (fiction, political essays, etc.) and 2) identify the translation method (machine vs. human). Using semi-delexicalized models (excluding all nouns), we report results of up to 60.5% F-measure in distinguishing human from machine translations and 45.4% in discriminating between seven different genres. More than the classification performance itself, we argue that text classification methods can bring out the discriminative features of different variables (genres and translation methods), thus enabling researchers to investigate the properties of each in more detail.
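To make the idea of a semi-delexicalized model more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the German text is already POS-tagged with STTS-style tags (NN for common nouns, NE for proper nouns), that "excluding" a noun means replacing it with a placeholder symbol, and that word bigrams serve as classifier features; the abstract specifies none of these details. The names `delexicalize`, `ngram_counts`, `f_measure`, and `NOUN_PLACEHOLDER` are hypothetical.

```python
from collections import Counter

# Assumption: STTS-style noun tags; the abstract does not name a tagset.
NOUN_TAGS = {"NN", "NE"}
NOUN_PLACEHOLDER = "*"  # hypothetical placeholder symbol


def delexicalize(tagged_tokens, placeholder=NOUN_PLACEHOLDER):
    """Replace every noun with a placeholder; lowercase all other words."""
    return [placeholder if tag in NOUN_TAGS else word.lower()
            for word, tag in tagged_tokens]


def ngram_counts(tokens, n=2):
    """Count word n-grams, one possible feature set for a text classifier."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative input only; not taken from the paper's data.
sentence = [("Der", "ART"), ("Ausschuss", "NN"), ("billigte", "VVFIN"),
            ("die", "ART"), ("Vorschläge", "NN")]
tokens = delexicalize(sentence)
features = ngram_counts(tokens)
```

Because the nouns are masked, the remaining features reflect function words, verbs, and word order rather than topic vocabulary, which is one plausible reason to delexicalize when comparing genres or translation methods.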

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence