Comparative Quality Estimation for Machine Translation: An Application of Artificial Intelligence on Language Technology using Machine Learning of Human Preferences

Eleftherios Avramidis

PhD-Thesis Universität des Saarlandes SciDok - Elektronische Dokumente der Universität des Saarlandes Saarbrücken 2019.


In this thesis we focus on Comparative Quality Estimation, as the automaticprocess of analysing two or more translations produced by a Machine Translation(MT) system and expressing a judgment about their comparison. We approach theproblem from a supervised machine learning perspective, with the aim to learnfrom human preferences. As a result, we create the ranking mechanism, a pipelinethat includes the necessary tasks for ordering several MT outputs of a givensource sentence in terms of relative quality. Quality Estimation models are trained to statistically associate the judgmentswith some qualitative features. For this purpose, we design a broad set offeatures with a particular focus on the ones with a grammatical background.Through an iterative feature engineering process, we investigate several featuresets, we conclude to the ones that achieve the best performance and we proceedto linguistically intuitive observations about the contribution of individualfeatures. Additionally, we employ several feature selection and machine learning methodsto take advantage of these features. We suggest the usage of binary classifiersafter decomposing the ranking into pairwise decisions. In order to reduce theamount of uncertain decisions (ties) we weight the pairwise decisions with theirclassification probability. Through a set of experiments, we show that the ranking mechanism can learn andreproduce rankings that correlate to the ones given by humans. Most importantly,it can be successfully compared with state-of-the-art reference-aware metricsand other known ranking methods for several language pairs. We also apply thismethod for a hybrid MT system combination and we show that it is able to improvethe overall translation performance. Finally, we examine the correlation between common MT errors and decoding eventsof the phrase-based statistical MT systems. Through evidence from the decodingprocess, we identify some cases where long-distance grammatical phenomena cannotbe captured properly. An additional outcome of this thesis is the open source software Qualitative,which implements the full pipeline of ranking mechanism and the systemcombination task. It integrates a multitude of state-of-the-art natural languageprocessing tools and can support the development of new models. Apart from theusage in experiment pipelines, it can serve as an application back-end for webapplications in real-use scenaria.


diss_equalborders.pdf (pdf, 1 MB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz