

Enriching input in Statistical Machine Translation

Eleftherios Avramidis
Master's thesis, School of Informatics, University of Edinburgh, 8/2007.


Statistical Machine Translation struggles with morphologically rich languages: translating from English into such languages yields significantly worse quality. We address this problem by adding per-word linguistic information to the source-language side of the translation task. We use the syntax of the source sentence to extract information on noun cases, verb persons and attribute genders, and annotate the corresponding words accordingly. The approach is tested on factored phrase-based models, and the results indicate that the proposed methods are useful. Manual error analysis shows that the translation of the annotated words (nouns and verbs) improves, but the annotation introduces a sparse-data problem. The experiments achieved a small improvement on the NIST metric, while human evaluation showed that a model combining noun cases and verb persons increased the adequacy (meaning) but reduced the fluency of the generated translation.
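As an illustration of per-word source annotation, the following minimal sketch attaches one extra factor to each English token. It is not the thesis's implementation. The `Token` fields, the `ROLE_TO_CASE` mapping (subject to nominative, object to accusative) and the tag names are all hypothetical simplifications, and the pipe-separated `word|factor` layout is assumed because it is the input convention commonly used by factored phrase-based toolkits.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str
    pos: str                       # coarse tag, e.g. "NOUN", "VERB", "DET"
    role: Optional[str] = None     # hypothetical syntactic role for nouns
    person: Optional[str] = None   # hypothetical person tag for verbs


# Hypothetical, heavily simplified mapping from syntactic role to case.
ROLE_TO_CASE = {"subj": "nom", "obj": "acc"}


def annotate(tokens, empty="-"):
    """Return a factored source sentence of the form 'word|factor ...'."""
    out = []
    for t in tokens:
        if t.pos == "NOUN":
            # Nouns get a case factor derived from their syntactic role.
            factor = ROLE_TO_CASE.get(t.role, empty)
        elif t.pos == "VERB":
            # Verbs get a person factor when one is available.
            factor = t.person or empty
        else:
            # All other words carry a placeholder factor.
            factor = empty
        out.append(f"{t.word}|{factor}")
    return " ".join(out)


sentence = [
    Token("the", "DET"),
    Token("boy", "NOUN", role="subj"),
    Token("sees", "VERB", person="3sg"),
    Token("the", "DET"),
    Token("dog", "NOUN", role="obj"),
]
print(annotate(sentence))  # the|- boy|nom sees|3sg the|- dog|acc
```

Every distinct word and factor combination becomes a separate source-side event, which gives one intuition for why this kind of annotation can aggravate data sparsity.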