DFKI-LT - Enriching input in Statistical Machine Translation
Master's thesis, School of Informatics, University of Edinburgh, Edinburgh, Scotland, August 2007.
The authors and The University of Edinburgh retain the right to reproduce and publish these theses for non-commercial purposes. Permission is granted for these theses to be reproduced by others for non-commercial purposes.
Statistical Machine Translation has problems dealing with morphologically rich languages; translating from English into such languages yields significantly lower quality. We address this problem by adding per-word linguistic information to the source side of the translation task. We use the syntax of the source sentence to extract information on noun cases, verb persons and attribute genders, and annotate the corresponding words accordingly. The approach is tested on factored phrase-based models, and the results indicate that the proposed methods are useful. Manual error analysis shows that the translation of the annotated words (nouns and verbs) improves, but that the annotation also introduces a sparse data problem. The experiments achieved a small improvement on the NIST metric, while human evaluation showed that a model combining both noun cases and verb persons increased the adequacy (meaning) of the generated translation but reduced its fluency.
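As a rough illustration of what per-word annotation of the source side might look like, the sketch below attaches one extra factor to each English token and serializes the sentence in a pipe-separated form. The tag names (`nom`, `acc`, `3sg`), the `|` separator, the `-` placeholder for unannotated words, and the function `to_factored` are all assumptions made for this example; they are not taken from the thesis, which derives its annotations from a syntactic analysis of the source sentence rather than from hand-written tags.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: (surface form, optional annotation).
# In the thesis the annotation would come from the source-side syntax;
# here it is supplied by hand purely for illustration.
Token = Tuple[str, Optional[str]]


def to_factored(tokens: List[Token], sep: str = "|", empty: str = "-") -> str:
    """Serialize tokens as 'word|tag', using a placeholder when no tag exists."""
    return " ".join(
        f"{word}{sep}{tag if tag else empty}" for word, tag in tokens
    )


# Nouns carry an assumed case tag, the verb an assumed person tag.
sentence: List[Token] = [
    ("the", None),
    ("dog", "nom"),
    ("sees", "3sg"),
    ("the", None),
    ("cat", "acc"),
]

factored = to_factored(sentence)
print(factored)
```

Because each annotated word becomes a distinct word-tag combination, the same surface form can appear with several tags, which hints at how such annotation could contribute to the sparse data problem mentioned in the abstract.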
Files: BibTeX, infthesis-template.pdf