DFKI-LT - Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation

Matthias Huck, David Vilar Torres, Daniel Stein, Hermann Ney
1 The EMNLP 2011 Workshop on Unsupervised Learning in NLP, Edinburgh, United Kingdom, Association for Computational Linguistics, 7/2011
 
In this paper, we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. As additional parallel training corpora, we employ bitexts built by automatically translating large amounts of monolingual data. We explore different ways of using this additional data to improve our system. Our results show that integrating a second translation model containing only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach. Translation performance matches the result we achieve with a joint extraction on all training bitexts, while the system is kept smaller due to a considerably lower overall number of phrases.
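The two ideas in the abstract, building a bitext by machine-translating monolingual data and adding a second translation model next to the primary one, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary-based phrase tables, the feature weights, and the probability floor are all assumptions made for illustration, and the paper's actual feature set and toolkit are not described in the abstract.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: (source phrase, target phrase) -> probability.
PhraseTable = Dict[Tuple[str, str], float]


def build_synthetic_bitext(
    monolingual_source: List[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual source sentence with its automatic translation.

    `translate` stands in for an existing baseline MT system (assumption).
    """
    return [(sentence, translate(sentence)) for sentence in monolingual_source]


def loglinear_score(
    src: str,
    tgt: str,
    primary: PhraseTable,
    secondary: PhraseTable,
    w_primary: float = 1.0,
    w_secondary: float = 1.0,
    floor: float = 1e-6,
) -> float:
    """Weighted sum of log-probabilities from two phrase tables.

    A phrase pair missing from a table falls back to `floor`, an
    illustrative choice rather than the paper's method.
    """
    p_primary = primary.get((src, tgt), floor)
    p_secondary = secondary.get((src, tgt), floor)
    return w_primary * math.log(p_primary) + w_secondary * math.log(p_secondary)


# Toy usage: str.upper plays the role of the baseline translation system.
bitext = build_synthetic_bitext(["das haus", "ein haus"], str.upper)
primary_table: PhraseTable = {("haus", "house"): 0.8}
secondary_table: PhraseTable = {("haus", "house"): 0.5}
score = loglinear_score("haus", "house", primary_table, secondary_table)
```

In this sketch the secondary table would be filled only with plain (non-hierarchical) phrase pairs extracted from the synthetic bitext, which keeps it small, while the primary table would come from the human-translated training data.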
 
Files: BibTeX, lightlySup.pdf