Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation

Matthias Huck, David Vilar Torres, Daniel Stein, Hermann Ney

In: The EMNLP 2011 Workshop on Unsupervised Learning in NLP, July 30, Edinburgh, United Kingdom. Association for Computational Linguistics, 7/2011.


In this paper, we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. As additional parallel training corpora, we employ bitexts built by automatically translating large amounts of monolingual data. We explore different ways of using this additional data to improve our system. Our results show that integrating a second translation model with only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach. The translation performance matches the result we achieve with a joint extraction on all training bitexts, while the system is kept smaller due to a considerably lower overall number of phrases.

lightlySup.pdf (pdf, 88 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence