Publication

Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation

Pranava Swaroop Madhyastha; Cristina España-Bonet
In: Proceedings of the 2nd Workshop on Representation Learning for NLP. ACL Workshop on Representation Learning for NLP (RepL4NLP-17), 2nd, located at 55th Annual Meeting of the Association for Computational Linguistics (ACL), August 3, Vancouver, BC, Canada, Pages 139-145, Association for Computational Linguistics, 8/2017.

Abstract

We propose a simple log-bilinear softmax-based model to deal with vocabulary expansion in machine translation. Our model uses word embeddings trained on significantly large unlabelled monolingual corpora and learns over a fairly small, word-to-word bilingual dictionary. Given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language using the trained bilingual embeddings. We integrate these translation options into a standard phrase-based statistical machine translation system and obtain consistent improvements in translation quality on the English–Spanish language pair. When tested over an out-of-domain test-set, we get a significant improvement of 3.9 BLEU points.
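To make the scoring step described in the abstract concrete, the following is a minimal sketch of a generic log-bilinear softmax, assuming the common form p(t | s) ∝ exp(sᵀ W t), where s and t are source and target word embeddings and W is a projection matrix. The function names, the toy two-dimensional embeddings, and the identity matrix standing in for W are all hypothetical placeholders; the paper's exact parameterization and training procedure over the bilingual dictionary are not reproduced here.

```python
import math


def translation_distribution(src_vec, W, tgt_vecs):
    """Return p(t | s) proportional to exp(s^T W t) for each candidate target word.

    src_vec  : list of floats, embedding of the source word
    W        : list of rows, projection matrix (assumed already learned)
    tgt_vecs : dict mapping target word -> embedding (list of floats)
    """
    # Project the source embedding into the target space: u = s^T W
    projected = [
        sum(src_vec[i] * W[i][j] for i in range(len(src_vec)))
        for j in range(len(W[0]))
    ]
    # Bilinear score for every candidate target word: u . t
    scores = {
        word: sum(p * v for p, v in zip(projected, vec))
        for word, vec in tgt_vecs.items()
    }
    # Softmax over the target vocabulary, shifted by the max score for stability
    max_score = max(scores.values())
    exp_scores = {word: math.exp(s - max_score) for word, s in scores.items()}
    total = sum(exp_scores.values())
    return {word: e / total for word, e in exp_scores.items()}


def top_k(dist, k):
    """Return the k most probable (word, probability) pairs."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy data (hypothetical): 2-dimensional embeddings and an identity projection.
src_vec = [1.0, 0.0]
W = [[1.0, 0.0],
     [0.0, 1.0]]
tgt_vecs = {
    "gato": [2.0, 0.0],
    "perro": [0.0, 2.0],
    "casa": [1.0, 1.0],
}

dist = translation_distribution(src_vec, W, tgt_vecs)
candidates = top_k(dist, 2)
```

In a real system W would be fitted on the dictionary pairs, and the ranked candidates for each out-of-vocabulary source word would be passed to the translation system as additional translation options.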
