DFKI-LT - Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation

Eva Martínez Garcia, Carles Creus, Cristina España i Bonet, Lluís Màrquez
Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation
The Prague Bulletin of Mathematical Linguistics, volume 108, pages 85-96, De Gruyter Open, Warsaw, Poland, 6/2017
 
We integrate new mechanisms into a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each word translation given its context. Second, we extend the decoder with a new stochastic mechanism that, at translation time, introduces changes in the translation aimed at improving its lexical consistency. We evaluate our system on English–Spanish document translation, and we conduct automatic and manual assessments of its quality. The automatic evaluation metrics, applied mainly at sentence level, do not reflect significant variations. In contrast, the manual evaluation shows that the system dealing with lexical consistency is preferred over both a standard sentence-level and a standard document-level phrase-based MT system.
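As a rough illustration of the embedding-based idea in the abstract, the sketch below ranks competing translations of a word by the cosine similarity between each candidate's embedding and the mean embedding of its context words. This is not the feature defined in the paper: the function names, the averaging of context vectors, and the two-dimensional toy embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(words, emb):
    """Average the embeddings of the words that have one; None if none do."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def rank_translations(candidates, context_words, emb):
    """Rank candidate translations by similarity to the context (hypothetical scoring)."""
    ctx = mean_vector(context_words, emb)
    scores = {}
    for cand in candidates:
        if ctx is None or cand not in emb:
            scores[cand] = 0.0  # no evidence available for this candidate
        else:
            scores[cand] = cosine(emb[cand], ctx)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy 2-dimensional embeddings, invented for illustration only.
TOY_EMB = {
    "banco": [0.9, 0.1],
    "orilla": [0.1, 0.9],
    "dinero": [0.8, 0.2],
    "préstamo": [0.85, 0.15],
}

# Two Spanish forms that the same English word ("bank") might receive in one document.
ranking = rank_translations(["banco", "orilla"], ["dinero", "préstamo"], TOY_EMB)
```

With a financial context, the toy vectors place "banco" ahead of "orilla". A document-level decoder could use scores of this kind to prefer one form when the same source word has been translated inconsistently.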
 
Files: BibTeX, art-garcia-espana-bonet-marquez-creus.pdf