Skip to main content Skip to main navigation


Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating

Hongfei Xu; Deyi Xiong; Josef van Genabith; Qiuhui Liu
In: Christian Bessiere (Hrsg.). Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. International Joint Conference on Artificial Intelligence (IJCAI-2020), 29th International Joint Conference on Artificial Intelligence and 17th Pacific Rim International Conference on Artificial Intelligence, located at IJCAI-PRICAI 2020, January 5-10, Online, Pages 3933-3940, ISBN 978-0-9992411-6-5, IJCAI, 7/2020.


Existing Neural Machine Translation (NMT) systems are generally trained on a large amount of sentence-level parallel data, and during prediction sentences are independently translated, ignoring cross-sentence contextual information. This leads to inconsistency between translated sentences. In order to address this issue, context-aware models have been proposed. However, document-level parallel data constitutes only a small part of the parallel data available, and many approaches build context-aware models based on a pre-trained frozen sentence-level translation model in a two-step training manner. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data in contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating contexts in translation. We find that representations from shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding upon weighted combinations of pre-trained encoder layers' outputs. Instead of separately performing source context and input encoding, we propose to iteratively and jointly encode the source input and its contexts and to generate input-aware context representations with a cross-attention layer and a gating mechanism, which resets irrelevant information in context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on the English-Russian subtitle data and is about twice as fast in training and decoding.


Weitere Links