Forest to string based statistical machine translation with hybrid word alignments

Santanu Pal; Sudip Kumar Naskar; Josef van Genabith

In: Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, Vol. 9624, Pages 38-50. International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2016), April 3-9, Konya, Turkey. Springer, 2016.


Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from a given word alignment estimated on a source-side-parsed bilingual parallel corpus. This paper presents a hybrid method that combines different word alignment methods and integrates them into an FSBSMT system. The hybrid word alignment provides the most informative alignment links to the FSBSMT system. We show that hybrid word alignment, integrated into various experimental settings of FSBSMT, provides considerable improvement over state-of-the-art Hierarchical Phrase-Based SMT (HPBSMT). We also demonstrate that additionally integrating Named Entities (NEs), their translations, and Example-Based Machine Translation (EBMT) phrases (all extracted from the bilingual parallel training data) into the system brings further considerable performance improvements over the hybrid FSBSMT system. We apply our hybrid model to a distant language pair, English–Bengali. The proposed system achieves a 78.5% relative (9.84 BLEU points absolute) improvement over the baseline HPBSMT.
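The abstract reports the gain both in relative terms (78.5%) and in absolute terms (9.84 BLEU points) without stating the underlying scores. The short sketch below shows how the two figures relate and what baseline they imply. The function name `implied_scores` is hypothetical, and the resulting scores are derived here rather than quoted from the paper; since 78.5% is itself a rounded value, they are approximate.

```python
def implied_scores(absolute_gain: float, relative_gain_pct: float) -> tuple[float, float]:
    """Return the (baseline, system) scores implied by a reported gain.

    relative gain = absolute gain / baseline, so
    baseline = absolute gain / relative gain.
    """
    baseline = absolute_gain / (relative_gain_pct / 100.0)
    system = baseline + absolute_gain
    return baseline, system


# Figures taken from the abstract: 9.84 BLEU absolute, 78.5% relative.
baseline, system = implied_scores(9.84, 78.5)
print(f"implied baseline HPBSMT BLEU: {baseline:.1f}")
print(f"implied proposed-system BLEU: {system:.1f}")
```

Under these assumptions the baseline HPBSMT system would score roughly 12.5 BLEU and the proposed system roughly 22.4 BLEU.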
