Improving Machine Translation through Linked Data

Ankit Srivastava, Georg Rehm, Felix Sasaki

In: Ondřej Bojar, Alexander M. Fraser, Lucia Specia, Mikel L. Forcada (editors). The Prague Bulletin of Mathematical Linguistics (PBML) 108, pages 355-366. Charles University, Prague, Czech Republic, June 2017.


With the ever-increasing availability of linked multilingual lexical resources, there is renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. Machine Translation (MT) systems in particular stand to benefit from such resources, since unknown words and ambiguous translations are among the most common sources of error. In this paper, we attempt to minimise these types of errors by interfacing Statistical Machine Translation (SMT) models with Linked Open Data (LOD) resources such as DBpedia and BabelNet. We perform several experiments based on the SMT system Moses and evaluate multiple strategies for exploiting knowledge from multilingual linked data in automatically translating named entities. We conclude with an analysis of best practices for multilingual linked data sets in order to optimise their benefit to multilingual and cross-lingual applications.
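One general way to hand externally sourced translations to Moses is its XML markup input mode, in which a preprocessing step wraps a source span in a tag carrying a `translation` attribute. The sketch below illustrates that mechanism only and is not necessarily one of the strategies evaluated in the paper. It assumes a hypothetical dictionary `entity_translations` standing in for target-language labels looked up in a resource such as DBpedia or BabelNet; the function name `markup_entities` and the tag name `ne` are illustrative choices.

```python
"""Hypothetical sketch: wrap known named entities in Moses XML markup."""
import re
from xml.sax.saxutils import escape, quoteattr


def markup_entities(sentence: str, entity_translations: dict[str, str],
                    tag: str = "ne") -> str:
    """Return `sentence` with each known entity wrapped as
    <tag translation="...">entity</tag>; all other text is XML-escaped."""
    if not entity_translations:
        return escape(sentence)
    # Try longer entity names first so "New York City" wins over "New York".
    names = sorted(entity_translations, key=len, reverse=True)
    # Only match whole names, not prefixes of longer words ("Munichs").
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )
    parts = []
    pos = 0
    for match in pattern.finditer(sentence):
        parts.append(escape(sentence[pos:match.start()]))
        source = match.group(0)
        target = entity_translations[source]
        # quoteattr adds the surrounding quotes and escapes the value.
        parts.append(f"<{tag} translation={quoteattr(target)}>"
                     f"{escape(source)}</{tag}>")
        pos = match.end()
    parts.append(escape(sentence[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(markup_entities("I flew to Munich .", {"Munich": "München"}))
```

The marked-up text would then be passed to the decoder with XML input enabled, for example `moses -f moses.ini -xml-input exclusive`, where `exclusive` makes the decoder use only the supplied translation for the marked span; `inclusive` instead lets it compete with phrase-table entries.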


Further links

art-srivastava-rehm-sasaki-2.pdf (pdf, 496 KB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz