Exploiting XLE's finite state interface in LFG-based statistical machine translation

Eleftherios Avramidis, Jonas Kuhn

In: Miriam Butt, Tracy Holloway King (eds.). Proceedings of the International Lexical Functional Grammar Conference 2009 (LFG-09), July 13-16, Cambridge, United Kingdom. Pages 127-145. CSLI Publications, 12/2009.


We present the addition of a morphological generation component to an LFG-based statistical machine translation system, taking advantage of existing morphological grammars and the FST (Finite State Transducer) processing pipeline of the XLE system. The extended syntax-driven translation system makes separate stochastic decisions for lemmata and morphological tags; the role of the finite-state morphological grammars is to generate full forms from the bundle of morphological tags produced by the translation component. This technique can lead to a more effective use of a given amount of training data from a parallel corpus, since lexical and morphosyntactic translation patterns can be induced independently. The existing FST processing cascade for German, when added to the statistical machine translation system, suffers from generation failures. These occur due to overgeneralisation by the syntax-driven translation process and originate from (i) the use of various underspecification tags in the morphological grammar, or (ii) erroneous assignment of certain tags to a given lemma. To deal with this, we add a set of replacement/correction rules on top of the cascade. The augmented FST cascade leads to an increase of generation coverage from 47.90% to 75.35%. A detailed error analysis for the remaining 24.65% is given.
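
The idea of generating a full form from a lemma plus a tag bundle, and retrying with corrected tags when generation fails, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary `LEXICON` merely stands in for a finite-state morphological generator, the tag names (`+NGA`, `+Masc`, ...) and the entries in `CORRECTIONS` are hypothetical, and none of it reflects the actual XLE interface or the rule set used in the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy lookup table standing in for a finite-state morphological generator.
# Entries and tag names are illustrative assumptions, not the German grammar.
LEXICON: Dict[Tuple[str, Tuple[str, ...]], str] = {
    ("Haus", ("+Noun", "+Neut", "+Nom", "+Sg")): "Haus",
    ("Haus", ("+Noun", "+Neut", "+Dat", "+Pl")): "Häusern",
    ("Kind", ("+Noun", "+Neut", "+Nom", "+Pl")): "Kinder",
}

# Hypothetical replacement/correction rules: a tag that may block generation
# is mapped to candidate tags that are tried in order.
CORRECTIONS: Dict[str, List[str]] = {
    "+NGA": ["+Nom", "+Gen", "+Acc"],  # underspecified case tag
    "+Masc": ["+Neut", "+Fem"],        # possibly erroneous gender tag
}


def generate(lemma: str, tags: List[str]) -> Optional[str]:
    """Return the full form for (lemma, tags), or None on generation failure."""
    return LEXICON.get((lemma, tuple(tags)))


def generate_with_corrections(lemma: str, tags: List[str]) -> Optional[str]:
    """Try plain generation first, then retry with one tag replaced at a time."""
    form = generate(lemma, tags)
    if form is not None:
        return form
    for i, tag in enumerate(tags):
        for replacement in CORRECTIONS.get(tag, []):
            candidate = tags[:i] + [replacement] + tags[i + 1:]
            form = generate(lemma, candidate)
            if form is not None:
                return form
    return None


def coverage(requests: List[Tuple[str, List[str]]],
             generator: Callable[[str, List[str]], Optional[str]]) -> float:
    """Percentage of requests for which the generator returns a form."""
    succeeded = sum(1 for lemma, tags in requests
                    if generator(lemma, tags) is not None)
    return 100.0 * succeeded / len(requests)


# Made-up requests, as a translation component might emit them.
REQUESTS: List[Tuple[str, List[str]]] = [
    ("Haus", ["+Noun", "+Neut", "+Nom", "+Sg"]),   # generates directly
    ("Haus", ["+Noun", "+Neut", "+NGA", "+Sg"]),   # underspecified case
    ("Kind", ["+Noun", "+Masc", "+Nom", "+Pl"]),   # wrong gender for the lemma
    ("Tisch", ["+Noun", "+Masc", "+Nom", "+Sg"]),  # lemma unknown to the toy table
]

if __name__ == "__main__":
    print("plain coverage:", coverage(REQUESTS, generate))
    print("with corrections:", coverage(REQUESTS, generate_with_corrections))
```

In this toy setting the correction layer recovers the two requests that fail only because of a single problematic tag, while the request for an unknown lemma still fails. That mirrors, in a very simplified way, why coverage improves without reaching 100%.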
