DFKI-LT - From UBGs to CFGs. A Practical Corpus-Driven Approach.

Hans-Ulrich Krieger
From UBGs to CFGs. A Practical Corpus-Driven Approach.
1 volume RR-04-01,
Research Report, DFKI, 2004

We present a simple and intuitive, but unsound, corpus-driven approximation method for turning unification-based grammars (UBGs), such as HPSG, CLE, or PATR-II, into context-free grammars (CFGs). The method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is corpus-driven in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically approach the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
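To make the general idea concrete, the following is a minimal sketch of corpus-driven rule extraction. It is not the implementation described in the report. It assumes that each parse-tree node carries a flat feature map (a strong simplification of a typed feature structure) and that a context-free symbol is formed by keeping only a chosen subset of features. The class name `CorpusCfgSketch`, the helper methods, and the features `cat` and `num` are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CorpusCfgSketch {

    /** A parse-tree node: a flat feature map (assumed stand-in for a feature structure) plus daughters. */
    record Node(Map<String, String> features, List<Node> children) {}

    /** Build a context-free symbol by keeping only the selected features of a node. */
    static String symbol(Node n, List<String> keep) {
        return keep.stream()
                .map(f -> f + "=" + n.features().getOrDefault(f, "?"))
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** Read one rule "mother -> daughters" off every internal node of a tree. */
    static void collect(Node n, List<String> keep, Set<String> rules) {
        if (n.children().isEmpty()) {
            return; // leaves contribute no rule in this sketch
        }
        String rhs = n.children().stream()
                .map(c -> symbol(c, keep))
                .collect(Collectors.joining(" "));
        rules.add(symbol(n, keep) + " -> " + rhs);
        for (Node c : n.children()) {
            collect(c, keep, rules);
        }
    }

    /** Union of the rules read off all trees; adding trees can only add rules. */
    static Set<String> extract(List<Node> corpus, List<String> keep) {
        Set<String> rules = new TreeSet<>();
        for (Node t : corpus) {
            collect(t, keep, rules);
        }
        return rules;
    }

    // Hypothetical helpers for building toy trees with two features.
    static Node leaf(String cat, String num) {
        return new Node(Map.of("cat", cat, "num", num), List.of());
    }

    static Node tree(String cat, String num, Node... kids) {
        return new Node(Map.of("cat", cat, "num", num), List.of(kids));
    }

    public static void main(String[] args) {
        Node t1 = tree("s", "sg", leaf("np", "sg"), leaf("vp", "sg"));
        Node t2 = tree("s", "pl", leaf("np", "pl"), leaf("vp", "pl"));
        List<Node> corpus = List.of(t1, t2);

        // Coarse symbols: only the category is kept.
        System.out.println(extract(corpus, List.of("cat")));
        // Finer symbols: number agreement is shifted into the symbols.
        System.out.println(extract(corpus, List.of("cat", "num")));
    }
}
```

With only `cat` kept, both toy trees collapse into a single rule, so the resulting grammar also admits a singular subject with a plural verb phrase. Keeping `num` as well yields two agreement-preserving rules, which illustrates how moving information into the symbols brings the CFG closer to the original analyses. In both settings the rule set covers only constructions seen in the corpus, which is one way such a grammar can fail to be a superset of the original language.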