Publikation

From UBGs to CFGs. A Practical Corpus-Driven Approach.

Hans-Ulrich Krieger

Research Report DFKI RR-04-01 2004.

Abstrakt

We present a simple and intuitive unsound corpus-driven approximation method for turning unification-based grammars (UBGs), such as HPSG, CLE, or PATR-II into context-free grammars (CFGs). The method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is a corpus-driven method in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically come close to the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.

RR-04-01.ps (ps, 973 KB )

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence