DFKI-LT - A Corpus-Driven Context-Free Approximation of Head-Driven Phrase Structure Grammar
Proceedings of the 7th International Colloquium on Grammatical Inference (ICGI), n.p., 2004
We present a simple and intuitive unsound corpus-driven approximation method for turning unification-based grammars, such as HPSG or PATR-II, into context-free grammars (CFGs). The method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is a corpus-driven method in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically approach the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java. We report on first experiments with four different grammars.
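The general idea of reading a CFG off a parsed corpus can be illustrated with a minimal Java sketch. It is not the implementation described in the paper: the class `CorpusCfgSketch`, the flat feature maps standing in for typed feature structures, the feature names `cat` and `num`, and the toy corpus are all assumptions made for illustration. Each local tree contributes one rule, and a context-free symbol is formed by keeping only a chosen subset of a node's features, so keeping more features shifts more information into the symbols.

```java
import java.util.*;

/** Hypothetical sketch: reading CFG rules off a corpus of parse trees. */
final class CorpusCfgSketch {

    /** Parse-tree node with a flat feature map (a stand-in for a feature structure). */
    static final class Node {
        final Map<String, String> features;
        final List<Node> children;

        Node(Map<String, String> features, Node... children) {
            this.features = features;
            this.children = List.of(children);
        }
    }

    /** Forms a context-free symbol by keeping only the selected features. */
    static String symbol(Node node, List<String> kept) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String feature : kept) {
            joiner.add(feature + "=" + node.features.getOrDefault(feature, "*"));
        }
        return joiner.toString();
    }

    /** Adds one rule per local tree; leaves contribute no rule in this sketch. */
    static void collect(Node node, List<String> kept, Set<String> rules) {
        if (node.children.isEmpty()) {
            return;
        }
        StringJoiner rhs = new StringJoiner(" ");
        for (Node child : node.children) {
            rhs.add(symbol(child, kept));
            collect(child, kept, rules);
        }
        rules.add(symbol(node, kept) + " -> " + rhs);
    }

    /** The rule set can only grow as more parsed sentences are supplied. */
    static Set<String> extract(List<Node> corpus, List<String> kept) {
        Set<String> rules = new TreeSet<>();
        for (Node root : corpus) {
            collect(root, kept, rules);
        }
        return rules;
    }

    /** Builds a toy tree S(NP, VP(V, NP)) with invented agreement features. */
    static Node sentence(String subjectNum) {
        Node subject = new Node(Map.of("cat", "np", "num", subjectNum));
        Node verb = new Node(Map.of("cat", "v", "num", subjectNum));
        Node object = new Node(Map.of("cat", "np", "num", "pl"));
        Node vp = new Node(Map.of("cat", "vp", "num", subjectNum), verb, object);
        return new Node(Map.of("cat", "s", "num", subjectNum), subject, vp);
    }

    static List<Node> sampleCorpus() {
        return List.of(sentence("sg"), sentence("pl"));
    }

    public static void main(String[] args) {
        // Coarse symbols: fewer, more general rules.
        extract(sampleCorpus(), List.of("cat")).forEach(System.out::println);
        // Finer symbols: more rules that stay closer to the original trees.
        extract(sampleCorpus(), List.of("cat", "num")).forEach(System.out::println);
    }
}
```

In this toy setting, projecting onto `cat` alone yields two rules for the whole corpus, whereas projecting onto `cat` and `num` yields four, one set per agreement value. The coarse grammar accepts agreement violations that the finer one rejects, which is the trade-off the abstract describes. Lexical rules are omitted for brevity.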