Large-Scale Corpus-Driven PCFG Approximation of an HPSG
12th International Conference on Parsing Technologies, Dublin, Ireland, SIGPARSE, 2011
We present a novel corpus-driven approach to grammar approximation for a linguistically deep Head-driven Phrase Structure Grammar (HPSG). With an unlexicalized probabilistic context-free grammar (PCFG) obtained by maximum likelihood estimation on a large-scale, automatically annotated corpus, we achieve higher parsing accuracy than the original HPSG-based model. We propose and compare different ways of enriching the annotations carried by the approximating PCFG. A comparison with a state-of-the-art latent-variable PCFG shows that our approach is better suited to the grammar approximation task, where training data can be acquired automatically. The best approximating PCFG achieves a ParsEval F1 score of 84.13%. The high robustness of the PCFG suggests that it is a viable way of achieving full-coverage parsing with hand-written deep linguistic grammars.
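As an illustration of the maximum likelihood step mentioned above, the following minimal sketch estimates PCFG rule probabilities by relative frequency: each rule's count is divided by the total count of its left-hand side. It is not the authors' pipeline. The nested-tuple tree encoding, the toy trees, and the names `productions` and `estimate_pcfg` are assumptions made for this example only, and the annotation enrichment described in the abstract is not modelled.

```python
from collections import Counter
from fractions import Fraction

# Assumed encoding: a tree is (label, child, child, ...); a leaf is a plain string.
TREES = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "chase"), ("NP", "dogs"))),
]


def productions(tree):
    """Yield one (lhs, rhs) pair for every internal node of the tree."""
    label, *children = tree
    # The right-hand side lists child labels, or the word itself for a leaf.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {
        rule: Fraction(count, lhs_counts[rule[0]])
        for rule, count in rule_counts.items()
    }


probs = estimate_pcfg(TREES)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}\t{p}")
```

Exact fractions are used so that the probabilities for each left-hand side sum to exactly 1. On a corpus of realistic size, floating-point counts and some form of smoothing for unseen rules would be the more usual choice.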