

Large-Scale Corpus-Driven PCFG Approximation of an HPSG

Yi Zhang; Hans-Ulrich Krieger
In: 12th International Conference on Parsing Technologies (IWPT-2011), main conference, October 5-7, Dublin, Ireland, SigPARSE, 2011.


We present a novel corpus-driven approach to grammar approximation for a linguistically deep Head-driven Phrase Structure Grammar (HPSG). With an unlexicalized probabilistic context-free grammar (PCFG) obtained by maximum likelihood estimation on a large-scale automatically annotated corpus, we achieve parsing accuracy higher than that of the original HPSG-based model. We propose and compare different ways of enriching the annotations carried by the approximating PCFG. Comparison to the state-of-the-art latent-variable PCFG shows that our approach is better suited to the grammar approximation task, where training data can be acquired automatically. The best approximating PCFG achieves a ParsEval F1 score of 84.13%. The high robustness of the PCFG suggests that it is a viable way of achieving full-coverage parsing with hand-written deep linguistic grammars.
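To make the estimation step concrete, the following is a minimal sketch of maximum likelihood estimation for PCFG rule probabilities by relative frequency: each rule's probability is its count divided by the count of its left-hand side. It is not the authors' implementation. The tree encoding, the function names `rules` and `estimate_pcfg`, the inclusion of lexical rules, and the two-tree toy corpus are all assumptions made for illustration; the paper's corpus is derived automatically from HPSG parses and carries richer annotations.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical tree encoding: (label, children), where children is either
# a list of subtrees or a single word string for a preterminal node.

def rules(tree):
    """Yield (lhs, rhs) for every local tree, top-down."""
    label, children = tree
    if isinstance(children, str):
        # Preterminal rewriting to a word (kept here only for illustration).
        yield (label, (children,))
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency (maximum likelihood) estimate:
    P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in rules(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {
        rule: Fraction(count, lhs_counts[rule[0]])
        for rule, count in rule_counts.items()
    }

# Toy corpus (made up for this sketch).
trees = [
    ("S", [("NP", "dogs"), ("VP", [("V", "bark")])]),
    ("S", [("NP", "cats"), ("VP", [("V", "chase"), ("NP", "dogs")])]),
]

probs = estimate_pcfg(trees)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p}")
```

Exact fractions are used so that the probabilities for each left-hand side visibly sum to one; a real system would work with floating-point log probabilities and would typically add smoothing for unseen rules.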