Aligning mildly context-sensitive formalism for data-driven parsing

Ilya Kashkarev

Master's thesis, Saarland University, 2012.


In this thesis we examined different properties of the formalisms HPSG, LCFRS, and ACG. On the one hand, there is the more linguistically informed HPSG, whose mechanisms allow it to handle complex language phenomena. It produces very accurate parses, but improving its coverage requires a lot of effort. Moreover, it was only recently that HPSG was modified to parse discontinuities directly, which, however, results in a significant slow-down. On the other hand, there is a class of mildly context-sensitive formalisms that are considered appropriate for modeling natural languages. Among them are LCFRS and ACG, which have in fact been proved to be weakly equivalent. These formalisms handle discontinuities with relative ease but lack the means to express other linguistic generalizations. It seems plausible that establishing a strict theoretical connection between HPSG and ACG would benefit both formalisms, as well as LCFRS. We made the first step towards revealing these connections. In the second, practical part of this research, we looked deeper into the intricacies of implementing LCFRS. Recently developed methods of reading off a grammar facilitate research in this area with automatically extracted large-scale grammars. Our implementation matches and even exceeds state-of-the-art accuracy, owing to various simplifications that we make. The experiments clearly point to the underrepresentation of discontinuities in the corpus.


kashkarev-thesis.pdf (pdf, 1 MB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz