Joint Grammar and Treebank Development for Mandarin Chinese with HPSG

Yi Zhang; Rui Wang; Yu Chen

In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). International Conference on Language Resources and Evaluation (LREC-2012), May 23-25, Istanbul, Turkey, ISBN 978-2-9517408-7-7, European Language Resources Association (ELRA), 5/2012.


We present the ongoing development of MCG, a linguistically deep and precise grammar for Mandarin Chinese, together with its accompanying treebank, both based on the linguistic framework of HPSG and using MRS as the semantic representation. We highlight some key features of our grammar design and review a number of challenging phenomena, with comparisons to alternative linguistic treatments and implementations. One of the distinguishing characteristics of our approach is the tight integration of grammar and treebank development. The two-step treebank annotation procedure benefits from the efficiency of the discriminant-based annotation approach, while giving the annotators full freedom to produce extra-grammatical structures. This not only allows the creation of a precise and full-coverage treebank with an imperfect grammar, but also provides prompt feedback for grammarians to identify errors in the grammar design and implementation. Preliminary evaluation and error analysis show that the grammar already covers most of the core phenomena for Mandarin Chinese, and that the treebank annotation procedure reaches a stable speed of 35 sentences per hour with satisfactory quality.


Further Links

345_Paper.pdf (pdf, 488 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence