
Parsing Hindi with MDParser

Alexander Volokh; Günter Neumann

In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), located at COLING-2012, December 8-15, IIT Bombay, Mumbai, India. COLING, 2012.

Abstract

We describe our participation in the MTPIL Hindi Parsing Shared Task 2012. Our system achieved 82.44% labeled attachment score (LAS) and 90.91% unlabeled attachment score (UAS) in the auto setting, and 85.31% LAS and 92.88% UAS in the gold setting. Our parser is based on linear classification, which is suboptimal in terms of accuracy. The strength of our approach is its speed: parsing the development set takes the system 0.935 seconds, which corresponds to a parsing speed of 1318 sentences per second. The Hindi Treebank contains far fewer distinct part-of-speech tags than many other treebanks, so it was essential to use the additional morphosyntactic features available in the treebank. We were able to build classifiers that predict these features with high accuracy, using only the standard word-form and part-of-speech features.
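LAS and UAS are the standard dependency-parsing metrics behind the figures above: UAS is the percentage of tokens assigned the correct head, and LAS is the percentage assigned both the correct head and the correct dependency label. The following is a minimal sketch of that computation, assuming each token is given as a hypothetical `(head_index, label)` pair. The function name `attachment_scores` and the example labels are illustrative only, and the official shared-task scorer may apply further conventions (such as punctuation handling) that are not modeled here.

```python
from typing import List, Tuple

# Hypothetical token representation: (index of the head word, dependency label).
# Head index 0 conventionally denotes the artificial root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], predicted: List[Token]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over all tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("no tokens to score")

    head_correct = 0     # tokens with the correct head (counts toward UAS)
    labeled_correct = 0  # tokens with the correct head AND label (counts toward LAS)
    for (gold_head, gold_label), (pred_head, pred_label) in zip(gold, predicted):
        if gold_head == pred_head:
            head_correct += 1
            if gold_label == pred_label:
                labeled_correct += 1

    n = len(gold)
    return 100.0 * labeled_correct / n, 100.0 * head_correct / n


if __name__ == "__main__":
    # Illustrative three-token sentence: all heads are right, one label is wrong.
    gold = [(2, "k1"), (0, "main"), (2, "k2")]
    pred = [(2, "k1"), (0, "main"), (2, "k7")]
    las, uas = attachment_scores(gold, pred)
    print(f"LAS={las:.2f} UAS={uas:.2f}")  # prints "LAS=66.67 UAS=100.00"
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which matches the gap between the two scores reported in each setting.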
