The Corpus and the Lexicon: Standardising Deep Lexical Acquisition Evaluation

Yi Zhang; Timothy Baldwin; Valia Kordoni

In: Proceedings of the ACL 2007 Workshop on Deep Linguistic Processing, Prague, Czech Republic, pages 152-159, June 2007.


This paper is concerned with standardising evaluation metrics for lexical acquisition over precision grammars so that the metrics are attuned to actual parser performance. Specifically, we investigate the impact that lexicons at varying levels of lexical item precision and recall have on the parsing performance of pre-existing broad-coverage precision grammars, i.e., on their coverage and accuracy. The grammars used for the experiments reported here are the LinGO English Resource Grammar (ERG; Flickinger (2000)) and JaCY (Siegel and Bender, 2002), precision grammars of English and Japanese, respectively. Our results show convincingly that traditional F-score-based evaluation of lexical acquisition does not correlate with actual parsing performance. What we argue for, therefore, is a recall-heavy interpretation of F-score in designing and optimising automated lexical acquisition algorithms.
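
One common way to make an F-score "recall-heavy" is the weighted F-measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta > 1 gives recall more weight than precision. The sketch below is only an illustration of that standard formula: the abstract does not state which weighting the authors use, and the function name `f_beta`, the choice beta = 2, and the two precision/recall pairs are all hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the traditional balanced F-score; beta > 1 weights
    recall more heavily. The beta values used below are assumptions
    for illustration, not values taken from the paper.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Two hypothetical lexicons with mirrored precision and recall.
high_precision_lexicon = (0.9, 0.6)  # (precision, recall)
high_recall_lexicon = (0.6, 0.9)

for name, (p, r) in [("high precision", high_precision_lexicon),
                     ("high recall", high_recall_lexicon)]:
    print(f"{name}: F1 = {f_beta(p, r):.3f}, F2 = {f_beta(p, r, beta=2.0):.3f}")
```

Under the balanced F1 the two hypothetical lexicons are indistinguishable, whereas a recall-weighted score ranks the high-recall lexicon higher, which is the kind of preference the abstract argues for.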

