The Corpus and the Lexicon: Standardising Deep Lexical Acquisition Evaluation
Proceedings of the ACL 2007 Workshop on Deep Linguistic Processing
This paper is concerned with standardising evaluation metrics for lexical acquisition over precision grammars, such that the metrics are attuned to actual parser performance. Specifically, we investigate the impact that lexicons at varying levels of lexical item precision and recall have on the parsing performance of pre-existing broad-coverage precision grammars, i.e., on their coverage and accuracy. The grammars used in the experiments reported here are the LinGO English Resource Grammar (ERG; Flickinger, 2000) and JaCY (Siegel and Bender, 2002), precision grammars of English and Japanese, respectively. Our results show convincingly that traditional F-score-based evaluation of lexical acquisition does not correlate with actual parsing performance. We therefore argue for a recall-heavy interpretation of F-score in designing and optimising automated lexical acquisition algorithms.
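To make the contrast between a traditional (balanced) F-score and a recall-heavy one concrete, here is a minimal sketch. It assumes the standard weighted F-measure, F_β = (1 + β²)·P·R / (β²·P + R), in which β > 1 weights recall more heavily than precision. The function name `f_beta`, the choice β = 2, and the two lexicons' precision/recall figures are hypothetical illustrations, not the paper's own weighting or results.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    beta = 1 gives the traditional balanced F-score; beta > 1 weights
    recall more heavily (assumed here as one way to express a
    "recall-heavy" interpretation).
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0, beta must be > 0")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # no correct lexical items at all
    return (1 + b2) * precision * recall / denom


# Two hypothetical acquired lexicons (illustrative numbers only):
# A favours precision, B favours recall; both have the same balanced F1.
lexicons = {"A": (0.9, 0.6), "B": (0.6, 0.9)}

for name, (p, r) in lexicons.items():
    print(name, "F1 =", round(f_beta(p, r), 3), "F2 =", round(f_beta(p, r, beta=2.0), 3))
```

Under balanced F1 the two lexicons are indistinguishable, whereas the recall-weighted score ranks the higher-recall lexicon B above A. Whether β = 2 (or any particular β) tracks parser coverage and accuracy is an empirical question the sketch does not settle.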