Answering Definition Questions: Dealing with Data Sparseness in Lexicalised Dependency Trees-Based Language Models

Alejandro Figueroa, John Atkinson

In: Jose Cordeiro, Joaquim Filipe. Web Information Systems and Technologies, 5th International Conference, Revised Papers. Chapter 22, pages 297-310. Lecture Notes in Business Information Processing (LNBIP) 45, ISBN 978-3-642-12435-8, Springer, Berlin Heidelberg, 2010.


A crucial step in answering definition questions, such as "Who is Gordon Brown?", is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally treated as potential answers, and one of the most promising ranking strategies is based on Language Models (LMs). However, LMs are less attractive because they can suffer from data sparseness when the training material is insufficient or candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded upon part-of-speech (POS) tags. Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers while, at the same time, diminishing the average F-score of the final output. Conversely, the impact of the second approach depends on the test corpus.
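
For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of how it is commonly computed over ranked answer lists. It is not taken from the paper: the function names are hypothetical, and average precision is normalised here by the number of relevant answers found in the ranking, which is an assumption (some evaluations divide by the total number of known relevant answers instead).

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list.

    relevance[i] is True when the answer at rank i + 1 is judged relevant.
    Assumption: normalise by the number of relevant answers in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical definition questions with ranked candidate sentences.
    judged = [[True, False, True], [False, True]]
    print(mean_average_precision(judged))
```

A ranking that places relevant sentences earlier receives a higher score, which is why a method can raise MAP for the top-ranked answers while still lowering the F-score of the complete output.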



German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz