Mining Natural Language Answers from the Web

Günter Neumann; Feiyu Xu

In: International Journal of Web Intelligence and Agent Systems, 2004.


We present a novel method for mining textual answers in Web pages using semi-structured NL questions and Google for initial document retrieval. We exploit the redundancy on the Web by weighting all identified named entities (NEs) found in the relevant document set based on their occurrences and distributions. The ranked NEs are used as our primary anchors for document indexing, paragraph selection, and answer identification. The latter is dependent on two factors: the overlap of terms at different levels (e.g., tokens and named entities) between queries and sentences, and the relevance of identified NEs corresponding to the expected answer type. The set of answer candidates is further structured into ranked equivalent classes from which the final answer is selected. The system has been evaluated using question-answer pairs extracted from a popular German quiz book.

NeumannXUWI2004Journal.pdf (pdf, 176 KB )

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence