DFKI-LT - Dissertation Series


Aljoscha Burchardt: Modeling Textual Entailment with Role-Semantic Information

ISBN: 978-3-933218-28-5
255 pages
price: € 15


In this thesis, we present a novel approach for modeling textual entailment using lexical-semantic information on the level of predicate-argument structure. To this end, we adopt information provided by the Berkeley FrameNet repository and embed it into an implemented end-to-end system. The two main goals of this thesis are the following:

(i) to provide an analysis of the potential contribution of frame-semantic information to the recognition of textual entailment, and
(ii) to present a robust system architecture that can serve as a basis for future experiments, research, and improvement.

Our work was carried out in the context of the textual entailment initiative, which since 2005 has set the stage for the broad investigation of inference in natural-language processing tasks, including empirical evaluation of its coverage and reliability. In short, textual entailment describes inferential relations between (entailing) texts and (entailed) hypotheses as interpreted by typical language users. This pre-theoretic notion captures a more natural range of inferences than logical entailment, which has traditionally been used in theoretical approaches to natural-language semantics.

Various methods for modeling textual entailment have been proposed in the literature, ranging from shallow techniques like lexical overlap to shallow syntactic parsing and the exploitation of WordNet relations. Recently, there has been a move towards more structured meaning representations. In particular, the level of predicate-argument structure, which seems a natural and straightforward choice, has gained much attention. Predicate-argument structure allows annotating sentences or texts with nuclear meaning representations ("who did what to whom"), which are of obvious relevance for this task. For example, it can account for paraphrases like "Ghosts scare John" vs. "John is scared by ghosts".

In this thesis, we present an approach to textual entailment that is centered around the analysis of predicate-argument structure. It combines LFG grammatical analysis, predicate-argument structure in the FrameNet paradigm, and taxonomic information from WordNet into tripartite graph structures. By way of a declarative graph matching algorithm, the "structural and semantic" similarity of hypotheses and texts is computed, and the result is represented as feature vectors. A supervised machine learning architecture trained on entailment corpora is then used to check textual entailment for new text/hypothesis pairs. The approach is implemented in the SALSA RTE system, which successfully participated in the second and third Recognizing Textual Entailment (RTE) challenges.
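As a rough illustration only, and not the SALSA RTE system's actual algorithm, the idea of turning text/hypothesis similarity into features can be sketched on the paraphrase pair mentioned above. The sketch assumes hand-written frame annotations (both sentences are assumed to evoke a frame labeled `Experiencer_obj` with roles `Stimulus` and `Experiencer`), flattens the graphs into frame/role/filler triples, and uses exact matching in place of graph matching and WordNet lookups. All names (`FrameInstance`, `frame`, `lexical_overlap`, `frame_features`) are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical, hand-written frame annotation: one frame instance with
# (role, filler) bindings. This is not real FrameNet or SALSA RTE output.
@dataclass(frozen=True)
class FrameInstance:
    frame: str
    roles: tuple  # sorted tuple of (role, filler) pairs


def frame(name, **roles):
    """Build a FrameInstance from keyword role bindings."""
    return FrameInstance(name, tuple(sorted(roles.items())))


def lexical_overlap(text, hyp):
    """Shallow baseline: share of hypothesis tokens that also occur in the text."""
    text_tokens = set(text.lower().split())
    hyp_tokens = hyp.lower().split()
    return sum(w in text_tokens for w in hyp_tokens) / len(hyp_tokens)


def frame_features(text_frames, hyp_frames):
    """Share of hypothesis frames, and of their role bindings, found in the text.

    Exact matching on frame names and (frame, role, filler) triples stands in
    for the graph matching and WordNet similarity described in the abstract.
    """
    text_names = {f.frame for f in text_frames}
    text_bindings = {(f.frame, r, v) for f in text_frames for r, v in f.roles}
    hyp_bindings = [(f.frame, r, v) for f in hyp_frames for r, v in f.roles]
    frame_match = sum(f.frame in text_names for f in hyp_frames) / len(hyp_frames)
    role_match = (sum(b in text_bindings for b in hyp_bindings) / len(hyp_bindings)
                  if hyp_bindings else 0.0)
    return frame_match, role_match


# "Ghosts scare John" (text) vs. "John is scared by ghosts" (hypothesis):
# active and passive are assumed to receive the same frame annotation.
text_frames = [frame("Experiencer_obj", Stimulus="ghost", Experiencer="John")]
hyp_frames = [frame("Experiencer_obj", Stimulus="ghost", Experiencer="John")]
features = [lexical_overlap("Ghosts scare John", "John is scared by ghosts"),
            *frame_features(text_frames, hyp_frames)]
print(features)  # [0.4, 1.0, 1.0]
```

Lexical overlap alone rates the paraphrase low (0.4), while the frame and role features both reach 1.0. With the roles swapped ("John scares ghosts"), the frame still matches but no role binding does, which is the kind of distinction a classifier trained on such feature vectors could exploit.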

While system performance is on a par with that of comparable systems, the intuitively expected strong positive effect of using FrameNet information has not yet been confirmed. In order to evaluate different system components and to assess the potential contribution of FrameNet information for checking textual entailment, we conducted a number of experiments. For example, with the help of a gold-standard corpus, we experimentally analyzed different factors that can limit the applicability of frame semantics in checking textual entailment, ranging from issues related to resource coverage to knowledge modeling problems.