DFKI-LT - Dissertation Series


Rui Wang: Intrinsic and Extrinsic Approaches to Recognizing Textual Entailment

ISBN: 978-3-933218-32-2
219 pages
price: € 17


Recognizing Textual Entailment (RTE) is the task of detecting an important relation between two texts, namely whether one text can be inferred from the other. For natural language processing, and especially for natural language understanding, this is a useful and challenging task. We start with an introduction to the notion of textual entailment, and then define the scope of the recognition task.

We summarize previous work and point out two important issues involved: meaning representation and relation recognition. For the former, a general representation based on dependency relations between words or tokens is used to approximate the meaning of the text. For the latter, two categories of approaches, intrinsic and extrinsic, are proposed. The two parts of the thesis are dedicated to these two classes of approaches. Intrinsically, we develop specialized modules to deal with different types of entailment; extrinsically, we explore the connection between RTE and other semantic relations between texts.

In the first part, an extensible architecture is presented that incorporates different specialized modules, each handling a different type of entailment. We start with one specialized module for handling text pairs with temporal expressions. A separate time anchoring component is developed to recognize and normalize the temporal expressions contained in the texts. It is then shown that a generalization of this module can also handle texts containing other types of named entities. The evaluation results confirm that precision-oriented specialized modules are required.
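To illustrate what recognizing and normalizing temporal expressions can look like, the following is a minimal sketch and not the time anchoring component of the thesis. The two date patterns, the function names `normalize_dates` and `temporally_compatible`, and the subset test are all assumptions made for this example.

```python
import re
from datetime import date
from typing import Set

# Assumed toy coverage: ISO dates and "Month D, YYYY" only.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_DATE = re.compile(r"\b([A-Za-z]+) (\d{1,2}), (\d{4})\b")


def normalize_dates(text: str) -> Set[date]:
    """Map every recognized date expression to a datetime.date value."""
    found = set()
    for year, month, day in ISO_DATE.findall(text):
        found.add(date(int(year), int(month), int(day)))
    for month_name, day, year in LONG_DATE.findall(text):
        month = MONTHS.get(month_name.lower())
        if month is not None:  # skip words that are not month names
            found.add(date(int(year), month, int(day)))
    return found


def temporally_compatible(text: str, hypothesis: str) -> bool:
    """Hypothetical check: every date in the hypothesis also occurs in the text."""
    return normalize_dates(hypothesis) <= normalize_dates(text)
```

Normalizing both texts to the same date representation lets differently worded expressions be compared directly. A hypothesis without any date passes the check trivially, so a module like this would only be applied to pairs that actually contain temporal expressions.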

We also describe another module based on an external knowledge resource. A collection of textual inference rules is applied to the RTE task after being extended and refined with a handcrafted lexical resource. The evaluation results demonstrate that this is a precision-oriented approach, which can also be viewed as a specialized module. As an alternative resource, we also present a pilot study on acquiring paraphrased fragment pairs in an unsupervised manner.

In the second part of the dissertation, a general framework is proposed to view textual entailment as one of the generalized Textual Semantic Relations (TSRs). Instead of tackling the RTE task in a standalone manner, we look at its connection to other semantic relations between two texts, e.g., paraphrase and contradiction. We give the motivation for such a generalization, as well as a framework for recognizing all these relations simultaneously.

The prerequisites of the TSR recognition task are data and knowledge resources. An overview of all the corpora used for the experiments is given and followed by a discussion of the methodologies used in their construction. Then we elaborate on two corpora we constructed: one has a new annotation scheme of six categories of textual semantic relations with manual annotations; and the other uses a crowd-sourcing technique to collect the data from the Web.

After that, textual relatedness recognition is introduced. Although relatedness is usually user- and situation-dependent, in practice it can help to filter out noisy cases. It is linguistically indicated and can be viewed as a weaker concept than semantic similarity. In the experiments, we show that an alignment model based on predicate-argument structures, using relatedness as a measurement, can help an RTE system to recognize the Unknown cases (i.e., neither Entailment nor Contradiction) at the first stage, and improve the overall performance in the three-way RTE task.
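The two-stage idea described above can be sketched as follows. This is only an illustration under stated assumptions: the token-overlap score stands in for the alignment model over predicate-argument structures, and the threshold value, the negation-based second stage, and all function names are hypothetical.

```python
from typing import Callable

NEGATIONS = {"not", "no", "never"}


def relatedness(text: str, hypothesis: str) -> float:
    """Toy stand-in score: share of hypothesis tokens that also occur in the text."""
    text_tokens = set(text.lower().split())
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    return sum(tok in text_tokens for tok in hyp_tokens) / len(hyp_tokens)


def toy_second_stage(text: str, hypothesis: str) -> str:
    """Hypothetical placeholder: a negation mismatch is read as a contradiction."""
    text_negated = bool(NEGATIONS & set(text.lower().split()))
    hyp_negated = bool(NEGATIONS & set(hypothesis.lower().split()))
    return "Contradiction" if text_negated != hyp_negated else "Entailment"


def three_way_rte(
    text: str,
    hypothesis: str,
    second_stage: Callable[[str, str], str] = toy_second_stage,
    threshold: float = 0.5,  # assumed value, not taken from the thesis
) -> str:
    # Stage 1: unrelated pairs are labeled Unknown and filtered out.
    if relatedness(text, hypothesis) < threshold:
        return "Unknown"
    # Stage 2: only related pairs reach the Entailment/Contradiction decision.
    return second_stage(text, hypothesis)
```

Separating the stages means the second classifier only sees pairs that are already judged related, which is the filtering effect the paragraph attributes to relatedness.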

Finally, the TSR classification is presented. A generalization of all the meaning representations described in the previous approaches is given. Then, a multi-dimensional classification approach is introduced, with relatedness as one of the dimensions; the other two are inconsistency and inequality. The approach is evaluated on various corpora and is shown to be a generalized approach to entailment recognition, paraphrase identification, and other TSR recognition tasks. The system achieves state-of-the-art performance on all these tasks.
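One way to picture a multi-dimensional classification is as three binary decisions that are combined into a single relation label. The abstract does not state how the dimensions map to relations, so the mapping below is an assumption made for illustration, and the names `Dimensions` and `combine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    """Outputs of three (hypothetical) binary classifiers for a text pair."""
    related: bool
    inconsistent: bool
    unequal: bool


def combine(d: Dimensions) -> str:
    """Assumed mapping from the three dimensions to a relation label."""
    if not d.related:
        return "Unknown"        # unrelated pairs carry no further relation
    if d.inconsistent:
        return "Contradiction"  # related but conflicting content
    # Related and consistent: equal content is read as paraphrase,
    # unequal content as entailment.
    return "Entailment" if d.unequal else "Paraphrase"
```

Under this assumed mapping, a pair judged related, consistent, and unequal would be labeled Entailment, while the same pair judged equal would be labeled Paraphrase.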

As for future work, we discuss several possible extensions of the current approaches. Some of the modules contained in the system have already been successfully applied to other natural language processing tasks. The promising results confirm the direction of research on this task and broaden the application area.