next up previous contents
Next: Problems with Existing Approaches Up: Current Approaches in Reversible Previous: Possible Arguments against Reversibility

A Classification Scheme for Reversible Systems

 

It is widely accepted that declarative knowledge bases are a fundamental prerequisite for achieving some degree of reversibility [Appelt1987]. From this point of view one can distinguish two general types of reversible natural language systems:

Up to now, systems that are capable of analysing and producing language fall into the first class, i.e., they use different operations for both directions (e.g., the systems HAM-ANS [Hoeppner et al. 1983], XTRA [Allgayer et al. 1989], CLE [Alshawi1992], LKP [Block1994], and the system DISCO [Uszkoreit et al. 1994]).

Currently, it is an open question what degree of reversibility should be and can be desired (cf. [Appelt1987], [Mann1987], [McDonald1987], [Shieber1988], [Joshi1987], and [Neumann1991a]). In some areas, however, reversible processing models have been developed that are based on formalisms which are well suited for uniform representation and processing, most notably Koskenniemi's two-level model of morphology [Koskenniemi1984]. In computational morphology it has become the state-of-the-art to use one and the same approach for performing morphological analysis and synthesis, using both the same knowledge and the same basic mechanisms.

In the last few years the investigation of constraint-based grammar theories have opened up the possibility of reversible grammars and even uniform grammatical processing.

What is a reversible grammar? From a linguistic point of view a reversible grammar is the specification of grammatical knowledge in such a way that the specification is neutral with respect to its use for analysis and synthesis of well-formed expressions. In some sense, almost all modern grammatical theories consider a language as a relation between surface strings and representations of their meanings in some logical language, which are mostly referred to as logical forms. Parsing is then viewed as a process that assigns to a given string its possible logical forms (defined by a grammar in use) and generation is thought of as finding all valid strings that have the same given logical form.

One of the strong points of current constraint-based grammar theories is their potential to describe this relationship in a strictly declarative way, so that linguists can abstract away from specific parsing and generation strategies.

However, among researchers who are concerned with the investigation of efficient parsing and generation strategies, there is currently an active debate whether parsing and generation can really be described by one and the same efficient process. The following classification schema gives an overview of the current research directions for parsing and generation of reversible grammars (the figure 2.1 is a graphical illustration of this scheme):

   figure177
Figure 2.1: The four different types of reversible grammar approaches which are discussed in this work.

Type A
Systems that use different grammars and different processes that are compiled from the same source.

Type B
Systems that use a common grammar, but different processes.

Type C
Systems that use different compiled grammars but a uniform process.

Type D
Systems that use a common grammar, as well as a uniform process.

In systems of type A the linguistic description for parsing and generation is specified only once but is compiled into two grammars for use in the run-time mode of the system: one grammar only for parsing and one only for generation. The advantage is that the source grammar can be more `tuned' for efficient processing during the compilation step. An example of this approach is described in [Block1991] where a Tomita-based LR parser is used for parsing and a variant of the semantic head-driven approach for generation. The main disadvantage of such an approach is that during run-time the same linguistic knowledge has to be stored twice which increases the amount of redundancy of the whole system.

In systems of type B the same grammar is also processed during the run-time mode by a specific parsing and generation program. The advantage of this approach is that the grammar can be changed and tested easily and that redundancy is not problematic in the case of linguistic information. An important disadvantage might be that they are too restricted in one of the directions if either the parser or generator is incomplete which in some cases can cause inconsistencies in the language behaviour of the system. An example of this approach can be found in the DISCO system [Uszkoreit et al. 1994], where during parsing an Earley style parser is used and during generation a variant of the semantic head-driven algorithm.

In systems of type C specialized grammars are compiled from the same source, but are then processed by the same underlying process. For example, such an approach is followed by [Dymetman et al. 1990]. Here, the basic processing regime of Prolog is used as the underlying uniform process (top-down, left-right). The grammar formalism used is lexically based, i.e., most information is contained in the lexicon (for example, subcategorization information of verbal elements). Because these kinds of grammars can cause left-recursion problems in the case of top-down processing, one goal of the compilation step is to transform left-recursive rules into equivalent non left-recursive ones. Furthermore, special ``guides'' are introduced during the compilation step to be able to specifically control parsing and generation. Clearly, the same disadvantage holds as for systems of type A, i.e., these systems bear a large amount of redundancy.

Systems of type D are following the most radical approach of reversibility: not only the same grammar is used but also one and the same interpreter is used for performing parsing and generation. The first detailed description of an implemented uniform architecture is due to [Shieber1988]. In his approach he follows the paradigm of linguistic processing as deduction introduced by [Pereira and Warren1983] in the case of parsing. Currently there are also approaches that view parsing and generation as type rewriting (cf. [Emele and Zajac1990], [Carpenter1992]). The advantage of a uniform approach is that inconsistency in the language behaviour can be avoided because either the restrictions are the same for both tasks or not. The main disadvantage which is often claimed, is that it is yet unclear whether and how these systems can be made equally efficient for both directions, and indeed the currently proposed approaches are problematic with respect to this issue.

There is a further important advantage of a uniform model, namely that they allow to tightly interleave parsing and generation, such that the one direction can directly re-use partial results from the other direction. In fact, in this thesis we present for the first time such an approach, which is also called item sharing method. Besides the different strategies the several approaches follow, there are commonly held assumptions about the status of a reversible grammar and the parser/generator:

In other words, the grammar as well as the parsing and generation processes are assumed to constitute the grammatical competence base of a natural language system, and -- more or less explicitly -- this competence base is assumed to have a modular status. Thus these approaches are consistent with important lines of research that are followed in the area of theoretical linguistics, cognitive psychology, and artificial intelligence.

When performance aspects are to be considered, additional mechanisms are necessary, for example, to realize some kind of ``best-first'' strategies, generation of paraphrases or possibly unambiguous utterances, or to obtain a kind of experience-based behaviour of the natural language system. If such mechanisms are to be built on the basis of the grammatical competence, then it is also an important criterion how these performance-oriented strategies can be combined or integrated with the grammatical competence base. The way in which competence and performance aspects can be combined is thus an important parameter for choosing between alternative approaches of grammar reversibility. In this thesis we present a new model of natural language processing that is based on a strictly uniform model of the grammatical competence base and demonstrate its relevance for language performance.

We first show how efficient uniform processing is possible, by presenting a new uniform tabular algorithm for parsing and generation of constraint-based grammars. We then show how this interpreter can be hooked up with additional mechanisms, that support a tight interleaving of parsing and generation, and allow a system to perform a specific kind of self-monitoring usable in such situations where a produced utterance carries the risk of being misunderstood.

In all of these cases, we present novel methods. By describing how these several methods and their combination benefit from the uniform processing of reversible grammars, we are able to demonstrate that a uniform model (even today) is not only theoretically elegant but also of practical relevance. In this sense, our thesis can be seen as a contribution to competence-based performance models for natural language processing.


next up previous contents
Next: Problems with Existing Approaches Up: Current Approaches in Reversible Previous: Possible Arguments against Reversibility

Guenter Neumann
Mon Oct 5 14:01:36 MET DST 1998