
Gerdemann's Earley Style Processing Scheme

In his thesis, Gerdemann [Gerdemann1991] introduced algorithms for the parsing and generation of constraint-based grammars, both of which are specific instantiations of an Earley style approach. The Earley style parsing algorithm is very similar to the one presented in [Shieber1988] and [Shieber1989]; it differs mainly in its more efficient application of subsumption checks and in its ability to handle variable-length lexical items (including traces). For the purpose of the current discussion we will therefore focus our attention on his adaptation of the Earley style generator, before discussing its degree of uniformity.

The major aspect of Gerdemann's generation approach is the use of a head-driven Earley generation algorithm, i.e., instead of following a left-to-right selection rule, he adopts the head-first selection rule of the head-driven approach discussed in the previous section. Using this control regime within an Earley style approach, he shows that he can achieve the same kind of goal-directedness as the head-driven generators while avoiding the kind of nondeterminism and recursion loops that get them into trouble. Furthermore, he shows how to handle semantic non-monotonicity and how to use semantic information for indexing already derived structures, so that in his approach the generator can use an already generated phrase in more than one place (if necessary).

The first issue emphasized by Gerdemann is that the semantic monotonicity requirement is not necessary for an Earley type generation algorithm. As already discussed in paragraph 2.3.1, in order to be able to use the same generated phrases at different string positions, Shieber eliminates string position indexing by collapsing all of the item sets into one. However, as Shieber himself argues, ``The generation behaviour exhibited is therefore not goal-directed;'' ([Shieber1988], page 617). In order to achieve goal-directedness, he introduces the semantic monotonicity condition (already discussed in 2.3.1). But the goal-directedness of Earley's algorithm does not come from the state sets; it comes from top-down prediction. Clearly, if a head-first selection rule is chosen (as in the head-driven approach), a goal-directed behaviour can be obtained comparable to that of the head-driven generators, but without the requirement of semantic monotonicity. Of course, this implies that the restriction function used in the prediction step should ignore semantic information. Gerdemann concludes that ``If this non-goal directedness due to restriction is ignored, it is not at all clear why a generation algorithm that collapsed all states into one state set would lose the normal goal directedness that comes from prediction.'' ([Gerdemann1991], page 78).
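To make the role of restriction concrete, the following is a minimal sketch, not Gerdemann's actual implementation, of a restriction function for the prediction step, assuming feature structures are represented as nested Python dictionaries; the names restrict and PREDICTION_PATHS are illustrative assumptions. Because no path leads into the semantics, the resulting predictions do not depend on semantic monotonicity:

    # Illustrative sketch: feature structures as nested dicts.
    # PREDICTION_PATHS is a hypothetical restrictor that keeps only
    # syntactic information; note that no path leads into 'sem'.
    PREDICTION_PATHS = [("cat",), ("head", "agr")]

    def restrict(fs, paths=PREDICTION_PATHS):
        """Project fs onto the given paths, dropping everything else."""
        out = {}
        for path in paths:
            src, dst = fs, out
            for key in path[:-1]:
                if not isinstance(src, dict) or key not in src:
                    break
                src = src[key]
                dst = dst.setdefault(key, {})
            else:
                if isinstance(src, dict) and path[-1] in src:
                    dst[path[-1]] = src[path[-1]]
        return out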

In order to take full advantage of an Earley style strategy for generation, one has to face the problem of avoiding the re-computation of phrases that can be placed at different string positions, i.e., to avoid backtracking in those cases. The basic idea followed by Gerdemann is to retain the state sets but to modify them so that, whenever the generator starts to generate a duplicate phrase at a new string position, the chart is readjusted so that the duplicate phrase appears to be generated in the same string position as the original phrase ([Gerdemann1991], page 80).

In order to make this modification, he introduces a global list (called GRD), which has to be maintained by the generator. This global structure is a list of all restrictor values that have so far been used to make predictions. Note that Gerdemann requires restrictors to contain all (local) semantic information, so that this list also indicates which semantic information has been involved in making a prediction. Each entry on this global list is of the form [RD,F,C], where RD was used to make the prediction, F is the number of the state set in which the prediction was made, and C is a list of all the complete phrases that have so far been generated from RD.
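As a rough sketch of this bookkeeping (under the same illustrative assumptions as above; GRDEntry is a hypothetical name, not taken from Gerdemann's implementation), each entry simply pairs a restrictor value with the state set it was used in and the complete phrases generated from it so far:

    # Illustrative sketch of a GRD entry [RD, F, C].
    from dataclasses import dataclass, field

    @dataclass
    class GRDEntry:
        rd: dict          # restrictor value used to make the prediction
        f: int            # number of the state set the prediction was made in
        c: list = field(default_factory=list)  # complete phrases generated from rd

    GRD: list[GRDEntry] = []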

The list GRD is now used in the predictor and completor step as follows: If in the predictor an RD is created that is subsumed by an RD already in GRD, no new predictions are made, since any such predictions would be subsumed by predictions made earlier. If the subsuming RD was used in state F, then the current state will be moved to the Fth state set because any completions that are found for the previously made predictions will be linked back to the Fth state set. The effect of this moving operation is that it now appears that the duplicate phrase being predicted is actually predicted to start at the same string position. But now, by moving the current state back to the state set of a previously used RD, any completion for the previous state (recorded in C) can also be used as completion for the current state.
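The following sketch summarizes this use of GRD in the predictor. It is again only illustrative: subsumes, move_to_state_set, try_completion, and make_predictions are assumed helper functions, since the source does not spell out Gerdemann's code.

    def predict(state, current_set):
        """Predictor step with GRD-based duplicate avoidance."""
        rd = restrict(state.expected)           # restrictor of the predicted category
        for entry in GRD:
            if subsumes(entry.rd, rd):          # an earlier RD covers this prediction
                # Relink the current state to the earlier state set: the
                # duplicate phrase now appears to start at the same position.
                move_to_state_set(state, entry.f)
                # Reuse every completion already recorded for that RD.
                for phrase in entry.c:
                    try_completion(state, phrase)
                return
        # No subsuming RD: record the new restrictor and predict as usual.
        GRD.append(GRDEntry(rd=rd, f=current_set))
        make_predictions(rd, current_set)

In such a setup, the completer would extend the matching entry's C list whenever a phrase is finished, so that later duplicate predictions can reuse it directly.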

There are two problematic aspects of Gerdemann's approach with respect to its degree of uniformity. Firstly, a grammar apparently has to be compiled into a specific parsing grammar and a specific generation grammar, since in his implementation of the algorithm he followed a left-to-right scheduling (see Appendix C of [Gerdemann1991]). Secondly (and more importantly), the modifications necessary to maintain the global list GRD in order to avoid the generation of duplicate phrases ``... have fundamentally altered Earley's algorithm.'' ([Gerdemann1991], page 89). Since the GRD list is only used during generation, the parsing and generation algorithms are substantially different. Together, both aspects strongly weaken the uniform character of Gerdemann's approach.

