
Flat Utterance Descriptions

As previously noted, the system combines two different language processing architectures. Shallow processing is performed by the slot-filling Robust Parser described in the next section; deep processing by the SRI Core Language Engine (CLE; [Alshawi1992]). Linguistic output can be either propositional or non-propositional. Non-propositional output consists of markers which are directly linked to dialogue moves; the most important examples are confirmations (``yes'', ``sure'', ``that's fine''), rejections (``no'', ``I'd rather not'') and topic shifts (``then...''). Propositional output consists of structured expressions which make reference to world objects like flights, trains, dates, times and costs.

The propositional representations produced by the Robust Parser are lists of slot--filler pairs; those produced by the CLE are expressions in a conservatively extended first-order logic. To allow the dialogue manager (DM) to compare the results of the two language processors easily, it is highly desirable that they be mapped into a common form: the challenge is to find a level of representation which is an adequate compromise between them. With regard to the CLE, the important point is that most logical forms in practice consist of one or two existentially quantified conjunctions, wrapped up inside one of a small number of fixed quantificational patterns. By defining these patterns explicitly, we can ``flatten'' our logical forms into a format that is compatible with a slot--filler list; we call this format a Flat Utterance Description, or FUD.

The different quantificational wrappers were suggested by our Wizard-of-Oz data; it proved meaningful to distinguish between four kinds of FUDs:

yn
Are there objects with property P?

wh
Find X with property P

wh_agg
Find the maximal/minimal X with property P

yn_agg
Does the maximal/minimal X with property P also have property P'?
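The intended reading of the four wrappers can be illustrated with a minimal Python sketch. This is not the system's implementation: the function signatures, the use of Python predicates for P and P', the `key` ordering function and the toy trip records are all assumptions made for illustration.

```python
# Sketch of the four quantificational patterns over a plain list of objects.
# p and p2 stand for the properties P and P'; key orders objects for the
# maximal/minimal choice. All names here are hypothetical.

def yn(objects, p):
    """Are there objects with property P?"""
    return any(p(x) for x in objects)

def wh(objects, p):
    """Find X with property P (all matches)."""
    return [x for x in objects if p(x)]

def wh_agg(objects, p, key, maximal=False):
    """Find the maximal/minimal X with property P, or None if there is none."""
    matches = [x for x in objects if p(x)]
    if not matches:
        return None
    return max(matches, key=key) if maximal else min(matches, key=key)

def yn_agg(objects, p, key, p2, maximal=False):
    """Does the maximal/minimal X with property P also have property P'?"""
    best = wh_agg(objects, p, key, maximal)
    return best is not None and p2(best)

# Invented toy data, for illustration only.
trips = [
    {"trip_id": 1, "to_city": "stockholm", "dep_time": 700, "price": 1200},
    {"trip_id": 2, "to_city": "stockholm", "dep_time": 900, "price": 800},
    {"trip_id": 3, "to_city": "gothenburg", "dep_time": 600, "price": 500},
]

def to_stockholm(trip):
    return trip["to_city"] == "stockholm"
```

Under these assumptions, ``the first flight to Stockholm'' corresponds to `wh_agg(trips, to_stockholm, key=lambda t: t["dep_time"])`, and ``the cheapest ticket'' to the same call with `price` as the key.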

The body of the FUD may contain items of three different kinds. Slot--filler items are of the form

slot(<frame name>, <slot name>, <filler value>).

This is to be interpreted as saying that the slot <slot name> of the predicate <frame name> is filled with the value <filler value>.

Constraint items are of the form

exec(goal)

and express numerical relations obtaining between slot-fillers and other values. Finally, referential items are of the form

ref(<filler value>, <ref info>)

and indicate that the object <filler value> is linguistically associated with referential information encoded as <ref info>.
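As a rough illustration, the three item kinds could be modelled as small record types that print themselves in the term notation used here. The class and field names below are hypothetical, and the renderer covers only the yn and wh wrappers, since the argument layout of wh_agg and yn_agg terms is not spelled out in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass(frozen=True)
class Slot:
    """slot(<frame name>, <slot name>, <filler value>)"""
    frame: str
    name: str
    filler: str

    def __str__(self):
        return f"slot({self.frame}, {self.name}, {self.filler})"

@dataclass(frozen=True)
class Exec:
    """exec(goal): a numerical constraint, kept here as an opaque string."""
    goal: str

    def __str__(self):
        return f"exec({self.goal})"

@dataclass(frozen=True)
class Ref:
    """ref(<filler value>, <ref info>)"""
    filler: str
    info: str

    def __str__(self):
        return f"ref({self.filler}, {self.info})"

Item = Union[Slot, Exec, Ref]

@dataclass
class FUD:
    kind: str                        # "yn" or "wh" in this sketch
    body: List[Item] = field(default_factory=list)
    var: Optional[str] = None        # the queried variable of a wh FUD

    def __str__(self):
        if self.kind not in ("yn", "wh"):
            raise ValueError("only yn and wh are rendered in this sketch")
        items = ", ".join(str(item) for item in self.body)
        head = f"{self.var}, " if self.var else ""
        return f"{self.kind}({head}[{items}])"
```

For example, `str(FUD("wh", [Slot("trip", "to_city", "stockholm")], var="X"))` yields `wh(X, [slot(trip, to_city, stockholm)])`.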

For instance, the utterance ``I want to arrive in Stockholm before 6 pm'' is interpreted as ``Find flights arriving Stockholm before 6 pm'', and is represented by the following FUD:

wh(X, [slot(trip, trip_id, X),
       slot(trip, trip_mode, plane),
       slot(trip, to_city, stockholm),
       slot(trip, arr_time, T),
       exec(before(T,1800))])
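To make the interpretation of this FUD concrete, the following Python sketch evaluates it against a handful of invented trip records. It is only one possible reading: the tuple encoding of items, the convention that capitalised strings are variables, the `RELATIONS` table and the records themselves are assumptions, and the frame name is ignored because only the `trip` frame occurs.

```python
def is_var(term):
    """Prolog-style convention: capitalised strings are variables."""
    return isinstance(term, str) and term[:1].isupper()

# Hypothetical table of relations usable inside exec(...) items.
RELATIONS = {"before": lambda time, limit: time < limit}

def solve_wh(var, body, records):
    """Return the bindings of var for every record that satisfies the body."""
    answers = []
    for rec in records:
        bindings = {}
        ok = True
        # Slot items either bind a variable or must match a constant filler.
        for item in body:
            if item[0] != "slot":
                continue
            _, _frame, slot_name, filler = item
            if slot_name not in rec:
                ok = False
            elif is_var(filler):
                ok = bindings.setdefault(filler, rec[slot_name]) == rec[slot_name]
            else:
                ok = rec[slot_name] == filler
            if not ok:
                break
        if not ok:
            continue
        # Constraint items are checked once the variables are bound.
        for item in body:
            if item[0] == "exec":
                _, relation, *args = item
                values = [bindings[a] if is_var(a) else a for a in args]
                if not RELATIONS[relation](*values):
                    ok = False
                    break
        if ok:
            answers.append(bindings[var])
    return answers

body = [
    ("slot", "trip", "trip_id", "X"),
    ("slot", "trip", "trip_mode", "plane"),
    ("slot", "trip", "to_city", "stockholm"),
    ("slot", "trip", "arr_time", "T"),
    ("exec", "before", "T", 1800),
]

# Invented records, for illustration only.
records = [
    {"trip_id": "t1", "trip_mode": "plane", "to_city": "stockholm", "arr_time": 1730},
    {"trip_id": "t2", "trip_mode": "plane", "to_city": "stockholm", "arr_time": 1915},
    {"trip_id": "t3", "trip_mode": "train", "to_city": "stockholm", "arr_time": 1700},
    {"trip_id": "t4", "trip_mode": "plane", "to_city": "gothenburg", "arr_time": 1600},
]
```

With these records, `solve_wh("X", body, records)` returns `["t1"]`: the only plane trip to Stockholm arriving before 1800.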

The utterance ``Is that a direct flight?'' is represented by:

yn([slot(trip, trip_mode, plane),
    slot(trip, stops, 0),
    slot(trip, trip_id, X),
    ref(X, det(def,sing))])

where the ref expression represents the referential expression (``that'') in the utterance, and signals to the dialogue manager that a reference resolution has to be made.
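One way a dialogue manager might act on such a signal is sketched below in Python. The tuple encoding of items and both function names are hypothetical, and the choice of referent (for instance, the most recently mentioned trip) is deliberately left to the caller, since the resolution strategy is not described in this section.

```python
def pending_references(body):
    """List (variable, ref info) pairs that still need resolving."""
    return [(item[1], item[2]) for item in body if item[0] == "ref"]

def resolve(body, var, referent):
    """Substitute referent for var and drop the ref item that introduced it."""
    resolved = []
    for item in body:
        if item[0] == "ref" and item[1] == var:
            continue
        resolved.append(tuple(referent if part == var else part for part in item))
    return resolved

# The body of the yn FUD for "Is that a direct flight?"
body = [
    ("slot", "trip", "trip_mode", "plane"),
    ("slot", "trip", "stops", 0),
    ("slot", "trip", "trip_id", "X"),
    ("ref", "X", "det(def,sing)"),
]
```

Here `pending_references(body)` reports that `X` is awaiting resolution, and `resolve(body, "X", "t1")` produces a body in which the `trip_id` slot is filled with the (invented) identifier `t1`.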

Utterances like ``I want the first flight to Stockholm'' and ``Which is the cheapest ticket?'' translate into wh_agg expressions, while utterances like ``Is that the first flight?'' translate into yn_agg expressions. In our Wizard-of-Oz data, the vast majority of user utterances translate into wh FUDs (including some utterances that are superficially yes/no questions, like ``Are there any flights to Stockholm on Monday morning?'').

When producing the FUD, the Robust Parser does a simple pass over the top hypothesis from the speech recognizer, in a manner described in the next section. In contrast, the CLE attempts to extract the ``best'' grammatical fragment from the lattice of words representing the top five hypotheses of the recognizer. Currently, the longest grammatical fragment is considered to be the best fragment, a strategy that can sometimes lead to trouble (see Section gif).
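The ``longest fragment is best'' strategy can be sketched in Python as follows. This is a simplification under stated assumptions: the word lattice is approximated by a list of separate hypotheses, so fragments that cross between hypotheses are not considered, and `is_grammatical` is a hypothetical stand-in for the grammar, here just a lookup in a toy set of accepted word sequences.

```python
def longest_grammatical_fragment(hypotheses, is_grammatical):
    """Return the longest contiguous word span accepted by is_grammatical.

    Ties are broken in favour of the span found first.
    """
    best = []
    for words in hypotheses:
        n = len(words)
        for start in range(n):
            # Try the longest span from this start first.
            for end in range(n, start, -1):
                if end - start <= len(best):
                    break  # no span from here can beat the current best
                fragment = words[start:end]
                if is_grammatical(fragment):
                    best = fragment
                    break
    return best

# Toy grammar: a fixed set of accepted word sequences (illustration only).
ACCEPTED = {
    ("i", "want", "to", "arrive", "in", "stockholm"),
    ("arrive", "in", "stockholm"),
    ("to", "stockholm"),
}

def toy_is_grammatical(fragment):
    return tuple(fragment) in ACCEPTED

hypotheses = [
    "uh i want to arrive in stockholm um".split(),
    "i want two arrive in stockholm".split(),
]
```

For these hypotheses the function selects the six-word fragment ``i want to arrive in stockholm'' from the first hypothesis, skipping the surrounding hesitation words.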

It is important to understand that the CLE may fail to translate its analyses into FUDs, because the user's utterance cannot be captured by any of the FUD forms. In these cases, the CLE gives no output at all. The Robust Parser, on the other hand, will always produce something; if the input is completely unintelligible it will at least give the minimal output wh(X,[]). This robustness is usually an advantage, but sometimes it can lead the system down the wrong path (see Section gif).






Mats Wiren
Mon Oct 25 13:51:54 MET DST 1999