Matthias Denecke and Alex Waibel

Integrating Knowledge Sources for the Specification of a Task-Oriented Dialogue System

[Full Text]
[send contribution]
[debate procedure]
[copyright]


Overview of interactions

No Comment(s) Answer(s) Continued discussion
1 18.2.2000 Jan Alexandersson


C1. Jan Alexandersson (18.2.00):

Dear Matthias,

many thanks for contributing to ETAI. I like the idea of using the same formalism for analysis as well as for dialogue management. I also think the partitioning of the rules into domain-dependent and domain-independent is good. However, I lack some understanding and information concerning the following points (I am aware that they are on the border of what you want to say, but nevertheless...):

The overall system status
Can you tell us about the status of the system? Has it been evaluated? How? The example dialogue is in German. Is it a German system or is there an English version too?

System architecture
In the example dialogue there is a hint of a map included in the system. The role of this is not clear at all to me. Can you use the map for input, or is it just an output channel for the system? Effects on generation? Can you elaborate a bit on this? Moreover, you show the system architecture in figure 7 -- is this THE system architecture? Maybe you should use a more appropriate term. Or add some interconnections.

Section 3.2, page 6
You mention Grosz&Sidner-86 (who doesn't :-)). A consequence of this might be, if I understand it correctly, that you have no access to prior context, e.g., a hotel reservation. So, if I have successfully reserved a room in a hotel, and I ask the system to reserve a table nearby for the day of arrival, is your system able to figure out when/where this is, or does it behave as in your example dialogue (again asking for location, time, etc.)?

Generation
How are the system contributions constructed? Template generation? Or do you use something more sophisticated?

Speech acts
You mention ``speech acts'' and explain what they are used for. But not which ones. And why these? Have you done a corpus analysis (do you have a corpus?)? How do you recognize them? With keywords, sentence mood, context,...? This has to do with the ``overall system status'' too, since I'm interested in how well you can recognize these.

Miscellaneous
On page 4 you introduce the term ``goal stack'', but without explaining or pointing at the explanation of its function.

Footnote 1: ``n2eed'' Typo?

With best regards,
Jan Alexandersson

A1. Matthias Denecke (27.1.00):

Dear Jan,
thank you for your questions.

Jan Alexandersson:

The overall system status
Can you tell us about the status of the system? Has it been evaluated? How? The example dialogue is in German. Is it a German system or is there an English version too?

The authors reply:

The status is work in progress. There was a small evaluation some time ago, but we needed to recode major parts of the system due to project obligations (moving from C++ to Java Interface). This set us back some time.
There is also an English version of the system. Incidentally, the English and German versions share many of the representations. In particular, the representations of the goals, the type hierarchy, the database structures and the rules are either identical or very similar (there are some slight variations in the tasks). Moreover, we tried to structure the semantic grammars for German and English in a similar fashion. There are conversion rules that convert a syntactic/semantic parse tree into a typed feature structure (TFS), and these rules are supposed to be the filter that differences between the languages cannot pass. The goal is to shield the dialogue processing from the language used.

Jan Alexandersson:

System architecture
In the example dialogue there is a hint of a map included in the system. The role of this is not clear at all to me. Can you use the map for input, or is it just an output channel for the system? Effects on generation? Can you elaborate a bit on this?

The authors reply:

I think I did not make this very clear in the presentation of the system. There is a map server that contains the database of a town (Karlsruhe or Pittsburgh). It can display parts of the map, zoom in and out, calculate and display shortest paths, etc. It serves partly as a database and partly as a graphical display of what the user asked the system to do. So, if I ask for the path to some restaurant, the system determines (through DB access) the referent of the NP I used and passes the address on to the map server. This is then interpreted (somewhat loosely) as a database request which returns one object, namely the shortest path. This in turn is converted to a TFS and integrated into the discourse.
You can also use the map as an input channel, but not with too much variability as of now. For example, if you have the choice between m places, and the system comes back with a clarification question, you may circle an area containing n < m places. Places that are not in the circle are removed from the underspecified TFS and the system tries to disambiguate further.
At one point, we had a CMU campus information demo. It featured a somewhat more elaborate path description in that it iterated over the path segments, generated an instruction (such as turn left) if appropriate, and mentioned buildings if the current path segment went along a building. This was implemented by enhancing the rules with a mechanism similar to the PROLOG fail predicate in order to achieve the loops.
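The circling gesture described above can be pictured as a simple filter over the remaining candidates. The following is only a minimal sketch under stated assumptions: the names `Place` and `restrict_to_circle` are hypothetical, places are reduced to flat map coordinates, and the real system operates on an underspecified TFS rather than a Python list.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Place:
    """Hypothetical stand-in for one candidate referent on the map."""
    name: str
    x: float
    y: float


def restrict_to_circle(candidates, cx, cy, radius):
    """Keep only the candidates that lie inside (or on) the circled area."""
    return [p for p in candidates if hypot(p.x - cx, p.y - cy) <= radius]


# m = 3 candidates before the gesture (invented example data)
candidates = [
    Place("Restaurant A", 0.0, 0.0),
    Place("Restaurant B", 1.0, 1.0),
    Place("Restaurant C", 5.0, 5.0),
]

# The user circles an area around (0.5, 0.5); n = 2 candidates remain,
# and further clarification would only concern those two.
remaining = restrict_to_circle(candidates, cx=0.5, cy=0.5, radius=1.0)
remaining_names = [p.name for p in remaining]
```

If more than one candidate remains after the gesture, the dialogue would continue with another clarification question over the reduced set.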

Jan Alexandersson:

Moreover, you show the system architecture in figure 7 -- is this THE system architecture? Maybe you should use a more appropriate term. Or add some interconnections.

The authors reply:

The purpose of figure 7 was to show the interactions between the objects within the dialogue manager. Basically, the dialogue manager contains an ensemble of abstract data types (stacks, lists and trees) that can be controlled by the rules. So the discourse structure is a tree, the goal stack is a stack ;-) and so on. From an engineering point of view, the system at that time was a client/server architecture, in which predicates (parts of the rules) could either be evaluated within the dialogue manager (such as subsumes(tfs1,tfs2)) or passed on to a server (such as the map server). You are right, the subtitle is unclear. Thanks for the comment.
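To illustrate the split between locally evaluated predicates and predicates passed on to a server, here is a rough sketch. It assumes feature structures are plain nested dictionaries (no type hierarchy, no reentrancy), which is a simplification of typed feature structures; `evaluate`, `LOCAL_PREDICATES` and `fake_map_server` are hypothetical names and not the system's actual interface.

```python
def subsumes(general, specific):
    """True if `specific` carries at least the information in `general`.

    Simplified: nested dicts only, atomic values must be equal.
    """
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return False
        return all(
            key in specific and subsumes(value, specific[key])
            for key, value in general.items()
        )
    return general == specific


# Predicates the dialogue manager can evaluate itself (assumed registry).
LOCAL_PREDICATES = {"subsumes": subsumes}


def evaluate(name, args, remote_call):
    """Evaluate a predicate locally if known, otherwise hand it to a server."""
    if name in LOCAL_PREDICATES:
        return LOCAL_PREDICATES[name](*args)
    return remote_call(name, args)


def fake_map_server(name, args):
    """Stand-in for a remote server; it only echoes the request."""
    return ("remote", name, args)


tfs1 = {"type": "restaurant"}
tfs2 = {"type": "restaurant", "address": {"street": "Kaiserstrasse"}}

local_result = evaluate("subsumes", [tfs1, tfs2], fake_map_server)
remote_result = evaluate("shortest_path", ["here", "there"], fake_map_server)
```

In this sketch the rule author does not need to know where a predicate is evaluated; the dispatcher decides based on the registry.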

Jan Alexandersson:

Section 3.2, page 6
You mention Grosz&Sidner-86 (who doesn't :-)). A consequence of this might be, if I understand it correctly, that you have no access to prior context, e.g., a hotel reservation. So, if I have successfully reserved a room in a hotel, and I ask the system to reserve a table nearby for the day of arrival, is your system able to figure out when/where this is, or does it behave as in your example dialogue (again asking for location, time, etc.)?

The authors reply:

While you are correct that the system cannot handle two subsequent dialogues and establish reference between them, I don't think this is a consequence of the design decision to employ a hierarchical discourse structure or similar. Rather, the structure is an additional knowledge source that I can take advantage of to constrain solutions. As I said above, this is implemented as data structures. And technically, it's possible to access objects in data structures that belong to the previous dialogue.

Jan Alexandersson:

Generation
How are the system contributions constructed? Template generation? Or do you use something more sophisticated?

The authors reply:

It's actually a mixture of template generation and a conversion from TFS to text. In order to disambiguate, an underspecified TFS is generated (one of the kind that I show in the paper). In the second step, the semantic content of the clarification question is generated based on which information would disambiguate most efficiently on average. This is a list of TFSs. Then, conversion rules similar to those that convert parse output to TFS are used to convert each TFS from the list to a string, all of which are then pressed into a template. I agree that something more sophisticated is highly desirable, in particular to avoid the redundancy of specifying conversion rules for two directions.
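One way to read "disambiguate most efficiently on average" is to ask about the feature that leaves the fewest candidates, in expectation, once the user has answered. The sketch below assumes exactly that criterion, which the reply does not spell out; it also assumes flat dictionaries instead of TFSs, a uniform prior over candidates, and a made-up template. All function names and data are hypothetical.

```python
from collections import Counter


def expected_remaining(candidates, feature):
    """Expected number of candidates left after the user names a value.

    Assumes each candidate is equally likely to be the intended one
    and that every candidate has a value for `feature`.
    """
    counts = Counter(c[feature] for c in candidates)
    total = len(candidates)
    return sum(k * k for k in counts.values()) / total


def best_feature(candidates, features):
    """Pick the feature whose answer narrows the set most on average."""
    return min(features, key=lambda f: expected_remaining(candidates, f))


def clarification_question(candidates, features):
    """Press the possible values of the chosen feature into a template."""
    feature = best_feature(candidates, features)
    values = sorted({str(c[feature]) for c in candidates})
    return f"Which {feature} do you mean: {', '.join(values)}?"


candidates = [
    {"name": "Roma", "cuisine": "italian", "district": "center"},
    {"name": "Napoli", "cuisine": "italian", "district": "west"},
    {"name": "Lotus", "cuisine": "chinese", "district": "east"},
]

question = clarification_question(candidates, ["cuisine", "district"])
```

Here asking for the district separates all three candidates, whereas asking for the cuisine could leave two, so the district is chosen. In the system described above, the string for each value would come from the conversion rules rather than from `str`.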

Jan Alexandersson:

Speech acts
You mention ``speech acts'' and explain what they are used for. But not which ones. And why these? Have you done a corpus analysis (do you have a corpus?)? How do you recognize them? With keywords, sentence mood, context,...? This has to do with the ``overall system status'' too, since I'm interested in how well you can recognize these.

The authors reply:

We do have a corpus, but we have not yet analyzed it with respect to speech acts. The speech acts basically differentiate the forms of possible discourse update. As such they are most often determined by the parse output, sometimes by the context. If the semantic representation of a new utterance does not make the (dependent) representations of a question more specific, it is considered not to be its answer (simply put).

Jan Alexandersson:

Miscellaneous
On page 4 you introduce the term ``goal stack'', but without explaining or pointing at the explanation of its function.

The authors reply:

The stack is simply used to remember goals if the user decides to enter a subdialogue. You can have something like:

Level | Utterance                    | Goal Stack                | Comment
1     | System: do you want A or B?  | [Goal X]                  | [Goal X] pushed
2     | User: which one is closer?   | [Goal X][Goal closest(Y)] | [Goal closest(Y)] pushed
2     | System: A is closer          | [Goal X]                  | [Goal closest(Y)] is satisfied and popped
2     | User: which one is cheaper?  | [Goal X][Goal cheaper(Z)] | [Goal cheaper(Z)] pushed

and so on.
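The trace above can be replayed with an ordinary stack. This is a small sketch, not the system's code: the class `GoalStack` and the string encoding of goals are assumptions made for illustration.

```python
class GoalStack:
    """Remembers pending goals while the user enters subdialogues."""

    def __init__(self):
        self._goals = []

    def push(self, goal):
        self._goals.append(goal)

    def pop(self):
        """Remove and return the most recent goal once it is satisfied."""
        return self._goals.pop()

    def snapshot(self):
        """Copy of the current stack contents, bottom first."""
        return list(self._goals)


stack = GoalStack()
trace = []

stack.push("X")                 # System: do you want A or B?
trace.append(stack.snapshot())

stack.push("closest(Y)")        # User: which one is closer?
trace.append(stack.snapshot())

stack.pop()                     # System: A is closer (goal satisfied)
trace.append(stack.snapshot())

stack.push("cheaper(Z)")        # User: which one is cheaper?
trace.append(stack.snapshot())
```

After each subdialogue is resolved, the original goal X is again on top of the stack, so the system can return to its pending question.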

Jan Alexandersson:

Footnote 1: ``n2eed'' Typo?

The authors reply:

Yes. Thanks.

I hope this answers your questions.
Thank you.

Best regards,
Matthias


Additional questions and answers will be added here.
To contribute, please click [send contribution] above and send your question or comment as an E-mail message.
For additional details, please click [debate procedure] above.
This debate is moderated by the guest editors.