Maria Aretoulaki and Bernd Ludwig

Automaton-Descriptions and Theorem-Proving: A Marriage made in Heaven?

[Full Text]
[send contribution]
[debate procedure]
[copyright]


Overview of interactions

No   Comment(s)                  Answer(s)                     Continued discussion
1    1.2.00  Joris Hulstijn      25.2.00  Aretoulaki, Ludwig
2    22.2.00 Ingrid Zukerman

C1. Joris Hulstijn (1.2.00):

I have some comments and questions regarding the contribution by Aretoulaki and Ludwig.

Let me start by saying that I find this a brilliant contribution. It is really good to see formal verification work being combined with general theories of dialogue. Using DL for the knowledge representation is a good choice. My comments have mainly to do with the apparent lack of space the authors had to explain their ideas.

section 2 `out of domain', `out of vocabulary'
This point is mentioned explicitly in the introduction to motivate a flexible approach to dialogue management. Maybe you can come back to this point at the end of the paper and show how the various knowledge sources you defined allow you to deal with (some) `out of domain problems' in a sensible way?

section 3.2 `coherence'
This is an interesting section, but too short for me to follow exactly what is happening. Coherence is based on pragmatic effects (which ones?). In particular, what is the algorithm by which the ``domain model guides the salience of the various topics and subtopics dealt with''?

What is the relation of coherence with the action-reaction pairs? You could think of action-reaction (initiative-response) pairs as grammar rules that define a conversational game (cf. section 4.3). Coherence is a constraint on the well-formedness of the conversational game, just like e.g. subject-verb agreement in normal grammars defines the well-formedness of sentences.

section 4
... uses these types of knowledge effectively and efficiently ... It sure looks good, but at the moment of publishing the DM was still under development :)

section 4.2 `dialogue state'
There are a lot of resources that start with the word `dialogue'. What exactly is the dialogue state? Is it a node in an FSA like the one shown in figure 2? How does it differ from the dialogue history? (which, strangely, is stored not in the DM but in the UM; why?) How do the FSAs relate to the action-reaction patterns?

UM ... `general domain knowledge'
If it is general it must be static. If it is dynamic, it is linked to a particular user and no longer general. This actual user's domain knowledge may differ from the expected domain knowledge. Wow! Do you have an example of a misunderstanding that is detected this way?

WM
Background knowledge is the `wastebin' of open problems in pragmatics. I can see that DL can help with domain-dependent background knowledge, such as time and location. But for a `generic model of communicative behaviour' you need more than a knowledge representation language. This brings up a lot of questions.

-- Principles like sincerity or cooperativity are assumed. What if someone turns out to be lying, or is simply misunderstood? (i.e. how do you apply these principles? default logic?)
-- How complete is this generic module? Is it ever complete?
-- Can you evaluate it?

This seems to me a better place to discuss the coherence principles that you mention under 3.2.

section 4.3 grounding
Many speech acts are not grounded explicitly (by an acknowledgement) but implicitly, by a coherent continuation of the topic. This is again related to what you mention under 3.2 about coherence. It is also related to the technique of implicit verification prompts, used by many current SDSs. It seems that you have a theory which can make the design decision whether to use explicit or implicit verification in a more principled manner than most designers! Moreover, you could run simulations for both strategies under different (faked) speech recognition rates / confidence measures to assess the best strategy for different conditions.

irg game discussion
Ok, now I understand it better. However, it would be nice if you could show how non-task-related coherence aspects are dealt with in the logic. In general, the way you use the notion of coherence needs to be explained better.

section 6
In case you want to/have to mention related work, consider work by David Sadek on ARTIMIS, which is, as far as I know, the only successful implementation of a modal logic (goals, beliefs etc.) in an SDS. Also, work by Asher & Lascarides (1998; 1999) is very similar in spirit. It uses coherence relations (cf. rhetorical relations), but based on initiative-response pairs (exchanges). These papers can be obtained from Lascarides's homepage: http://www.cogsci.ed.ac.uk/~alex/papers.html

Joris Hulstijn, University of Twente

A1. Aretoulaki, Ludwig (25.2.00):

Dear Joris,

Thank you very much for reading our article in detail and taking the time to comment on it. We appreciate your praise of the approach, but you are also quite right about the need for clarifications: we refrained from expanding on the various issues, as we were unsure about the required / acceptable length of the contributions.

To your specific comments:

section 2 `out of domain', `out of vocabulary' This point is mentioned explicitly in the introduction to motivate a flexible approach to dialogue management. Maybe you can come back to this point at the end of the paper and show how the various knowledge sources you defined allow you to deal with (some) `out of domain problems' in a sensible way?

The authors reply:

You are right: we failed to take up this issue later on in the paper. Flexibility in the case of `out of domain' / `out of vocabulary' would mean the ability to interrupt the "normal" course of the dialogue in order to address the "extraordinary" occurrence of a relevant word or phrase that is not covered by the lexicon or domain model of the system. The system has to let the user know what the problem is, irrespective of what its or the user's current goals have been. Now, the existence of a User Model and a Dialogue History means that the system will be

  1. better prepared for future out-of-domain or out-of-vocabulary words and phrases (a higher probability is established during speech recognition that a similar phenomenon will recur), and
  2. more careful and conservative in its interaction strategies (e.g. more confirmation requests than usual), thereby avoiding fulfilment of the wrong task as a result of misunderstandings (which are, in turn, associated with erroneous speech recognition).
The interaction history and the model of the specific user thus adapt the behaviour of the system to the specific situation, something that would not happen otherwise. This, at any rate, is what we mean by flexibility.
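
To make this concrete, here is a minimal Python sketch of the mechanism just described. It is our illustration only, not the system's actual code, and all the names in it (LEXICON, DialogueHistory, handle_utterance) are invented for the example:

    # Invented example: on an out-of-vocabulary hit, the dialogue manager
    # interrupts the normal course of the dialogue, tells the user what the
    # problem is, and records the incident so that later strategies can
    # become more conservative.

    LEXICON = {"flight", "london", "morning", "ticket"}

    class DialogueHistory:
        def __init__(self) -> None:
            self.oov_incidents: list[str] = []

    def handle_utterance(words: list[str], history: DialogueHistory) -> str:
        unknown = [w for w in words if w not in LEXICON]
        if unknown:
            history.oov_incidents.append(" ".join(unknown))
            # interrupt, irrespective of the current goals
            return f"Sorry, I don't know anything about '{unknown[0]}'."
        return "OK."  # proceed with the normal course of the dialogue

    history = DialogueHistory()
    print(handle_utterance(["flight", "zanzibar"], history))  # interruption
    print(len(history.oov_incidents))  # 1: available for later adaptation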

Joris Hulstijn:

section 3.2 `coherence' This is an interesting section, but too short for me to follow exactly what is happening. Coherence is based on pragmatic effects (which ones?). In particular, what is the algorithm by which the ``domain model guides the salience of the various topics and subtopics dealt with''?

The authors reply:

This section was meant as a brief introduction to the approach detailed in the subsequent sections. This is also why it is full of keywords and claims, without examples and explanations. The reader is already referred to the subsections of Section 4. For example, the algorithm you mention is illustrated in Section 4.4.

Joris Hulstijn:

What is the relation of coherence with the action-reaction pairs? You could think of action-reaction (initiative-response) pairs as grammar rules that define a conversational game (cf. section 4.3). Coherence is a constraint on the well-formedness of the conversational game, just like e.g. subject-verb agreement in normal grammars defines the well-formedness of sentences.

The authors reply:

Exactly. And these constraints are formulated with respect to domain concepts as well, whose definition / specification is necessary for the fulfilment of the user's and the system's goals (e.g. information quest, information acquisition).
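
A minimal sketch, assuming a simple table of licensed pairs (the names and the concept check are our invention, not the paper's notation), of how such grammar-like coherence constraints could be checked:

    # Action-reaction pairs read as grammar rules: a reaction is coherent
    # only if it is licensed by the preceding action AND addresses the same
    # domain concept -- analogous to an agreement check in a sentence grammar.

    LICENSED_REACTIONS = {
        "query":   {"answer", "clarify"},
        "confirm": {"accept", "reject"},
        "answer":  {"accept", "query"},
    }

    def coherent(action: str, a_concept: str,
                 reaction: str, r_concept: str) -> bool:
        return (reaction in LICENSED_REACTIONS.get(action, set())
                and a_concept == r_concept)

    print(coherent("query", "departure_time", "answer", "departure_time"))  # True
    print(coherent("query", "departure_time", "answer", "hotel_price"))     # False: topic shift
    print(coherent("confirm", "destination", "query", "destination"))       # False: unlicensed pair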

Joris Hulstijn:

section 4
... uses these types of knowledge effectively and efficiently ... It sure looks good, but at the moment of publishing the DM was still under development :)

The authors reply:

You're right. And it still is under development. Efficiency and effectiveness refer mainly to the principles of processing and not an evaluation of the final actual performance of the system.

Joris Hulstijn:

section 4.2 `dialogue state' There are a lot of resources that start with the word dialogue.

The authors reply:

Only the Dialogue Module (of the Dialogue Manager or Controller if you like) does! Dialogue States are information units processed by the Dialogue Module.

Joris Hulstijn:

What exactly is the dialogue state? Is it a node in an FSA like the one shown in figure 2?

The authors reply:

Yes. Information about the ordering between states is inherent: e.g. answer(x) should follow, and not precede, query(x).
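
As a hedged illustration (the state and transition names below are assumed for the example, not taken from the paper's figure 2), the ordering constraint can be read directly off the FSA's transition table:

    # Dialogue states as FSA nodes: answer(x) is only reachable after
    # query(x), so the ordering between the two acts is inherent.

    TRANSITIONS = {
        ("start", "query"): "awaiting_answer",
        ("awaiting_answer", "answer"): "grounded",
        ("grounded", "query"): "awaiting_answer",  # a new exchange may begin
    }

    def step(state: str, act: str) -> str:
        nxt = TRANSITIONS.get((state, act))
        if nxt is None:
            raise ValueError(f"{act} is not licensed in state {state}")
        return nxt

    s = step("start", "query")  # ok: query(x) opens the exchange
    s = step(s, "answer")       # ok: answer(x) follows query(x)
    # step("start", "answer")   # would raise: an answer cannot precede its query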

Joris Hulstijn:

How does it differ from the dialogue history? How do the FSAs relate to the action-reaction patterns?

The authors reply:

The Dialogue History is a representation not just of individual dialogue states (recording the speech acts expressed in specific utterances), but of all established (recognised) dialogue states that were involved in the specific interaction. Thus, it is an integrated representation of action-reaction patterns with specified (rightly or erroneously understood) content, expressed in terms of domain concepts and relations.

Joris Hulstijn:

dialogue history (which, strangely, is stored not in the DM but in the UM; why?)

The authors reply:

The reason behind this choice was that we wanted to keep the DM generic, with universal action-reaction pairs that are not bound to specific domain models. This module does, of course, also contain the Task Memory, which is supposed to keep a record of task requirements in terms of domain concepts, so that new but related tasks can be integrated without repetition of information. So, in a sense, the Task Memory is a summary of the Dialogue History, which is found in the User Model. Thus, the UM is user-specific, despite the fact that at the start of an interaction it contains default information (see next point).
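
The following Python sketch illustrates this division of labour; the class and field names are ours, chosen for illustration, and the code is not taken from the system:

    # The generic DM keeps only a Task Memory of domain concepts already
    # filled in; the full Dialogue History of action-reaction pairs lives
    # in the user-specific UM.

    from dataclasses import dataclass, field

    @dataclass
    class Turn:
        action: str    # e.g. "query", "confirm"
        reaction: str  # e.g. "answer", "reject"
        concept: str   # domain concept, e.g. "destination"
        value: str

    @dataclass
    class UserModel:
        dialogue_history: list = field(default_factory=list)

    @dataclass
    class DialogueModule:
        task_memory: dict = field(default_factory=dict)

        def integrate(self, turn: Turn, um: UserModel) -> None:
            um.dialogue_history.append(turn)             # full, user-specific record
            self.task_memory[turn.concept] = turn.value  # reusable summary

    dm, um = DialogueModule(), UserModel()
    dm.integrate(Turn("query", "answer", "destination", "London"), um)
    # A new but related task can reuse task_memory["destination"]
    # without asking the user again.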

Joris Hulstijn:

UM ... `general domain knowledge' If it is general it must be static. If it is dynamic, it is linked to a particular user and no longer general. This actual user's domain knowledge may differ from the expected domain knowledge.

The authors reply:

You are right about both! There are the general and static pieces of information (e.g. that flights have arrival times associated with them), but also the user-specific, dynamic part of this module, which records on-line what the current user knows or wants (e.g. that there are no flights arriving after midnight at the desired destination). When static and dynamically acquired information conflict, the dynamic overrules the static for the duration of the specific interaction. This does not mean, however, that the static information is deleted.
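
A minimal sketch, assuming a simple two-layer lookup (our simplification for illustration, not the actual Description Logic machinery):

    # Dynamically acquired user knowledge shadows the static defaults for
    # the current interaction, but the defaults are never deleted.

    STATIC_DEFAULTS = {
        "flights_have_arrival_times": True,
        "late_arrivals_exist": True,  # e.g. flights arriving after midnight
    }

    class UserModel:
        def __init__(self) -> None:
            self.dynamic: dict = {}  # filled in on-line during the dialogue

        def believe(self, fact: str, value: bool) -> None:
            self.dynamic[fact] = value  # overrides, does not delete

        def lookup(self, fact: str) -> bool:
            # the dynamic layer wins for the duration of this interaction
            return self.dynamic.get(fact, STATIC_DEFAULTS.get(fact, False))

    um = UserModel()
    um.believe("late_arrivals_exist", False)  # the user assumes otherwise
    print(um.lookup("late_arrivals_exist"))   # False: dynamic overrules static
    print(STATIC_DEFAULTS["late_arrivals_exist"])  # True: the default survives

A mismatch between the two layers, as in the last two lines, is precisely the kind of cue that can flag a potential misunderstanding (see the next point).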

Joris Hulstijn:

Wow! Do you have an example of a misunderstanding that is detected this way?

The authors reply:

For example, the database contains entries for flights that arrive after midnight in London (static information available to the corresponding part of the module with the defaults). When the user asks for a late flight to London, they could be referring, directly or indirectly, to 20:00-23:00 flights. The system, however, could point out the existence of the midnight flights first, as this is special information that could be of interest to the user, especially if they did not know about this possibility.

Joris Hulstijn:

WM
Background knowledge is the `wastebin' of open problems in pragmatics.

The authors reply:

We could not avoid using this hold-all (and nothing!) term. The WM is a support module for the DM. It is used as a "disambiguation" component, which, however, has no functionality without the other modules (nor can it be activated without their prior activation).

Joris Hulstijn:

-- Principles like sincerity or cooperativity are assumed. What if someone turns out to be lying, or is simply misunderstood? (i.e. how do you apply these principles? default logic?)

The authors reply:

We do not deal with applications where the user could be lying, so truthfulness is the default. Cooperativity (on the part of the user at least) is also assumed. But of course, cooperativity cannot make up for misrecognition errors; you are right. In this case, it is the DM and UM that come into play, with their knowledge about the action-reaction patterns (e.g. confirm(x) -> reject(?) -> query(x)) and about the behaviour of the specific user in the course of the interaction.
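
The repair pattern can be written out as follows; the code is illustrative only (the pattern itself is the one quoted above, the function name is invented):

    # Encode the action-reaction repair pattern confirm(x) -> reject -> query(x).

    def repair_step(last_action: str, user_reaction: str, concept: str) -> str:
        if last_action == f"confirm({concept})" and user_reaction == "reject":
            return f"query({concept})"  # re-ask rather than proceed
        return "continue"

    print(repair_step("confirm(destination)", "reject", "destination"))
    # -> query(destination): the system re-asks instead of assuming
    #    the misrecognised value.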

Joris Hulstijn:

-- How complete is this generic module? Is it ever complete?

The authors reply:

The WM follows the knowledge contained in the DM and UM modules closely, so in this sense it will probably never be complete, just as no word recogniser can ever be developed to recognise every word that the user could possibly employ.

Joris Hulstijn:

-- Can you evaluate it?

The authors reply:

It can be evaluated in conjunction with the DM and UM. The WM is called upon to deal with "underspecified representations" (e.g. reference ambiguities). When no new information or knowledge is offered by the WM to the other modules, then it has failed to operate. A systematic evaluation should hopefully be carried out within the next two months.

Joris Hulstijn:

This seems to me a better place to discuss the coherence principles that you mention under 3.2.

The authors reply:

You are right. Although, as said before, Section 3.2 merely introduces the concepts as relevant to this work, and Sections 4.3 and 4.4 then show how they are implemented in practice.

Joris Hulstijn:

section 4.3 grounding Many speech acts are not grounded explicitly (by an acknowledgement) but implicitly, by a coherent continuation of the topic. This is again related to what you mention under 3.2 about coherence. It is also related to the technique of implicit verification prompts, used by many current SDSs. It seems that you have a theory which can make the design decision whether to use explicit or implicit verification in a more principled manner than most designers!

The authors reply:

It has to do with the confidence of the speech recognition and with the history of misunderstandings (or of rejections on the part of the user). Implicit confirmation is the default, whereas explicit confirmation is only used as a repair strategy.
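
A sketch under stated assumptions (the threshold and the exact inputs are invented for illustration; the policy itself is the one just described):

    # Implicit confirmation as the default; explicit confirmation as a
    # repair strategy when recognition confidence is low or earlier
    # confirmations were rejected by the user.

    def choose_grounding(asr_confidence: float, past_rejections: int,
                         threshold: float = 0.7) -> str:
        if asr_confidence < threshold or past_rejections > 0:
            return "explicit"  # repair: "Did you say London?"
        return "implicit"      # default: "A flight to London. When ...?"

    print(choose_grounding(0.92, past_rejections=0))  # implicit
    print(choose_grounding(0.55, past_rejections=0))  # explicit: low confidence
    print(choose_grounding(0.92, past_rejections=1))  # explicit: repair history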

Joris Hulstijn:

Moreover, you could run simulations for both strategies under different (faked) speech recognition rates / confidence measures to assess the best strategy for different conditions.

The authors reply:

Yes, that would be very interesting. Something along these lines has already been done by M. Danieli and E. Gerbino ("Metrics for Evaluating Dialogue Strategies in a Spoken Language System", 1995 AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation), where it was shown that there is a tradeoff between a large number of turns (due to explicit confirmations) and the avoidance of misunderstandings (which would otherwise only be spotted late in the dialogue).

Joris Hulstijn:

irg game discussion Ok, now I understand it better. However, it would be nice if you could show how non-task-related coherence aspects are dealt with in the logic. In general, the way you use the notion of coherence needs to be explained better.

The authors reply:

Thanks. We will try to remedy this in the new version of the article.

Thanks for the references, we will try to explicitly refer to them in the article.

Best regards,
Maria and Bernd


C2. Ingrid Zukerman (22.2.00):

I have several questions and comments about this paper. Before I start, I'd like to point out that I mainly do generation and plan recognition, and am only now getting into dialogue. My main difficulties with the paper are:

  1. It was not clear to me how FSAs and dynamic inferences were actually combined. There is some hint that the FSAs operate at a higher level of abstraction than is normally used for dialogue processing, but in the absence of a worked example, I couldn't really see how this works.
  2. It was not clear how the proposed mechanism solves problems that have remained unaddressed by former mechanisms. I think this should be clearly illustrated in a worked example. In particular, the example in the paper is handled by the RADAR inference-based system described in the following publications.
    Raskutti, B. and Zukerman, I. (1991), Generation and Selection of Likely Interpretations during Plan Recognition. User Modeling and User Adapted Interaction 1(4), 323-353, Kluwer Academic Publishers.
    Raskutti, B. and Zukerman, I. (1997), Generating Queries and Replies during Information-seeking Interactions. International Journal of Human Computer Studies 47(6), 689-734.
    I think it would help if you introduce an example where the traditional process does not work.
Additional (more detailed comments):
In the Introduction you state that ``FSAs are used for the domain-independent modeling for discourse unit sequences, and inferencing is used to model specific domain knowledge''. What about the work of Carberry et al. (in particular Carberry and Lambert), where domain independent aspects of dialogue are incorporated in an inference-based framework?

You later focus on spoken-language dialogue systems. I'd like to see some comments about the relation between such systems, which at present support database lookups, and more complex dialogue systems, where there is some negotiation between the user and the system.

You say that ``coherence between utterances is determined on the basis of the coherence between respective pragmatic effects''. I don't really understand what that means. Can you please explain? I also didn't quite understand how Description Logics (DL) are used, and why it is important to use DLs compared to other formalisms.

Also, I wasn't quite clear about how the UM interacts with the Task and Dialogue modules (see the first paragraph on page 7). Can you please elaborate a bit more? Likewise, I wasn't clear about how the preliminary UDAs.0 are interpreted using world knowledge to yield the actual or intended UDAs.1. I also didn't understand why, in general, the User Model should be consulted first and then the World Model. Some aspects are clear, e.g., the finer interpretation of time-related information, but others are not so clear. I think it might be nice to have a worked example throughout the paper, even if not all of it is implemented, just to give people a clearer idea of how things would work.

On page 9 you say that you don't want to use speech acts that are overloaded with pragmatic information, because their automatic identification is not straightforward. Why is this so? I also didn't understand why a conversational game is selected for the current utterance. What does that mean? Is a different game selected for each utterance?

Finally, you talk about recording certain behavioural traits of the user, e.g., degree of cooperativity. How would this affect the responses of the system?

Ingrid Zukerman, Monash University


Additional questions and answers will be added here.
To contribute, please click [send contribution] above and send your question or comment as an E-mail message.
For additional details, please click [debate procedure] above.
This debate is moderated by the guest editors.