Università di Roma Tre
Università di Roma "La Sapienza"
Fiorella De Rosis
Università di Bari
In this paper we present an ongoing project on the simulation of a dialog between two synthetic agents. Communication between speaker and listener involves multimodal behaviors: the choice of words, intonation and paralinguistic parameters on the vocal side; facial expressions, gaze, gestures and body movements on the non-verbal side. The choice of each individual behavior, and their mutual interaction and synchronization, produce the richness and subtlety of human communication.
In recent years, research on autonomous agents has been enriched with a specific research area: that of talking faces. Computer graphics techniques have made it possible to create moving bodies and heads able to perform human-like lip movements and facial expressions. The simulation of muscle contraction and skin elasticity gives 3D human head models a high degree of realism. The links between facial expression and intonation (Pelachaud and Prevost 1994) and between facial expression and dialog situation (Cassell et al. 1994) have been studied; multimodal systems have been built that simulate face-to-face conversation (Thorisson 1997; Rickel and Johnson 1998; Churchill et al. 1998); and facial expressions have been categorized on the basis of their communicative meanings (Takeuchi and Nagao 1993; Binsted 1998).
In this work we aim at the construction of a conversational agent with an animated face that exhibits gaze communication. The literature on gaze (Argyle and Cook 1976; Ekman 1979; Eibl-Eibesfeldt 1974) has shown that the eyes provide a great deal of information about the interactants' social relationships, emotions, conversational moves and so on. In a model presented in previous works (Pezzato and Poggi 1998; Pelachaud and Poggi 1998) we have tried to work out a lexicon of gaze behavior, that is, a list of ``gaze items'' in which a correspondence is established between each specific gaze signal and the meaning attached to it.
We have also tried to establish an alphabet of gaze, that is, to single out the physiological states (humidity, reddening, pupil dilation and so on) and the muscular movements of the eye region (eyebrow raising, frowning, eye opening, eye direction...) that are relevant in the production of gaze items. For example, when referring to a very small thing one may squeeze the eyes: in our lexicon of eyes, the signal part of this item can be described as ``half-closed eyes with tense eyelids'', while the meaning part can be described as ``small, little (even conceptually); it can be attributed to subtle objects or ideas''. Another case: a frown may be described, on the signal side, as ``inner parts of the eyebrows lowered and drawn together'', and on the meaning side as ``I am concentrating''. Take, finally, the case in which a Speaker emphasizes the comment of a sentence by raising the eyebrows: here the signal is ``eyebrows raised, eyes wide open'', and the meaning is ``I inform you that this is the focused part of my sentence''.
If eye communication may be viewed as a lexicon, for each of these eye lexical items we may provide a formal representation of both the meaning and the signal. We view the meaning of each eye lexical item as a conjunction of cognitive units, that is, of logical propositions representing the agent's Goals and Beliefs; on the signal side, the muscular movements of the eyes correspond to the Action Units (AUs) of Ekman and Friesen's FACS (Ekman and Friesen 1978), while some physiological states like humidity or reddening may be implemented through computer graphics techniques. In this work we will present the analysis in terms of cognitive units of some meanings typically conveyed through eye communication: for instance, the act of pointing by gaze; meta-cognitive information like ``I am thinking'' or ``I am trying to remember''; performatives like an imploring, defiant or reproaching gaze; turn-taking devices like gazing to request the speaking turn; and, finally, gaze that conveys social emotions, say an adoring or a scornful gaze, and non-social emotions, like the eyebrow raising of surprise.
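As a minimal sketch of this signal/meaning pairing, a gaze lexical item can be represented as a pair of a FACS signal description and a conjunction of cognitive units. The data structures and field names below are illustrative assumptions, not the paper's actual formalism; the AU codes, however, follow FACS (AU1+AU2 for raised brows, AU5 for wide-open eyes).

```python
from dataclasses import dataclass

@dataclass
class CognitiveUnit:
    """A logical proposition in the agent's mental state,
    e.g. Goal(S, Believe(L, focused(x)))."""
    predicate: str        # e.g. "Goal", "Believe"
    arguments: tuple      # nested terms, kept here as plain strings

@dataclass
class GazeItem:
    """One entry of the gaze lexicon: a signal/meaning pair."""
    name: str
    signal: list          # FACS Action Units describing the eye region
    meaning: list         # conjunction of CognitiveUnits

# The 'comment emphasis' item described above: raised eyebrows
# (AU1+AU2) and wide-open eyes (AU5) marking the focused part
# of the sentence.
emphasis = GazeItem(
    name="comment emphasis",
    signal=["AU1+AU2", "AU5"],
    meaning=[CognitiveUnit("Goal", ("S", "Believe(L, focused(x))"))],
)
```

A full lexicon would be a collection of such items, indexed either by signal (for interpretation) or by meaning (for generation).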
We categorize facial expressions and gaze in terms of their communicative functions rather than their appearance, and we provide a cognitive representation of those semantic and communicative functions. This representation of the meaning of multimodal communicative acts is the starting point for defining a set of inference rules describing the ``mental'' process going on in the speaker while communicating with the listener, and the way the listener will interpret the received communication. These rules take into account the Speaker's representation of the context at hand, including the power relation between Speaker and Listener and a model of the Listener's cognitive capacity and personality.
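To make the role of context concrete, here is a toy inference rule of the kind described above, under the assumption (ours, not the paper's) that the power relation is encoded as a simple label: the same goal of inducing the listener to act surfaces as a different performative, and hence a different gaze item, depending on who holds power.

```python
def select_performative(power_relation: str) -> str:
    """Toy inference rule: map the Speaker/Listener power relation
    onto the performative gaze used to get the Listener to act.
    The labels "S<L", "S>L", "S=L" are illustrative assumptions."""
    if power_relation == "S<L":      # Speaker depends on Listener
        return "imploring gaze"
    if power_relation == "S>L":      # Speaker has power over Listener
        return "reproaching gaze"
    return "suggesting gaze"         # peers

# A subordinate Speaker implores rather than reproaches.
act = select_performative("S<L")
```

A realistic rule set would of course condition on the full mental state and on the model of the Listener, not on a single label.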
The final goal of this research is to build life-like characters with talking faces, capable of various forms of expressive and communicative behavior. To this aim, after representing multimodal communicative acts in terms of the mental states (De Rosis et al. in press) that lead the speaker to perform them, we will formalize the reasoning process that leads a character with a given mental state, and a given image of the interlocutor's mental state, to perform a particular communicative act. We will formalize, as well, the way the interlocutor will react to a given communication by updating his or her own mental state, as a function of how that communication is interpreted. This work thus sets up some building blocks of a theory of planned multimodal communication in systems of ``believable'' agents.
Argyle, M., and Cook, M. 1976. Gaze and Mutual Gaze. Cambridge University Press.
Binsted, K. 1998. Designing portable characters. In WECC'98, The First Workshop on Embodied Conversational Characters.
Cassell, J.; Pelachaud, C.; Badler, N.; Steedman, M.; Achorn, B.; Becket, T.; Douville, B.; Prevost, S.; and Stone, M. 1994. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. Computer Graphics Annual Conference Series, 413--420.
Churchill, E.; Prevost, S.; Bickmore, T.; Hodgson, P.; Sullivan, T.; and Cook. 1998. Design issues for situated conversational characters. In WECC'98, The First Workshop on Embodied Conversational Characters.
De Rosis, F.; Grasso, F.; Castelfranchi, C.; and Poggi, I. In press. Modeling conflict resolution dialogs. In Dieng, R., and Mueller, J., eds., Conflicts in AI. Kluwer Pub Co.
Eibl-Eibesfeldt, I. 1974. Similarities and differences between cultures in expressive movements. In Weitz, S., ed., Nonverbal Communication. Oxford University Press.
Ekman, P., and Friesen, W. 1978. Facial Action Coding System. Consulting Psychologists Press, Inc.
Ekman, P. 1979. About brows: Emotional and conversational signals. In von Cranach, M.; Foppa, K.; Lepenies, W.; and Ploog, D., eds., Human Ethology: Claims and Limits of a New Discipline: Contributions to the Colloquium. Cambridge, England; New York: Cambridge University Press. 169--248.
Pelachaud, C., and Poggi, I. 1998. Talking faces that communicate by eyes. In Proceedings of the "Colloque Orage".
Pelachaud, C., and Prevost, S. 1994. Sight and sound: Generating facial expressions and spoken intonation from context. In Proceedings of the ESCA/AAAI/IEEE Workshop on Speech Synthesis.
Pezzato, N., and Poggi, I. 1998. The alphabet and the lexicon of eyes. In 6th International Pragmatics Conference.
Rickel, J., and Johnson, W. L. 1998. Task-oriented dialogs with animated agents in virtual reality. In WECC'98, The First Workshop on Embodied Conversational Characters.
Takeuchi, A., and Nagao, K. 1993. Communicative facial displays as a new conversational modality. In ACM/IFIP INTERCHI'93.
Thorisson, K. 1997. Layered modular action control for communicative humanoids. In Computer Animation'97. Geneva, Switzerland: IEEE Computer Society Press.