Elisabeth André, Mathias Bauer, Dietmar Dengler
Markus Meyer, Jochen Müller, Susanne van Mulken,
Gabriele Paul, Thomas Rist, Wolfgang Wahlster
During the last few years, huge investments in the telecommunication infrastructure as well as the fast growing community of World-Wide-Web users have almost turned the slogan ``information on everything, anywhere and anytime'' into reality. The WWW as the world's largest public ``knowledge base'' already plays a considerable role in many decision-making processes related to both professional and private activities in everyday life. News brokers, electronic bourses, postings of technical innovations and latest scientific results, as well as bulletin boards and on-line discussion groups are among the assiduous contributors to the exponential growth of this global information resource.
Recently, more and more web pages are no longer just collections of
media objects, but allow interaction with the user: users can buy,
bid, vote, subscribe, and register electronically. Before a user
triggers such an online transaction, a decision has to be made as to
whether and how a particular transaction is compatible with her current
beliefs, desires, and intentions. In many cases, both the information
relevant to a particular decision and the decision criteria will be
conflicting, so that only a careful analysis of
the pros and cons will lead to a rational decision.
However, the overwhelming quantity of information provided by the WWW does not per se guarantee better decision making. Rather, two issues have to be addressed:
The MIAU project (Multiple Internet Agents for User-adaptive Decision Support) addresses both issues: it aims at the development of user-adaptive information-mining tools and computer-based presentation techniques to effectively support human decision makers.
The MIAU project is the successor of the two projects PAN (Planning Assistant for the Net) and AiA (Adaptive Communication Assistant for Effective Infobahn Access). It integrates these two lines of research and guarantees the tight coupling between information gathering and information presentation that is necessary for effective user-adaptive decision support.
Fig. 1 illustrates a typical application scenario for MIAU. Imagine two users sitting in front of a PC and searching for a car on the web.
Based on the individual user preferences regarding cars, the system tries to identify online documents discussing various aspects of this topic (e.g. environmental impact and safety). It then extracts the relevant pieces of information and weights them according to the individual importance factors, thus forming the basis for decision making.
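The weighting step can be pictured as a simple multi-attribute score over the extracted aspect ratings. The aspect names, ratings, and importance factors below are purely illustrative assumptions, not part of the actual MIAU design:

```python
# Illustrative sketch: aggregating per-aspect ratings extracted from
# online documents using a user's importance factors (all values invented).

def weighted_score(ratings, importance):
    """Combine per-aspect ratings (0..1) with the user's importance
    factors (assumed to sum to 1) into a single decision score."""
    return sum(importance[aspect] * value
               for aspect, value in ratings.items())

# Hypothetical ratings extracted from the web for one particular car.
ratings = {"safety": 0.9, "environmental_impact": 0.6, "price": 0.4}

# Two users with different interest profiles.
user_a = {"safety": 0.6, "environmental_impact": 0.3, "price": 0.1}
user_b = {"safety": 0.2, "environmental_impact": 0.2, "price": 0.6}

score_a = weighted_score(ratings, user_a)  # 0.9*0.6 + 0.6*0.3 + 0.4*0.1 = 0.76
score_b = weighted_score(ratings, user_b)  # 0.9*0.2 + 0.6*0.2 + 0.4*0.6 = 0.54
```

The same facts thus yield different decision scores for different users, which is exactly why the subsequent presentation must be tailored to each interest profile.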
The weighted propositions are then allocated to a character team for presentation. Note that the presentation is not just a mere enumeration of the plain facts about the car. Rather, the facts are presented along with an evaluation under consideration of the users' interest profile. With regard to collaborative browsing, the use of multiple presenters also allows for performances that account to a certain extent for the different interest profiles of a diverse audience.
In order to achieve these ambitious application-oriented goals of the MIAU project, the following fundamental questions have to be addressed by basic AI research within this project:
The project MIAU will integrate four types of work:
Figure 2 locates existing information systems using three classification dimensions and spans the space to be explored by the MIAU project. The semantics of the various dimensions is as follows.
A number of application scenarios falling within the exploration space outlined above (see Figure 2) will be investigated during the MIAU project. Depending on the actual instantiations of the various system components along these dimensions, a variety of systems can be produced.
Attentive, Task-oriented Information Systems
Given a formal description of a task (e.g. a workflow, a plan, or a schedule listing a number of meetings to be held) and a user trying to accomplish it, a task-oriented information system (TIS) should anticipate the user's information requirements. It activates a number of information agents that exploit knowledge about the current state of the task under consideration, the user's interests, expertise, and possibly also her role within a team of persons working on the same task. The objective is to provide information that is
Consider the task of writing this project proposal, where the various sections can be modeled as subtasks. While the introductory chapter is being written, information agents could already proactively try to find documents regarding the state of the art in the various relevant subfields. By organizing them in clusters with respect to these categories and sorting them in chronological order, the system can present the user with a good starting point for writing this section. Personal preferences might come into play when several persons share the task of writing the proposal. Knowing that user EA is currently a member of the project AiA, while MB works in the PAN project, the system could split the documents related to ``previous work of the project team'' (see Section 4.2) into two sets and forward them to the respectively most competent participant.
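The splitting step in this example can be sketched as a simple competence-based routing of documents to participants. The document titles, category labels, and competence lists below are invented for illustration:

```python
# Hypothetical sketch: forwarding "previous work" documents to the
# most competent participant (names and categories are invented).

def route_documents(docs, competence):
    """Assign each (title, category) pair to the first participant
    whose competence list contains that category."""
    assignments = {person: [] for person in competence}
    for title, category in docs:
        for person, areas in competence.items():
            if category in areas:
                assignments[person].append(title)
                break
    return assignments

competence = {"EA": ["AiA"], "MB": ["PAN"]}
docs = [("Presentation agents survey", "AiA"),
        ("Information gathering report", "PAN")]

routed = route_documents(docs, competence)
# routed == {"EA": ["Presentation agents survey"],
#            "MB": ["Information gathering report"]}
```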
Multi-aspect Decision Support Systems
In this class of systems the emphasis lies on inferring the user's (domain-dependent) preferences, translating them into information requests, identifying relevant pieces of information, and presenting them to the user.
An example scenario is depicted in Figure 1. The user (potential customer) is interested in several aspects of cars, each of which influences her decision making to some degree. The system's task is
MIAU will explore new styles of presentation by making use of multiple interface agents. The basic idea is to communicate information by means of animated dialogues which can be observed by an audience. In fact, teams of interface agents can contribute to the success of a presentation with regard to the following aspects:
To accomplish these tasks, we will investigate the following two approaches:
In this approach, the system appears in the double role of a screenwriter and a director who generates a script for the actors of a play. The script specifies the dialogue acts to be carried out as well as their temporal coordination. From a technical point of view, this approach may be realized by a central planning component which decomposes a complex presentation goal into elementary dialogue acts which are then allocated to the individual agents. This approach seems less appropriate if we allow humans to participate in the conversation.
In this approach, the individual agents are assigned a set of communicative goals, but have to determine for themselves how to realize them. Since the agents have only limited knowledge concerning what other agents may do or say next, this approach puts much higher demands on the agents' reactive capabilities. On the other hand, it seems more appropriate for settings with human conversational partners. From a technical point of view, this approach may be realized by assigning each agent its own reactive planner.
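The script-based approach can be sketched as a central planner that recursively expands a presentation goal into a temporally ordered sequence of elementary dialogue acts, each allocated to one agent. The goal names, decomposition rules, and agent roles below are illustrative assumptions only:

```python
# Sketch of a central "screenwriter": a goal is expanded top-down
# into a flat, ordered script of (agent, dialogue act) pairs.
# All decomposition rules are invented for illustration.

DECOMPOSITION = {
    # goal -> ordered list of subgoals (strings) or (agent, act) pairs
    "present_car": ["discuss_safety", "discuss_price"],
    "discuss_safety": [("seller", "praise(safety)"),
                       ("buyer", "ask(crash_tests)"),
                       ("seller", "inform(crash_tests)")],
    "discuss_price": [("buyer", "ask(price)"),
                      ("seller", "inform(price)")],
}

def generate_script(goal):
    """Recursively expand a presentation goal into a script."""
    script = []
    for step in DECOMPOSITION.get(goal, []):
        if isinstance(step, str):       # a subgoal: expand further
            script.extend(generate_script(step))
        else:                           # an elementary (agent, act) pair
            script.append(step)
    return script

script = generate_script("present_car")
# e.g. script[0] == ("seller", "praise(safety)")
```

In the character-centered approach, by contrast, no such global script exists: each agent would run its own reactive planner and choose its next act based on what it has just observed.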
Depending on their role and personality, characters may pursue completely different goals. For instance, a customer in a sales situation usually tries to get information on a certain product in order to make a decision, while the seller aims at presenting this product in a positive light. To generate believable dialogues, we have to ensure that the assigned dialogue contributions do not conflict with the character's goals. Characters differ not only with respect to their communicative goals, but also with respect to their communicative behavior. Depending on their personality and emotions, they may apply very different dialogue strategies. For instance, in contrast to an extravert agent, an introvert agent is less likely to take the initiative in a dialogue and will exhibit more passive behavior. Finally, what an agent is able to say depends on its area of expertise.
Even if the agents have to strictly follow a script, as in the script-based approach, there is still enough room for improvisation at performance time. In particular, a script leaves open how to render the dialogue contributions to be made. Here, we have to consider both the contents and the communicative function of an utterance. For instance, utterances would be rendered differently depending on whether they are statements or warnings. To come across as believable, agents with different personalities should not only differ in their high-level dialogue behaviors, but also perform elementary dialogue acts in a character-specific way. MIAU's model of conversation will be based on work done in cognitive science. According to empirical studies, extravert characters use more direct and powerful phrases than introvert characters (Furnham 1990), speak louder and faster (Scherer 1979), and use more expansive gestures (Gallaher 1992). Furthermore, the rendering of dialogue acts depends on an agent's emotional state. Effective means of conveying a character's emotions include acoustic realization (Collier 1985) and facial expressions (Ekman 1993).
The MIAU presentation system will be designed as a testbed that allows for experiments with various personalities and roles. In everyday work situations, it is often important not only to recall what was said, but also who said it and when (source monitoring, cf. [SJHM97]). In MIAU, we will investigate to what extent presentation teams with perceptually easily distinguishable agents foster this kind of source monitoring (see also [CGG+99], Exp. 2). Furthermore, we would like to address questions such as the best mapping between the user's personality and the agents' personality. In fact, the question of whether the complementarity or the similarity-attraction principle holds has been controversial so far (see [IN98]).
The field of (Web-based) information systems has recently seen an enormous upswing. While systems like Information Manifold [LRO96a] and Ariadne [KMA+98] focus on the exploitation of distributed, semi-structured information sources, the personalization of the information retrieval or filtering process is another aspect of growing interest that is dealt with, e.g. in the Letizia [Lie95] and Syskill & Webert systems [PMB96]. Here information regarding the user's reading interests is exploited to suggest potentially relevant hyperlinks to be followed or identify interesting news articles, respectively.
While most user profiles in current information systems represent reading interests in terms of weighted keyword vectors containing typical words occurring in seemingly relevant documents, the OySTER system suggested in [Mül99] makes use of a logical theory describing the user's current context in order to enhance queries by situation-dependent information. The result is an off-line meta search engine that is intended to deliver high-precision results at the cost of speed. [Gök99] represents another attempt to link a user's reading and searching behavior to her real-world situation, e.g. her current task or domain goal. However, the context learner described does not try to find an explicit representation of the user's intentions, but rather aims at identifying common patterns in subsequent queries to an information system (or the Web) which are then used to better estimate the relevance of documents found.
Similarly, Watson [BH99] tries to identify the user's current context in order to recommend related information in a ``just-in-time'' manner. Using a number of heuristics, seemingly relevant keywords are extracted from the document the user is currently working with (e.g. an HTML document in a Web browser or an article to be written in MS Word) and used to form context-specific queries which are then forwarded to conventional search engines. The results are clustered according to various similarity measures and presented to the user in an unobtrusive way. Although the use of task models is explicitly mentioned, Watson exclusively deals with text processing and browsing actions. The context, and thus all the knowledge about the user's potential interests, is represented in terms of keywords that can immediately be used to form Web queries.
The discrepancy between the user's and the system's view of the world was already discussed in [FS91]--the InfoScope system presented there offers an adaptive graphical interface to Usenet news--but has long been neglected. Only recently has the problem of producing comprehensible (information) agent behavior been tackled. This is an indispensable prerequisite when personalizing a process like information seeking, where user acceptance is critically influenced by the system's ability to explain or justify its actions and by the degree of control the user can exert. [BP99] introduces an adaptive news agent that is able to explain its suggestions to the user--albeit only in terms of the internal, keyword-based representation of the user's perceived interests. The News Dude filters incoming news articles using a hybrid user model representing both long-term and short-term reading history. Depending on which model component was relevant for accepting or rejecting a particular article, a different classification algorithm is used (either an instance-based or a probabilistic classification mechanism), and either the (dis)similarity to other documents or the occurrence of certain keywords is verbalized in order to justify the system's decision.
``Translucency'' of the user model also plays a crucial role in the abovementioned OySTER system [Mül99]. Here inductive logic programming is used to construct a representation of the user's current fields of interest in terms of concepts of a domain-dependent ontology. A visualization of this ontology and the concepts forming the user model serves as the interface used to directly manipulate its contents.
The idea of inspectable or ``scrutable'' user models was already mentioned in [Kay94]. The um toolkit for user modeling provided a graphical interface to the user model, making its contents both inspectable and manipulable by the user. To this end, however, the user has to learn the system's language and representation formalism in order to effectively influence the model contents.
Sharing a common language is one of the prerequisites for collaboration not only among various (information) agents, but also between an agent and its user. The COLLAGEN architecture described in [RS98] presents a framework for mixed-initiative problem solving exploiting the capacities of both an agent and its user (the work presented there intersects with the TrIAs approach introduced in [BD99b]). Both the agent and the user can take turns in suggesting the next steps to be carried out to complete a task like planning a journey. Communication is extremely simplified as the user can only choose among a small number of alternatives suggested by the agent in dependence of the current state of the problem solving process.
Most information systems answer a query--either explicitly entered by the user or implicitly determined from the context--by presenting the set of documents retrieved after applying one or more filters (e.g. [PMB96]), ranking the hits according to some evaluation measure (e.g. AltaVista), or clustering them with respect to some similarity criterion (e.g. [BH99]). While a clustering approach groups the documents according to common keywords, there is usually no mechanism to discriminate the respective aspects of a certain topic discussed in the various documents. A first approach to such a multi-aspect information gathering system is Point-Counterpoint [KBBH99], which tries to detect documents representing the pros and cons of some topic. Instead of applying advanced message-extraction and reasoning methods, however, it relies on a list containing the names of persons known to represent a particular standpoint and tries to find documents containing both these names and information regarding the topic under consideration.
When considering multiple aspects of some topic, advanced visualization and exploration techniques have to be provided to the user. Graphical interfaces like the similarity maps introduced in [Tat99] might provide the basis for an intuitive, system-supported manual filtering and exploration process.
There is an emerging awareness that providing solutions to multi-aspect information mining requires that there be a machine understandable semantics for some or all of the information presented in the WWW. Achieving such a semantics requires:
Modern Knowledge Representation and Knowledge Engineering advocate the use of explicit ontologies [vHF99]. In the area of knowledge-based systems, ontologies have been developed for structuring and reusing large bodies of knowledge [BFP98]. Ontologies are consensual and formal specifications of a vocabulary used to describe a specific domain. Ontologies can be used to describe the semantic structure of much more complex objects than common databases and are therefore well-suited for describing heterogeneous, distributed and semistructured information sources. A number of projects rely on such notions. On2broker [FAD+99] provides brokering services to improve access to heterogeneous, distributed and semistructured information sources as they are presented in the World Wide Web. It relies on the use of ontologies to make explicit the semantics of web pages, to formulate queries and to derive answers for them. On2broker processes information sources and content descriptions in HTML, XML [MM99], and RDF [LS99]. The system extends the representation and enquiry options in the World Wide Web and enables intelligent services. SHOE [LSRH97] is a small extension to HTML which allows web page authors to annotate their web documents with machine-readable knowledge. SHOE and On2broker use ontologies for information mediation, focusing on the integration of HTML sources distributed throughout the World Wide Web.
There are different tools that use such terminologies to provide support in finding, accessing, presenting and maintaining information sources. The Ontology Server Ontolingua [FFR97] is the best known environment for building ontologies. It is an interactive environment especially useful for updating, maintaining and browsing ontologies. Ontologies built in Ontolingua use the Frame Ontology [Gru93] which is written in the Knowledge Interchange Format (KIF) [GF92]. Ontolingua ontologies can be translated to different languages. Further main tools for building ontologies are: Ontosaurus [SPKR97], ODE [FGPPP99], and Tadzebao and WebOnto [Dom98].
HERMES [Hoe98] is a system for semantically integrating different and possibly heterogeneous information sources and reasoning systems. This is accomplished by executing programs, called mediators, written in the HERMES system. Mediators [WG97] are guidelines of how information from different sources will be combined and integrated. Infomaster [GKD97] is an information integration system. It provides integrated access to distributed, heterogeneous information sources. An essential feature of Infomaster is its emphasis on semantic information processing. Infomaster integrates only structured information sources. This restriction enables Infomaster to process the information in these sources in a semantic fashion. The Information Manifold system [LRO96b] provides uniform access to multiple structured information sources on the World Wide Web. It frees the user from having to find the information sources that are relevant to a given query, access each source separately, and manually combine information from multiple sources. The system contains explicit descriptions of the contents of the information sources. Given a query, the system uses the descriptions to determine which sources are relevant, to send the appropriate sub-queries to the relevant sources, and to combine information from multiple sources to answer the user's query.
A further interesting approach is the idea of an ontogroup [FES97]. Like a news group, it is based on a group of people who are joined by a common interest and some agreement as to how to look at their topic. An ontology can be used by such a group to express this common ground and to annotate their information documents. A further approach for the integration of heterogeneous knowledge sources is the use of shared ontologies. Concepts can be shared between different resources if an appropriate mapping function can be found that translates a concept understood by one resource into a concept that is understood by another resource. This is the minimal requirement for two resources to share knowledge [MKSI96]. Many architectures for integrating resources comprise a single shared ontology; an example is given by InfoSleuth [BBB+97]. In contrast to the approach in which all resources share one body of knowledge, an alternative approach locates shared knowledge in multiple, smaller shared ontologies. This approach is referred to as ontology-based resource clustering [Sha97], [VT99]. The use of standard object modeling techniques is an alternative for the representation of ontologies [CP99].
An excellent survey of database techniques applied to the World Wide Web is provided by [FLM98].
Machine learning techniques that exploit ontologies can be used to automatically classify textual information [CDF+98].
With new communication technologies like WebTV and portable communication devices, a new class of information applications comes into existence. In a collaborative browsing environment a number of users try to satisfy their information needs simultaneously. As a consequence the information system has to take into account the preferences of several persons (see [LDV99] and [PAC99] for a possible application in PDA-based information access during a museum visit).
As computers become more and more ubiquitous, the group of persons having access to them changes from computer experts to a wide range including naive users. Programming by demonstration (PBD) is a paradigm that is intended to allow even computer illiterates the customization of a system to their personal needs and the extension of a system's operational capabilities with very simple means [Cyp93]. [Mau94] discusses the basic aspects to be considered in communicating with an instructible agent (e.g. how to structure the input samples and bias the agent appropriately), thus extending the so-called learner's ``felicity conditions'' described in [Van87]. Besides the work described in [BD99a], recent advances include the robust learning of patterns in text [Lie99]--by enabling the agent to acquire a grammar of relevant syntactic categories--and the formal characterization of PBD as inductive learning [LW99].
The generation of dialogues between multiple virtual presenters is a complex endeavor which requires research in a variety of disciplines including computer science, sociology, psychology, dramaturgy, and art and design. In this section, we will restrict ourselves to related work done in the intelligent user interfaces and natural language communities.
A number of research projects have discovered lifelike agents as a new means of computer-based presentation.
Applications similar to PPP and AiA were described by Noma and Badler [NB97] who developed a virtual human-like presenter based on the Jack Software and by Thalmann and Kalra [TK95] who produced some animation sequences for a virtual character acting as a television presenter. While the production of animation sequences for the TV presenter requires a lot of manual effort, the Jack presenter receives input at a higher level of abstraction. Essentially, this input consists of text to be uttered by the presenter and commands, such as pointing and rejecting, which refer to the presenter's body language. Nevertheless, the human author still has to specify the presentation script while this process was automated in the PPP and AiA systems. In contrast to MIAU, both systems employ just one agent for presenting information.
The Agneta & Frida system [Per99] incorporates narratives into a web environment by placing two characters on the user's desktop. These characters watch the user during the browsing process and make comments on the visited web pages. In contrast to the approach followed in MIAU, the system relies on pre-authored scripts and no generative mechanism is employed. Consequently, the system operates on predefined web pages only.
Cassell and colleagues (cf. [CPB+94]) automatically generate and animate dialogues between a bank teller and a bank employee with appropriately synchronized speech, intonation, facial expressions and hand gestures. However, their focus is on the communicative function of an utterance and not on the personality and the emotions of the individual speakers. Furthermore, they don't aim at conveying information from different points of view, but restrict themselves to a question-answering dialogue between the two animated agents.
Mr. Bengo [NHA+97] is a disputation system with three agents: a judge, a prosecutor and an attorney which is controlled by the user. The prosecutor and the attorney discuss the interpretation of legal rules. Finally, the judge decides on the winner. The system is noteworthy because it includes a full multimodal interface consisting of components for the recognition and synthesis of speech and facial displays. The virtual agents are able to exhibit some basic emotions, such as anger, sadness and surprise, by means of facial expressions. However, they do not rely on any other means, such as linguistic style, to convey personality or emotions.
Hayes-Roth and colleagues have implemented several scenarios following the metaphor of a virtual theater (e.g., see [HRGH97]). Their characters are not directly associated with a specific personality. Instead, they are assigned a role and have to express a personality which is in agreement with this role. A key concept of their approach is improvisation. That is, characters spontaneously and cooperatively work out the details of a story at performance time, taking into account the constraints of directions coming either from the system or from a human user. Even though the main focus of the work by Hayes-Roth and colleagues was not the communication of information by means of performances, the metaphor of a virtual theater can be employed in presentation scenarios as well.
The benefit of agent teams has also been recognized by developers of tutoring systems. For instance, Rickel and Johnson extended their one-on-one learning environment by additional virtual humans which may serve as instructors or substitute missing team members [RJ98].
Argumentation is an essential part of almost any dialogue. There has been a great deal of work on formal frameworks of argumentation and the generation of argumentative discourse (for instance, see [JMZZ96], Part II: Argumentation, for a representative collection of papers in this area).
Work in the natural language generation community has been concentrating on the selection and organization of arguments as well as their linguistic realization. Maybury [May91] lists a collection of argumentative strategies that have been represented by operators of the AIMI text planner. Marcu [Mar97] discusses linguistic aspects of argumentation and provides a collection of features based on work done in cognitive science that characterize persuasive arguments. Zukerman and colleagues [ZMK98] present an argumentation system called NAG that composes arguments that are persuasive for a particular audience and transforms them into natural language.
The selection and arrangement of arguments has also been addressed by developers of decision support systems. For instance, Gordon and colleagues [GKV96] developed a framework of argumentation which serves as the basis of a web-based mediating system for collaborative decision-making and problem-solving. The general idea behind their work is to enable a public review process by presenting decision-relevant information in an appropriate manner.
Of high relevance to MIAU is the approach by Jameson and colleagues who developed a dialogue system which models non-cooperative dialogues between a used car seller and a buyer (cf. [JSSW89]). The system is able to take on both the role of the seller and the buyer. In the role of the seller, the system tries to build up a usable model of the buyer's interests, in order to anticipate her reactions to the system's future dialogue contributions. In the role of the buyer, the system tries to arrive at a realistic estimation of the car's quality. However, while the objective of Jameson and colleagues is the generation of dialogue contributions which meet the goals of the single agents, the focus of the MIAU project is on the development of animated agents that convey information by giving performances. Furthermore, Jameson and colleagues do not animate their agents and just produce written text. Consequently, they are not able to express human and social qualities, such as emotion and personality, through facial expressions and speech.
Hovy describes one of the first natural language generators that is not only driven by the goal of information delivery, but also considers pragmatic goals, such as conveying the social relationship between speaker and listener, during the generation process (cf. [Hov87]). His generation system PAULINE is able to produce a number of linguistic variants depending on parameters such as the tone of interaction, the speaker's opinion, and the available time.
Walker and colleagues express personality and emotions not only through the semantic content and the syntactic form of utterances, but also through acoustic realization (cf. [WCW97]), drawing upon Cahn's pioneering work on the synthesis of affective speech (cf. [Cah90]).
Recent work in the area of animated agents considers the full range of communicative behaviors including not only linguistic style, but also body gestures and facial expressions.
Ball and Breese present a bidirectional model of personality and emotion based on Bayesian networks (cf. [BB98]). The idea is to treat personality and emotion as unobservable variables in such networks and to construct model dependencies between such unobservable variables and observable quantities, such as linguistic style and facial expressions. The approach is noteworthy since it represents a uniform mechanism for both the diagnosis and the expression of emotions and personality which can be easily extended and modified. Furthermore, it accounts for the uncertainty that is characteristic of this domain.
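The diagnosis direction of such a model can be illustrated with a two-node Bayesian update: a hidden personality trait influences an observable behavior, and observing the behavior revises the belief in the trait. All probabilities below are invented for illustration and are not taken from Ball and Breese's networks:

```python
# Toy Bayesian update: infer the hidden trait "extravert" from the
# observed behavior "speaks loudly" (all numbers are invented).

def posterior(prior, likelihood_true, likelihood_false):
    """P(extravert | observation) via Bayes' rule."""
    evidence = likelihood_true * prior + likelihood_false * (1 - prior)
    return likelihood_true * prior / evidence

p_extravert = 0.5          # prior belief about the character
p_loud_given_e = 0.8       # P(speaks loudly | extravert)
p_loud_given_not_e = 0.3   # P(speaks loudly | introvert)

p = posterior(p_extravert, p_loud_given_e, p_loud_given_not_e)
# observing loud speech raises the belief in extraversion above the prior
```

Run in the other direction, the same dependencies can be used generatively: given a target personality, the model prescribes which observable behaviors (linguistic style, facial expression) are likely.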
Cassell and colleagues follow a communication-theoretic approach and present an architecture based on discourse functions (cf. [CBB+98]). The main idea behind their approach is to interpret and generate conversational behaviors in terms of the conversational functions they have to fulfill in a dialogue. A similar approach has been taken by Pelachaud and Poggi, but has been mainly concentrating on the generation of facial displays (cf. [PP98]).
During the PAN project, some of the designated MIAU project members were active in the development of personalized information agents. A number of publications and presentations at prestigious research institutes, conferences, and workshops indicate the group's international reputation.
In the field of intelligent information integration the newly introduced TrIAs architecture for trainable information assistants provided the first framework for the collaboration between user and information agents during the information gathering process (cf. e.g. [BD99b]).
Just like TrIAs, the InfoBean approach to the configuration of personalized information systems builds upon the idea of generating information extraction wrappers using programming-by-demonstration techniques (see [BD99a]), whose robustness outperforms that of competitor systems.
Capitalizing on this previous work, we will concentrate on the following aspects:
The first project the group conducted in this area was the PPP project, which generated multimedia help instructions presented by an animated agent, the so-called PPP Persona (cf. [RAM97]). Following a speech-act theoretic view, we considered the presentation of multimedia material as a plan-based activity and implemented a goal-driven, top-down planning approach to automatically generate directives for the Persona. The planning component receives as input a communicative goal (for instance, the user should be able to localize the internal parts of a modem) and a set of generation parameters, such as target group, presentation objective, resource limitations, and target language. The task of the component is to select parts of a knowledge base and to transform them into a multimedia presentation structure. Whereas the root node of such a presentation structure corresponds to a more or less complex communicative goal, such as describing a technical device, the leaf nodes are elementary retrieval or generation acts, currently for text, graphics, animations and gestures. Design knowledge is represented by so-called presentation strategies which encode knowledge about: (1) how to select relevant content, (2) how to structure selected content, and finally (3) which medium to use for conveying a particular content.
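The top-down refinement from a communicative goal to elementary acts can be sketched as a recursive expansion over a strategy table. The goal names and decompositions below are invented stand-ins, not PPP's actual strategy language.

```python
# Hypothetical strategy table: each communicative goal decomposes into
# subgoals until only elementary retrieval/generation acts remain,
# mirroring the goal-driven, top-down planning described for PPP.

STRATEGIES = {
    "describe-modem": ["show-picture", "label-parts"],
    "label-parts":    ["point-to-part", "speak-part-name"],
}
ELEMENTARY = {"show-picture", "point-to-part", "speak-part-name"}

def plan(goal):
    """Expand a communicative goal into a sequence of elementary acts."""
    if goal in ELEMENTARY:
        return [goal]
    acts = []
    for subgoal in STRATEGIES[goal]:
        acts.extend(plan(subgoal))
    return acts

script = plan("describe-modem")
```

The resulting leaf sequence corresponds to the elementary retrieval and generation acts at the fringe of the presentation structure; a real planner would additionally handle medium selection and the generation parameters.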
In order to cope with the dynamic nature of presentations made by an animated agent, several extensions of the original planning approach became necessary (cf. [AR96]):
The realization of the Persona Server followed the client/server paradigm; i.e., client applications can send requests for the execution of presentation tasks to the server (cf. [RAM97]). However, to ensure that the Persona exhibits life-like qualities, the Persona Server not only executes presentation tasks, but also implements a basic behavior independent of the applications it serves. This basic behavior comprises: reactive behaviors on sensed events, idle-time acts and low-level navigation acts. The Persona's behavior is coordinated by a so-called behavior monitor which determines the next action to be executed and decomposes it into elementary postures. These postures are forwarded to a character composer which selects the corresponding frames (video frames or drawn images) from an indexed database, and forwards the display commands to the window system.
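The arbitration between requested presentation tasks and application-independent idle-time behavior can be sketched as follows; the class, the timing policy and the idle acts are illustrative assumptions, not the Persona Server's actual implementation.

```python
import time

# Sketch of a behavior monitor's arbitration: requested presentation
# tasks take precedence; when the task queue is empty and the persona
# has been idle long enough, an idle-time act is selected instead.

class BehaviorMonitor:
    IDLE_ACTS = ["blink", "tap-foot", "look-around"]

    def __init__(self, idle_threshold=2.0):
        self.queue = []                      # pending presentation tasks
        self.idle_threshold = idle_threshold # seconds before idle acts kick in
        self.last_task = time.monotonic()

    def request(self, task):
        """Client applications enqueue presentation tasks here."""
        self.queue.append(task)

    def next_action(self):
        """Return the next action to execute, or None if nothing is due."""
        if self.queue:
            self.last_task = time.monotonic()
            return self.queue.pop(0)
        if time.monotonic() - self.last_task >= self.idle_threshold:
            return self.IDLE_ACTS[0]  # pick an idle-time act (here simply the first)
        return None
```

In the real system each returned action would then be decomposed into elementary postures for the character composer; reactive behaviors on sensed events would be a third, higher-priority input omitted here.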
In the AiA Project, we redesigned the Persona Server component in order to serve WWW applications, too. The principle was to pack a web page with the selected media objects and a presentation runtime engine implemented as a Java applet which displays the media objects according to the layout specification (cf. [ARM98]). An important characteristic of our presentations is that they are not just played back, but have a branching structure which allows the user to choose between different possibilities of navigation. That is, the course of a presentation changes at runtime depending on user interactions. To enable this, we have combined behavior planning for life-like characters with concepts from hypermedia authoring such as timeline structures and navigation graphs. With the Persona-Enabling Toolkit PET, we provided a software package for the creation of lifelike characters which can be easily integrated into web interfaces. Based on this toolkit, a number of innovative web applications, such as virtual shopping assistants and travel guides, have been developed.
The novelty of PPP and AiA is that presentation scripts and navigation structures are not stored in advance, but generated automatically from pre-authored document fragments and items stored in a knowledge base (cf. [ARM99]).
Our research on animated interface agents was motivated by the assumption that they make man-machine communication more effective. To investigate whether this effect actually holds when persona conditions are compared with no-persona conditions, and whether it extends to objective rather than just subjective measures, we conducted a psychological experiment.
In this experiment, we tested the effect of the presence of our PPP persona on the user's understanding, recall, and attitudes. Twenty-eight subjects were shown web-based presentations with two different types of content. In the experimental condition, the presentations were given by a speaking and gesturing PPP persona. In the control condition, the (audiovisual) information presented was exactly the same, except that there was no persona and all gesturing was replaced by pointing arrows. After the presentations, the subjects were asked comprehension and recall questions and subsequently given a questionnaire that measured their attitudes towards the system and the PPP persona. Statistical analyses of the results showed no effect on comprehension or recall. However, analysis of the data on the subjects' attitudes revealed a significant positive persona effect: Subjects who had seen presentations guided by the persona found the presentations themselves and the corresponding tests less difficult than subjects who had seen presentations without the persona. In addition, subjects found these presentations significantly more entertaining (cf. [MAM98]).
In a follow-up study, we investigated whether the subjective persona effect extends even to an increased trustworthiness of the information presented by a lifelike character. In this study, subjects had to perform a navigation task and were assisted in turn by one of four agents: the first was invisible and merely gave textual recommendations as to how to proceed with the task; the second presented these recommendations acoustically; the third was a speaking cartoon-style agent; and the fourth was a speaking agent based on video images of a real person. We hypothesized that the embodied agents would appear more convincing or believable and that the subjects would therefore follow their recommendations more readily. This hypothesis, however, was not supported by the data: We found only numerical differences in the expected direction, with the proportion of recommendations actually followed by the subjects dropping off from the video-based to the cartoon-style, audio, and text agents (for further details, see [MAM99]).
These findings suggest, among other things, that merely embodying an interface agent may not be enough: to come across as trustworthy, one may need to model the agent more deeply, for instance, by giving it personality.
Acceptance of system decisions or suggestions crucially depends on the user's ability to understand the rationale behind the system's behavior. In order to provide a means for the user to gain some insight into the system's knowledge base for decision making, interest profiles in MIAU will be represented in terms of the users' individual beliefs, goals, plans, and weightings. This way the user will be able to more concisely specify her own intentions, understand the system's performance (both update operations and explanations of document ratings), and modify the profile contents whenever she feels the need to. The system will apply advanced machine learning algorithms to update the user model and will be able to justify whatever it does.
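A transparent, inspectable profile of this kind can be sketched as explicit weighted interest entries from which every document rating is derived and justified. The class, the interests and the weights below are invented for illustration, not MIAU's planned representation of beliefs, goals and plans.

```python
# Illustrative sketch of a transparent interest profile: interests are
# explicit, user-editable weighted entries, and every document rating
# can be traced back to the entries that produced it.

class InterestProfile:
    def __init__(self, weights):
        self.weights = dict(weights)  # e.g. {"hiking": 0.9, "opera": 0.2}

    def rate(self, doc_keywords):
        """Score a document and justify the score in terms of the profile."""
        hits = {k: self.weights[k] for k in doc_keywords if k in self.weights}
        score = sum(hits.values())
        justification = [f"matches interest '{k}' (weight {w})"
                         for k, w in hits.items()]
        return score, justification

profile = InterestProfile({"hiking": 0.9, "opera": 0.2})
score, why = profile.rate({"hiking", "weather"})
```

Because the rating is a simple function of visible entries, the user can see exactly which interests caused a document to be recommended and adjust the corresponding weights; a learning component would update the same entries.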
In particular situations the preferences of more than one user have to be taken into account when searching for relevant information or preparing the information basis for decision making. We will address this problem by developing methods to identify common or complementary interests of user groups on the basis of their individual interest profiles.
Besides an interest profile representing a user's long-term preferences, the context of an information request, e.g. the current task, often plays a crucial role. MIAU will develop an approach that allows the user's activities and the state of the world to be monitored and taken into account in order to anticipate future information needs or adapt the information filtering conditions to specific situations.
The InfoBeans approach developed in PAN was a successful attempt to provide customizable information assistants even to naive users. Programming by demonstration techniques gave them a simple tool to train these agents to achieve certain types of information goals. With the arrival of new technologies like XML the focus of activity of such information assistants--and thus the user's task in training these agents--will shift from the extraction of relatively simple, syntactically defined information concepts to the identification of vaguely defined semantic concepts. MIAU will provide means to extend the existing PBD approach to deal with information on a semantic level, possibly taking into account both individual and global ontologies.
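The wrapper-generation idea behind such programming-by-demonstration training can be illustrated with a toy delimiter-based wrapper; the functions, the context width and the sample page are invented, and real wrappers (and the planned semantic-level extension) are far more elaborate.

```python
# Toy illustration of wrapper induction from a single demonstration:
# from one value the user points at on a page, infer the left and right
# delimiters, then reuse them to extract the value from similar pages.

def learn_wrapper(page, demonstrated_value, context=4):
    """Learn (left, right) delimiters around one demonstrated value."""
    i = page.index(demonstrated_value)
    left = page[max(0, i - context):i]
    right = page[i + len(demonstrated_value):i + len(demonstrated_value) + context]
    return left, right

def apply_wrapper(page, wrapper):
    """Extract the text between the learned delimiters."""
    left, right = wrapper
    start = page.index(left) + len(left)
    end = page.index(right, start)
    return page[start:end]

train = "<td>USD 42.00</td>"
wrapper = learn_wrapper(train, "USD 42.00")
```

Such purely syntactic delimiters work as long as the page layout is stable; moving to XML and vaguely defined semantic concepts is precisely what makes the extension sketched above non-trivial.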
To generate effective performances with multiple agents, we cannot simply replicate an existing character. Rather, characters have to be realized as distinguishable individuals with their own areas of expertise, interest profiles, personalities and audio/visual appearance, taking into account their specific task in a given context. MIAU will start from a given set of characters and basic gestures. To indicate the role of a character in a certain setting, MIAU will make use of accessories, such as caps or bags. While the visual appearance of an agent can be automatically modified only to a limited extent, MIAU will allow for the dynamic assignment of goals, attitudes and personality features. In particular, we will investigate which character combinations are more likely to lead to communicative success.
There are various types of dialogues, including debates, panel discussions, chats, interviews, and consultation, sales, brokering and tutoring dialogues. In MIAU, we will develop a conversational model that accounts for a large variety of dialogue types and communicative situations. To automatically generate dialogues between multiple conversational agents, we will investigate two approaches: agents with scripted behaviors and improvisational actors (see Section 2.2.2). A further level of complexity will be achieved by allowing humans to participate in such a dialogue. This dialogue setting will be investigated in the context of a collaborative browsing environment.
In MIAU, we will investigate how to generate dialogue contributions that reflect the agent's personality and emotional state relying on work done in the DFKI projects Presence, Puppet and CoMMA-COGs. In particular, we will investigate how emotions and personality may be conveyed by facial expressions, body gestures and verbal style which refers to the semantic contents, the syntactic structure and the acoustic realisation of an utterance (see [WCW97]). To consider such parameters, MIAU will enhance the input of the animation modules and the speech synthesizer with additional instructions in an XML-based mark-up language.
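How such additional instructions might be attached to an utterance can be sketched with standard XML tooling; the tag and attribute names below are invented for illustration and are not the actual MIAU mark-up language.

```python
import xml.etree.ElementTree as ET

# Sketch: annotating an utterance with affective parameters in an
# XML-based mark-up. All element and attribute names here are
# hypothetical placeholders for the planned mark-up language.

utterance = ET.Element("utterance", emotion="joy", personality="extrovert")
ET.SubElement(utterance, "gesture", type="wave")
speech = ET.SubElement(utterance, "speech", pitch="high", rate="fast")
speech.text = "Welcome to our virtual showroom!"

markup = ET.tostring(utterance, encoding="unicode")
```

An animation module would read the `gesture` and `emotion` annotations while a speech synthesizer would interpret the prosodic attributes, so that both channels express the same affective state.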
Target markets for the application of the MIAU technologies are online information services and electronic commerce. The current generation of Internet agents for electronic commerce, often called shopbots, produces simple comparisons between product offers based on a single criterion: the price of the particular product that the user wants to buy. With the new methods developed in the MIAU project, it will become possible to provide the user with customized, multi-dimensional comparisons of products, including price, service quality, maintenance guarantees, and shipping conditions. In addition, similar products or services that match multiple objectives of the user can be compared and presented in a dialectic way by the MIAU system, so that active decision support is provided. This will drastically increase the value of electronic shopping for the consumer.
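The difference between a single-criterion shopbot and a multi-dimensional comparison can be made concrete with a small weighted-sum sketch; the offers, criteria and weights are invented examples, and MIAU's actual decision support is meant to go beyond a simple linear score.

```python
# Sketch: instead of ranking offers by price alone, each offer is
# scored against several user-weighted criteria (all values invented;
# higher per-criterion scores are assumed to be better).

OFFERS = {
    "offer-A": {"price": 0.9, "service": 0.3, "shipping": 0.5},
    "offer-B": {"price": 0.6, "service": 0.9, "shipping": 0.8},
}

def rank(offers, user_weights):
    """Rank offers by the weighted sum of per-criterion scores."""
    def score(attrs):
        return sum(user_weights[c] * v for c, v in attrs.items())
    return sorted(offers, key=lambda name: score(offers[name]), reverse=True)

# A user who cares more about service quality than about price:
best_first = rank(OFFERS, {"price": 0.2, "service": 0.6, "shipping": 0.2})
```

Different weight profiles yield different winners for the same offers, which is exactly why a single-criterion price comparison is insufficient for personalized decision support.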
The research department "Intelligent User Interfaces" of DFKI already conducts several industrially funded projects in this area and works on the key technologies needed for the next generation of net-based application solutions.
A large number of DFKI's industrial partners have already announced strong interest in using the technology to be developed in MIAU. Currently, we are conducting industrial projects on Internet-based shopping assistants and insurance consultants funded by a large German mail order house and a German-American car producer.
There will be a close cooperation between the MIAU project and other DFKI projects.
Links to shareholders and projects outside of DFKI have already been established during the AiA and PAN projects. These links will be maintained and will provide further useful opportunities for cooperation.
The internally funded PRESENCE project will use lifelike characters as virtual receptionists/infotainers/accompanying guides for visitors to DFKI GmbH. The PRESENCE project addresses a number of specific problem areas which are of relevance to the MIAU project: (a) the flexible integration of multiple input (speech, mouse, keyboard and touchscreen) and output (text, pictures, videos and speech) devices; (b) the development of a high-level descriptive language for character definition, based on personality traits, to allow easy customization of the agent; (c) the exploration of the possibility of tailoring the agent-user interaction to an individual user by inferring the user's affective state. However, unlike the MIAU project, PRESENCE will employ just a single conversational agent.
The EU-funded i3-ese project Puppet explores new forms of learning through the development and evaluation of novel interactive environments based on the metaphor of a virtual theatre. By deploying user-controlled avatars and synthetic characters in their own play productions, the children have to distinguish and master multiple roles in their interaction with the system, e.g. those of a director, an actor and an audience, with the main activities being producing, enacting and reflecting, respectively. Within this process the children should gain a basic understanding of how different emotional states can change or modify a character's behavior and how physical and verbal actions in social interaction can induce emotions in others.
The BMBF-funded project SMARTCOM aims at making man-machine communication more intuitive and natural by analysing and generating multiple modalities in a synergistic manner. Of particular interest to MIAU is subproject 4: ``Generation and Multimodal Media Design'', which will concentrate on written and spoken language generation, speech synthesis, adaptive graphics design, the generation of gestures, presentation and display management.
The objective of the BMBF-funded project CoMMA-COGs (Cooperative Man Machine Architectures - Cognitive Architecture for Social Agents) is the development of integrated architectures for multi-agent systems by exploiting relevant research in Cognitive Science. Of particular interest to MIAU is the investigation of resource-bounded constructs to support the representation of lifelike characters in entertainment software such as interactive drama and games.
We plan close co-operation with:
The MIAU project is decomposed into six major workpackages.
The overall architecture of the MIAU system has been specified; it consists of a multi-aspect decision support module and a presentation planning module.
A theoretical concept for the acquisition of transparent user models and the integration of context models into the information mining process has been elaborated.
The influence of personality and emotions on linguistic style, facial expressions and body gestures has been studied.
First demonstrator systems for the multi-aspect information mining system and the presentation system have been implemented.
The information mining system provides techniques for mapping user preferences onto documents and classifying the information retrieved.
The presentation system is able to plan the propositional contents and the structure of simulated dialogues considering the personalities and emotions of the single agents.
All components have been fully integrated into the MIAU prototype and tested by means of a common application scenario. Human agents may participate in a dialogue with artificial agents. The system is fully documented in various reports. Presentation styles with different character settings have been evaluated.