Wolfgang Broll, Eckhard Meier, Thomas Schardt
GMD - German National
Research Center for Information Technology
Institute for
Applied Information Technology (FIT)
http://orgwis.gmd.de/projects/VR
In existing shared virtual environments the
representation of participants is often limited to static avatars at
the user's viewpoint. Some systems additionally allow users to perform
a set of predefined actions on their avatar. This approach, however,
does not seem suitable for representing non-verbal communication.
Additionally, shared virtual worlds often fail to attract new users
unless they are already heavily populated.
In this paper we will show how we try to
overcome these problems by enhancing the user's representation through
tangible interfaces and populating shared virtual worlds with user
representatives, even if the user is currently not participating in the
virtual world.
Several shared virtual environments based on the
Internet have been established over the last few years [4] [9]. Most of
them, however, use a very simple and rather static
representation of their participants by avatars. These characters are
simply placed at the current viewpoint of the particular user. In such
a world the user's role is basically reduced to that of an observer.
People visiting these environments do not really become involved in
these worlds, because their acting capabilities are often limited to
walking around, following internet links, and talking to other people.
Sometimes simple activities are available, but often are limited to a
small, restricted area of the world.
The lack of interaction capabilities makes it
impossible for the participants of shared virtual environments to
become active and constructive members of these worlds. Therefore the
world's main structure as well as the appearance of the avatars is
static and fixed over time. To counter the problem of inexpressive
representations of current visitors, some virtual environments try to
add further signs of life to their avatars. In AlphaWorld [1] and in Sony's Community Place [9], for instance, text typed in the chat
window is displayed above the corresponding avatar. Other
environments such as OnLive Traveller provide a basic speech-to-lips
synchronization and a number of predefined expressions or body
gestures. Most participants, however, do not use such predefined
actions, since they have to be activated explicitly (e.g. by pressing
an appropriate button), rather than being captured from the real body
motions and facial expressions of the user.
Another phenomenon we can observe is that shared
virtual worlds often grow lonely. As in the real world, users tend
to group together to communicate. But why should people meet in virtual
worlds if there is no interesting theme to talk about? Most Internet-based
environments do not serve a particular purpose. Users cannot
profit from the virtual environment's structure, because no real task or
goal is embedded in these scenes. People are forced to hang around
without any chance of action, just waiting for other users to join the
world. This may explain why such environments are often sparsely
populated and finally turn into virtual deserts.
This does not necessarily have to be the case, as
demonstrated by shared gaming environments such as Ultima Online [12]. In this type of role-playing
game there is a global task to perform (most often, it is
to fight evil). This goal is embedded in a complex, consistent
scenery in which players are able to control their avatars. The main
aspect of these games is to reach certain goals, develop strategies,
extend capabilities and so on. There is always something to do or to
discover in these worlds. We can observe that participants of these
games in general become an active part of the game's story by
controlling their avatars in a dedicated way. This may explain why
users spend several hours in a shared gaming environment, but often
leave other worlds after a couple of minutes.
Similar to gaming environments, our approach
takes advantage of this kind of persistent, task-oriented virtual
world. Within a global, context-dependent scenario, we are able to
enhance static user representations with active avatar components. But
instead of concentrating on the virtual world's internal state, it is
our goal to enrich these environments with real-world user activities.
These activities are continuously mapped onto corresponding virtual
characters representing the user in an appropriate, context-sensitive
way. Furthermore, the combination of characters and real-world events
allows us to control avatars in the virtual environment and keep them
alive, even if the actual persons are currently not actively participating
in the world. Thus users acting in their usual working environment
(e.g. editing a document, reading a book, walking to another person's
office, etc.) can be represented appropriately by their virtual
counterparts. To realize this approach, simple sensor input as well as a
number of system events have to be monitored and mapped to actions of
avatars or objects in the virtual world.
In the second section of this paper we will
describe the infrastructure used to map external events and sensor
input to object or avatar specific symbolic actions. In the third
section we will present two sample scenarios.
In this section we will give a short
introduction to the basic infrastructure used to realize symbolic
actions in shared virtual environments. Our sample scenarios are based
on SmallTool - a toolkit for the development of shared multi-user VR
applications. We will further present our approach to control the
behavior of avatars and characters within a virtual environment. The
support provided by our approach can be subdivided into two areas:
The SmallTool [5] multi-user VR toolkit is based on a set of libraries
that minimize the effort necessary to create distributed virtual
environments populated
by users and characters. The main parts of SmallTool include:
The extended VRML (EV) library enables us to parse
and render 3D objects based on the ISO standard VRML'97 [2]. It provides additional support for
the representation of users by avatars. In addition to these features,
the EV library supports the synchronization of shared scenes.
The distributed worlds transfer and
communication protocol (DWTP) provides a high-level application network
interface adapted to the special needs of shared virtual environments
on the Internet. In addition to its client interface, it provides a
number of services which can be used as daemons or within application
servers to realize scalability, reliability and persistence.
Finally, the device independent communication interface (DICI)
supports the easy connection of new and innovative I/O devices via
the Internet. Each I/O device is connected to a DICI server part, which
makes the device available to all or selected hosts on the Internet.
Applications which want to use these services (either receiving input
data or sending output data) simply include the DICI client interface.
The services can then be used by specifying the Internet address and
the name of the requested service. We have realized DICI servers for
6DOF magnetic trackers and the MOVY tracking system [8]. MOVY is a wireless inertial tracker developed in
our institute. Compared to magnetic tracking devices it has a wider
operation area and is not influenced by metal or electric fields.
The SmallTool libraries are currently available
for Windows95/98/NT, Linux and some UNIX flavors (IRIX, SOLARIS).
We have built some sample applications on top of
the SmallTool libraries. Our main application is SmallView, a
multi-user VRML browser. For rapid prototyping of new 3D applications
SmallView provides an application scripting interface.
The main purpose of symbolic actions is to map
external events to changes within a virtual world. Events based on user
activities will usually be mapped to actions of the user's avatar or
character. They may however also be used to map arbitrary events to
changes of objects in a synthetic virtual environment.
We distinguish between the avatar and the
character of a user: The avatar visualizes the viewpoint of a remote
user to all other users, giving the participant a virtual
representation within a shared multi-user environment. In contrast to
avatars, a character can be used to represent particular activities of
a user even when the user is not participating in the virtual
environment. Thus the representation by a character can even be used in
single-user virtual worlds.
We have realized a prototype implementation of a
symbolic action module, which allows us to map external events to user
or scene specific actions. These actions may include e.g. animations,
object changes or sound. The symbolic actions can be configured by
defining a simple mapping between the received external event data and
the event recipient within the 3D scene. Additionally the internal
events issued can be based on the previous external event (e.g. for
stopping the last action). An external event may even be mapped to
several internal events. Internal events can be issued concurrently or
in sequence. An example of a concurrent action is a character walking
from one room into another: the walking animation, which moves the
arms and legs of the character, has to be performed concurrently with the
movement of the whole character body to the new location. Other actions
such as sitting down and reading a book require two or more events to
be issued in sequence. Often a certain gap between two actions is
required or an action needs a minimum time to be performed (e.g. when
changing locations). Time out values can be used to specify the
duration of actions.
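
The mapping described above can be illustrated by a small sketch. It is not the SmallView application script: the class names (InternalEvent, ActionMapping, SymbolicActionMapper), the event names and all durations are hypothetical, and instead of sending VRML events the sketch only computes when each internal event would be issued. Steps are issued in sequence, the events inside one step are issued concurrently, and a stop action is derived from the previous external event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalEvent:
    """One event for a recipient in the 3D scene (hypothetical format)."""
    recipient: str    # name of the scene object receiving the event
    action: str       # name of the event input to trigger
    duration: float   # time-out value in seconds


@dataclass
class ActionMapping:
    """Steps run in sequence; the (action, duration) pairs of one step run concurrently."""
    steps: list
    gap: float = 0.0        # pause between two consecutive steps
    stop_action: str = ""   # issued when the next external event arrives


class SymbolicActionMapper:
    def __init__(self, table):
        self.table = table      # external event name -> ActionMapping
        self.previous = None    # last external event seen

    def dispatch(self, external_event, recipient, now=0.0):
        """Return (start_time, InternalEvent) pairs for one external event."""
        mapping = self.table[external_event]
        schedule = []
        # Internal events may depend on the previous external event,
        # e.g. to stop the last action before a new one starts.
        if self.previous is not None and self.table[self.previous].stop_action:
            stop = self.table[self.previous].stop_action
            schedule.append((now, InternalEvent(recipient, stop, 0.0)))
        start = now
        for step in mapping.steps:
            for action, duration in step:
                schedule.append((start, InternalEvent(recipient, action, duration)))
            # the next step starts after the longest concurrent action plus the gap
            start += max(duration for _, duration in step) + mapping.gap
        self.previous = external_event
        return schedule


# Hypothetical configuration: one sequential and one concurrent mapping.
TABLE = {
    "read_book": ActionMapping(
        steps=[[("sit_down", 2.0)], [("read", 10.0)]],
        gap=0.5, stop_action="stand_up"),
    "change_room": ActionMapping(
        steps=[[("walk_animation", 4.0), ("move_to_room", 4.0)]]),
}
```

Dispatching "read_book" and then "change_room" for the same character would first schedule sitting down and reading one after the other, and later a "stand_up" event together with the two concurrent walking events.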
In our prototype the symbolic action module is
currently limited to mapping external events to VRML events. It is not
yet possible to query the state of objects in the VRML scene or to find
e.g. the nearest object of a particular type. The current version is
completely implemented as a SmallView application script. Future
releases will be based on an additional library integrated into
the SmallTool toolkit.
The current VRML'97 standard does not provide
built-in support for state based behavior. The realization of such
behaviors requires the use of scripting languages such as ECMAScript
(the standardized form of JavaScript) or Java. In VRML such scripts can be
part of the scene graph. Another possibility provided by some VRML
browsers is the External Authoring Interface (EAI) [10]. It allows application developers to influence
the 3D scene from an external Java application.
Our SmallView browser provides the capability
to load and run several external scripting applications. We currently
use an extended C++ version (instead of Java) of the standard VRML EAI
to transfer the events issued by the symbolic action module into the
VRML scene graph. Animation sequences for particular object types
(defined by VRML prototypes) are started by sending appropriate time
events to the corresponding VRML time sensor nodes. To achieve this,
objects are defined by VRML prototypes. An object that is to
provide an external behavior interface has to define an event input
for each action. This mechanism allows us to define behaviors which are
independent of the spatial context, such as walking, shaking the head or
jumping. In addition to the VRML standard, our implementation supports
the inheritance of prototypes. By overloading the interface of
prototypes in sub-classes, this can be used to simulate dynamic binding
(similar to the concept used in VRML++ [6]).
Additionally, this concept provides us with the mechanism required to
realize context-sensitive behavior such as walking into the dining room or
sitting on a chair. To identify the type of objects, however, they would have
to be derived from the appropriate base object (e.g. a dining room
from a room).
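
The idea of prototypes with one event input per action, overloaded in sub-classes, can be sketched with ordinary class inheritance. The following is an analogy only and not VRML or the extended EAI: the classes Prototype, Room, DiningRoom, Character and Penguin, as well as the convention that event inputs are methods named set_..., are assumptions made for this illustration.

```python
class Prototype:
    """Stand-in for a prototype with named event inputs (hypothetical)."""

    def event_inputs(self):
        # by convention of this sketch, every attribute named set_* is an event input
        return sorted(name for name in dir(self) if name.startswith("set_"))

    def send(self, event_input, value):
        handler = getattr(self, event_input, None)
        if handler is None:
            raise KeyError(f"{type(self).__name__} has no event input {event_input}")
        return handler(value)


class Room(Prototype):
    def __init__(self, name, entry_point):
        self.name = name
        self.entry_point = entry_point


class DiningRoom(Room):
    """Derived from Room, so it can be identified as a room."""


class Character(Prototype):
    def __init__(self):
        self.position = (0.0, 0.0)
        self.played = []

    def set_walk(self, start_time):
        # behavior independent of the spatial context: start the walking animation
        self.played.append(("walk_cycle", start_time))

    def set_walk_into(self, room):
        # context-sensitive behavior: only objects derived from Room qualify
        if not isinstance(room, Room):
            raise TypeError("target is not derived from Room")
        self.set_walk(0.0)                # bound dynamically in sub-classes
        self.position = room.entry_point


class Penguin(Character):
    def set_walk(self, start_time):
        # overloaded event input: same interface, different animation
        self.played.append(("waddle_cycle", start_time))
```

Sending "set_walk_into" to a Penguin uses the overloaded walking animation without the sender knowing the concrete type, which is the effect that overloading prototype interfaces is meant to achieve.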
In addition to this mechanism, our realization
allows us to add arbitrary input devices to VRML scenes. Besides
the mapping provided by the symbolic action module, named external
events can be grabbed by objects within the 3D scene. Once caught
by a scene object, the event data can be forwarded to other parts of
the scene by the standard routing mechanism of VRML. This makes it
possible, e.g., to connect a 6DOF tracking system to the limbs of your
avatar.
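
A minimal sketch of this forwarding, assuming a much simplified model of routing: a node emits a value on a named event output, and every route attached to that output delivers the value to an event input of another node. The classes Node and ExternalEventGrabber and the event name "tracker/left_arm" are hypothetical and do not reflect the actual SmallView interface. Rotations are written as axis and angle (x, y, z, angle), as in VRML.

```python
class Node:
    """Simplified scene node with routes from event outputs to event inputs."""

    def __init__(self, name):
        self.name = name
        self.fields = {}    # last value seen per event name
        self.routes = {}    # event output -> list of (target node, event input)

    def add_route(self, event_out, target, event_in):
        self.routes.setdefault(event_out, []).append((target, event_in))

    def emit(self, event_out, value):
        self.fields[event_out] = value
        for target, event_in in self.routes.get(event_out, []):
            target.receive(event_in, value)

    def receive(self, event_in, value):
        self.fields[event_in] = value


class ExternalEventGrabber(Node):
    """Scene object that catches one named external event (hypothetical)."""

    def __init__(self, name, event_name):
        super().__init__(name)
        self.event_name = event_name

    def on_external(self, event_name, rotation):
        # only the event this object has been registered for is forwarded
        if event_name == self.event_name:
            self.emit("rotation_changed", rotation)


# Connect a tracker event to the left wing of an avatar.
grabber = ExternalEventGrabber("ArmSensor", "tracker/left_arm")
left_wing = Node("LeftWing")
grabber.add_route("rotation_changed", left_wing, "set_rotation")
grabber.on_external("tracker/left_arm", (0.0, 0.0, 1.0, 0.6))
grabber.on_external("tracker/right_arm", (1.0, 0.0, 0.0, 0.2))   # ignored
```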
More complex representations of body motion
would require kinematics, which is currently not possible within standard VRML.
Simple extensions to the specification of object transformations within
the scene could be used to specify joints. By limiting the direction
and amount of translations, or the axis and angle of rotations, inverse
kinematics can be used to calculate the intermediate transformations.
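
To make the role of such joint limits concrete, the following sketch solves the inverse kinematics of a planar limb with two segments (e.g. upper arm and forearm), where each joint rotates about a single fixed axis and its angle is clamped to a given range. This is a textbook two-link solution chosen for illustration, not an extension that exists in VRML or SmallTool; the function names and the limits used are assumptions.

```python
import math


def clamp(value, low, high):
    return max(low, min(high, value))


def two_joint_ik(target_x, target_y, upper_len, lower_len,
                 shoulder_limits, elbow_limits):
    """Return (shoulder, elbow) angles in radians that reach towards the target."""
    # keep the distance within the reach of the limb
    dist = clamp(math.hypot(target_x, target_y),
                 abs(upper_len - lower_len), upper_len + lower_len)
    # law of cosines for the elbow bend (0 means a straight limb)
    cos_elbow = (dist ** 2 - upper_len ** 2 - lower_len ** 2) / (
        2.0 * upper_len * lower_len)
    elbow = math.acos(clamp(cos_elbow, -1.0, 1.0))
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        lower_len * math.sin(elbow), upper_len + lower_len * math.cos(elbow))
    # the joint limits restrict the angle of rotation about the fixed axis
    return clamp(shoulder, *shoulder_limits), clamp(elbow, *elbow_limits)


def forward(shoulder, elbow, upper_len, lower_len):
    """Position of the end of the limb for the given joint angles."""
    x = upper_len * math.cos(shoulder) + lower_len * math.cos(shoulder + elbow)
    y = upper_len * math.sin(shoulder) + lower_len * math.sin(shoulder + elbow)
    return x, y
```

If the limits are wide enough, forward() applied to the result reproduces the target; with tighter limits the limb stops at the nearest permitted angle instead.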
In this section of the paper we will present two
sample scenarios which have been realized within SmallView by the
mechanisms presented in the second section. The first scenario shows
how mutual awareness in virtual teams or between remote users (e.g.
tele-workers) can be increased by symbolic character representations.
The second part presents our approach to enhance the communication
between distributed users represented by avatars by providing
additional feedback mechanisms.
In many companies people working together are
distributed over several rooms (or even floors or buildings). Some
colleagues may even work from home. Since most of the work has to be
done individually, colleagues get together only from time to time to
synchronize their work, decide what to do next, exchange information,
or simply talk to each other.
Back in their offices these coworkers only know
the state of affairs as of the last meeting, although a lot of things
could have happened since then: a shared document could have
been finished or modified, a colleague could have added some
information to a shared workspace, or some people may have decided to have a
spontaneous meeting in the coffee room. Informing every member of a
working group of such events produces a communication overhead: after
modifying or replacing a shared document, one has to send an email to all
interested people. Spontaneous discussions are interrupted because
missing colleagues have to be invited explicitly.
The goal of our system is to capture such events
and inform other people directly. Existing approaches to this problem
send an email or pop up a window on the user's desktop. This, however,
seems to be too intrusive to achieve a peripheral awareness of the
environment. In our opinion it is important to provide additional
peripheral awareness similar to BT's Contact Space or Forum Meeting
Space [7]. Our approach therefore
is based on a comprehensive virtual workgroup scenario including the
representation of users by active avatars, appropriate visualizations
of the users' real working environment (e.g. offices, coffee room) as
well as virtual workspaces. These virtual workspaces may be used to
represent teams or subject related data. Ideally the 3D representation
should be displayed on a separate screen rather than in an additional
window on the user's desktop. This allows the user to concentrate on
his work while staying aware of the overall situation in his workgroup.
Within these workgroup scenarios each member is
represented by an animated avatar. The avatar's behavior represents the
activity of its owner symbolically: if an employee opens a shared
workspace and fetches a document, his avatar moves to the room
representing this workspace, takes a paper out of a bookshelf, moves
back into his office, sits down and modifies this document. If two
users are talking in the coffee room, their avatars also move to the
corresponding virtual counterpart.
In our prototype we use this system to visualize
the activity of users in BSCW workspaces [3]. Currently the system recognizes only a small set
of user activities and maps them to symbolic actions of the appropriate
avatar (see figure 1):
To recognize the user activities a set of
hardware and software sensors is used. Browsing through different
workspaces as well as the up- and downloading of documents are software
events which can be captured within the BSCW. Editing a downloaded
document can be recognized via appropriate software sensors (see
section 3.2). To recognize whether the user is still working in his
office, moving over the floor or is standing in the coffee room having
a talk, additional sensors including the MOVY inertial tracker,
webcams, and light sensors are used. The sensor data is either used to
create events for the NESSIE awareness environment [11], which can then be received by the SmallView
browser, or received directly from a DICI server connected
to the sensor.
The 3D-representation of the daily work allows
remote users to achieve almost the same peripheral awareness as if
working within the same office. Additionally this representation can be
used to populate distributed virtual worlds. Users can be present and
accessible in the 3D world, even while doing their regular work.
Whereas the first scenario focused on peripheral
awareness and the population of shared virtual worlds, the second
scenario enhances the possibilities of communication in virtual worlds
between distributed users.
One problem of most existing shared virtual
environments representing users by avatars is that the avatars do not
behave naturally. The reason is not the individual representation
(which can be arbitrary) but the lack of information about the user's
facial expressions and gestures, which is essential for most types of
communication and especially for cooperation. In many situations, such as
discussions or presentations, non-verbal feedback from the audience is
very important.
Consider a teacher who tries to give an explanation to the class: until
the teacher has finished the explanation the class usually listens.
During that period the teacher does not get any verbal feedback from
the class, but he gets a continuous stream of information via
non-verbal feedback. The class may be interested or lackadaisical, may
agree or disagree or wonder about the explanation. The feedback of the
class helps the teacher to adjust his explanation. For the class the
facial expressions and gestures of the teacher are important, because
they express or reinforce important points.
Some virtual environments allow users to express
their mood by their avatar. However, this usually requires the user to
explicitly activate an appropriate expression of his avatar (e.g. by
pressing a button). This solution has two major drawbacks: first,
people do not use it very often, since it is not intuitive. Second, the
duration of the avatar's expression (either predefined or until changed
again) usually does not match the duration of the real mood.
To represent body language, gestures, facial
expressions and mood in the representation of users by avatars, we
combine sensors and tangible interfaces with symbolic avatar actions.
The main criterion for selecting the sensors is to obtain as much
information about the user as possible without disturbing or distracting the
user while he or she is working or moving around. To reach this goal
both hardware and software sensors can be used.
A hardware sensor used to detect user actions is
the MOVY tracking system described earlier. Different types of user
actions can be tracked by MOVY depending on the location of the sensor.
Figure 2 shows two users in a shared virtual world represented
by penguin avatars. The arm movements of the users are tracked by MOVY
sensors and used to animate the wings of the penguins. The avatar
gestures are transmitted over the network to enhance the expression and
feedback during the communication of remote avatars.
Another hardware sensor is a camera used to
realize the recognition of facial expressions. This requires that users
provide a set of typical facial expressions for each mood to be
visualized by the avatar before participating in a distributed meeting.
Basic facial expressions are neutral, laughing, wondering, and angry.
After a user has entered a shared virtual world, his face is
continuously captured by a monitor mounted camera. The captured image
is then compared to the pre-recorded facial expressions. If one of
these expressions is recognized, the appropriate action of the user's
avatar (usually a short animation) can be invoked (see figure 3). By default the predefined
neutral expression is displayed. In addition to the recognition of the
user's facial expression or mood, the absence of the user can easily be
detected. This would simply require an additional reference image,
showing the empty office environment as captured by the camera.
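
The comparison step can be sketched as a nearest-template match. The paper does not state how the captured image is compared to the pre-recorded expressions, so the distance measure used here (mean absolute difference of grey values), the threshold and the function names are assumptions for illustration only. Images are represented as flat lists of grey values of equal length, and the reference image of the empty office is simply stored as one more template labelled "absent".

```python
def mean_abs_diff(image_a, image_b):
    """Mean absolute grey-value difference of two equally sized images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same size")
    return sum(abs(a - b) for a, b in zip(image_a, image_b)) / len(image_a)


def classify_frame(frame, templates, max_distance=20.0, default="neutral"):
    """Return the label of the closest template, or the default expression.

    templates maps a label (e.g. "laughing", "absent") to a pre-recorded image.
    If no template is close enough, the neutral expression is kept.
    """
    best_label, best_distance = default, float("inf")
    for label, image in templates.items():
        distance = mean_abs_diff(frame, image)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else default


# Tiny 4-pixel "images" standing in for real camera frames.
TEMPLATES = {
    "neutral": [100, 100, 100, 100],
    "laughing": [200, 200, 100, 100],
    "absent": [10, 10, 10, 10],      # empty office as seen by the camera
}
```

A recognized label would then be passed on as an external event, so that the avatar plays the corresponding short animation.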
Software sensors can be used to detect the
activity of the keyboard, the mouse and the currently activated application
window, which gives us information about the attention and the focus of
the user. A background process called the SystemSpy continuously
captures all input events (keyboard, mouse, activating a window,
etc.). In this way it detects which applications are currently used and
whether the user is active or idle. An idle user can be represented by a
sleeping avatar whereas an inattentive user (who is working in another
application but still present) may be represented by an avatar who is
looking around.
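
The distinction between active, inattentive and idle users can be sketched as a small state function over the most recent input event. This is not the SystemSpy implementation: the class name ActivityMonitor, the idle time-out of 120 seconds and the state and animation names are assumptions made for this example.

```python
# Hypothetical mapping from user state to the avatar animation to show.
AVATAR_ANIMATION = {
    "idle": "sleeping",
    "inattentive": "looking_around",
    "attentive": "default",
}


class ActivityMonitor:
    """Tracks the last input event and the application that received it."""

    def __init__(self, vr_application, idle_after=120.0):
        self.vr_application = vr_application
        self.idle_after = idle_after     # seconds without input before "idle"
        self.last_input = None
        self.focused_application = None

    def on_input(self, timestamp, application):
        # called for every keyboard, mouse or window activation event
        self.last_input = timestamp
        self.focused_application = application

    def state(self, now):
        if self.last_input is None or now - self.last_input > self.idle_after:
            return "idle"
        if self.focused_application != self.vr_application:
            return "inattentive"         # present, but working elsewhere
        return "attentive"

    def animation(self, now):
        return AVATAR_ANIMATION[self.state(now)]
```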
All incoming data of the available hardware and
software sensors is processed on the local host on which the sensor is
located. The pre-processed data is made available to the VR application
by the DICI client-server architecture. In order to create a natural
behavior of the represented user the captured data has to be weighted
depending on the individual input device. Inputs representing activity
(e.g. the use of the keyboard or mouse, use of the VR application) are
weighted higher than events representing inactivity (switching between
several applications, bored facial expression), and fast sensors are
weighted higher than slow sensors.
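
One possible reading of this weighting is a weighted vote over the states suggested by the individual sensors. The paper does not give the weighting function, so the formula below (a weight that grows with the update rate of the sensor and is doubled for inputs indicating activity) is purely an assumption chosen to illustrate the two rules; the function name fuse and the sample values are hypothetical as well.

```python
from collections import defaultdict


def fuse(observations, activity_bonus=2.0):
    """Pick the user state with the highest accumulated weight.

    observations is a list of tuples
    (suggested_state, indicates_activity, update_rate_hz).
    """
    scores = defaultdict(float)
    for state, indicates_activity, rate_hz in observations:
        # faster sensors get a higher weight, approaching 1.0
        weight = rate_hz / (1.0 + rate_hz)
        # inputs representing activity count more than those representing inactivity
        if indicates_activity:
            weight *= activity_bonus
        scores[state] += weight
    return max(scores, key=scores.get) if scores else None
```

With these assumed weights, a single fast keyboard sensor reporting activity outweighs two slow sensors that suggest a bored user, whereas a fast sensor reporting inactivity outweighs a slow one reporting activity.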
The recognition of user activities by a set of
sensors seems to require a rather high computational and personal
effort compared to the use of video streams for the transmission of facial
expressions, gestures and body motions. Our method, however, does not
require a high bandwidth and thus can be used even over modem
connections. Additionally, this approach allows us to provide a
certain amount of privacy to the user (he or she might even stay
anonymous).
In this paper we presented our approach to
enhance the avatar representation of users in shared virtual
environments. We additionally showed how symbolic representations of
user actions can be used to populate virtual worlds, providing an
interesting and useful scenario for members of virtual teams or remote
co-workers.
In our future work we will use additional
pre-processed third party motion data to provide a comprehensive
library of body language and object dynamics descriptions. This
approach will be based on new interfaces to existing body motion
libraries. Additionally we will further enhance the external interface
to VRML and its capabilities to handle kinematics.
[1] ActiveWorlds. http://www.activeworlds.com
[2] L.A. Ames, D.R. Nadeau, and J.L. Moreland: The VRML
2.0 sourcebook, John Wiley & Sons, New York, 1997.
[3] R. Bentley, W. Appelt, U. Busbach, E. Hinrichs, D.
Kerr, K. Sikkel, J. Trevor, G. Woetzel: Basic Support for Cooperative
Work on the World-Wide Web. International Journal of Human Computer
Studies: Special Issue on Innovative Applications of the World-Wide Web
(1997), No. 6, pp. 827-846.
[4] Blaxxun Interactive Inc. http://www.blaxxun.com
[5] W. Broll: SmallTool - A Toolkit for Realizing Shared
Virtual Environments on the Internet. Distributed Systems Engineering,
Special Issue on Distributed Virtual Environments (1998), No. 5, pp.
118-128.
[6] S. Diehl: VRML++: A Language for Object-Oriented
Virtual Reality Models, in Proceedings of the 24th International
Conference on Technology of Object-Oriented Languages and Systems
(TOOLS Asia'97), Beijing, China 1997.
[7] A. McGrath: The Forum. SIGGROUP Bulletin, Vol. 19,
No. 3, December 1998, ACM Press.
[8] P. Henne: MOVY - Wireless Sensor for Gestures, Rotation and Movement.
ERCIM News No. 36, January 1999.
[9] R. Lea, Y. Honda, K. Matsuda, and S. Matsuda:
Community Place: Architecture and Performance, in Proceedings of the
VRML'97 Symposium, ACM SIGGRAPH, 1997, pp. 41-49.
[10] C. Marrin and J. Couch: VRML external authoring interface (EAI)
reference, Proposal for a VRML 2.0 Informative Annex.
http://www.web3d.org/WorkingGroups/vrml-eai/
[11] NESSIE Awareness Environment. http://orgwis.gmd.de/projects/nessie
[12] Ultima Online. http://www.owo.com