Michael Kipp
Gesture Generation by Imitation



I published my PhD dissertation in December 2004. It includes a description of the architecture of the ANVIL system, a video annotation research tool. It also deals with the transcription of gesture and speech for the purpose of gesture generation. Furthermore, it shows how such transcriptions can be analyzed and used to generate gestures for animated computer characters. A more detailed abstract is below.

Note that my thesis is not an ANVIL manual. You can get the ANVIL manual for free here.

The full bibliographic reference is:

Michael Kipp, "Gesture Generation by Imitation - From Human Behavior to Computer Character Animation", Boca Raton, Florida: Dissertation.com, December 2004.


In an effort to extend traditional human-computer interfaces, research has introduced embodied agents that utilize the modalities of everyday human-human communication, like facial expression, gestures and body postures. However, giving computer agents a human-like body introduces new challenges. Since human users are very sensitive and critical concerning bodily behavior, the agents must act naturally and individually in order to be believable.

This dissertation focuses on gestures. It shows how to generate conversational gestures for an animated embodied agent based on annotated text input. The central idea is to imitate the gestural behavior of a human individual. Using TV show recordings as empirical data, gestural key parameters are extracted for the generation of natural and individual gestures. The generation task is solved in three stages: observation, modeling and generation. For observation, the video annotation research tool ANVIL was created. It allows the efficient transcription of gesture, speech and other modalities on multiple layers. ANVIL is application-independent, platform-independent and extensible, making it suitable for a wide variety of research fields. Using ANVIL, selected clips from the TV talk show "Das Literarische Quartett" were transcribed, resulting in 1,056 transcribed gestures and a shared lexicon of 68 gestures. For the modeling stage, the NOVALIS module computes individual, probabilistic gesture profiles from the transcriptions, covering aspects like handedness, function and timing. For gesture generation, the NOVA generator produces gestures based on a gesture profile in an overgenerate-and-filter approach and outputs a linear, player-independent XML action script.
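To give a rough feel for what an overgenerate-and-filter step driven by a probabilistic profile might look like, here is a minimal Python sketch. It is not the NOVA generator or the NOVALIS profile format: the profile table, the semantic tags, the gesture names, the fixed gesture duration, the greedy overlap filter and the XML element names are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical profile: P(gesture | semantic tag) for one speaker.
# Real profiles would be estimated from transcribed data; these numbers are invented.
PROFILE = {
    "negation": {"wipe": 0.6, "dismiss": 0.3},
    "emphasis": {"beat": 0.7, "ring": 0.2},
    "enumeration": {"count": 0.8},
}

@dataclass
class Candidate:
    gesture: str
    start: float  # seconds
    end: float    # seconds
    score: float  # probability taken from the profile

def overgenerate(tagged_words, profile, duration=0.8):
    """Propose every gesture the profile allows for each tagged word."""
    candidates = []
    for _word, tag, time in tagged_words:
        for gesture, prob in profile.get(tag, {}).items():
            candidates.append(Candidate(gesture, time, time + duration, prob))
    return candidates

def filter_candidates(candidates, min_score=0.25):
    """Greedily keep the best-scoring candidates that do not overlap in time."""
    kept = []
    for cand in sorted(candidates, key=lambda c: -c.score):
        if cand.score < min_score:
            continue
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return the survivors in time order, i.e. as a linear sequence.
    return sorted(kept, key=lambda c: c.start)

def to_action_script(gestures):
    """Serialize the selected gestures as a flat XML script (made-up schema)."""
    root = ET.Element("script")
    for g in gestures:
        ET.SubElement(root, "gesture", name=g.gesture,
                      start=f"{g.start:.2f}", end=f"{g.end:.2f}")
    return ET.tostring(root, encoding="unicode")

# Annotated text input: (word, semantic tag, onset time in seconds).
tagged = [("never", "negation", 0.0),
          ("really", "emphasis", 0.5),
          ("first", "enumeration", 2.0)]

selected = filter_candidates(overgenerate(tagged, PROFILE))
script = to_action_script(selected)
print(script)
```

In this toy run, five candidates are proposed, and the filter drops those that are too improbable or that collide in time with a higher-scoring gesture. The point of the pattern is that the generation step can stay simple and permissive, while the constraints that make the result plausible live in the filter.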

How To Get It

Since I published the thesis, I cannot put it online for free. However, for $9 you can get the electronic edition (PDF) directly from the publisher: my thesis at Dissertation.com. You can also order a paperback copy (277 pages) of the thesis using one of the links below.

USA Dissertation.com
(Dissertation.com is the publisher, offering both the paperback and the electronic PDF version)

Germany Amazon.de
Austria Amazon.at
UK Amazon.co.uk
Japan Amazon.co.jp
France Amazon.fr
Canada Amazon.ca