(outdated)
One of my major research interests is the expression of emotions in synthesised speech. I have written a review of existing work in the domain, which led me to the conclusion that current systems and prototypes are not yet mature enough to be of interest for applications. A major shortcoming of existing systems is their inability to represent shades of emotions, as well as emotions that change over time. The solution I have proposed is to represent emotional states by means of emotion dimensions (see below). A first study has shown reliable correlations between emotion dimensions and acoustic parameters relevant for speech synthesis. In my PhD thesis (currently under review), I have deepened that analysis and proposed an implementation using the MARY text-to-speech system (see below). A demo of emotional speech synthesis using that approach is available online.
I am responsible for the development of the text-to-speech system MARY at DFKI. On the MARY web page, you can learn about the system's basic properties and design. You can also synthesise German text online, and even edit intermediate processing results to explore how the individual system modules work.
In the NECA project (see below), the MARY TTS system is used to express emotions in an audiovisual setting.
I believe that representing emotions in terms of "basic" emotion categories, such as "anger", "fear", "joy" etc., is not the most useful way to obtain a flexible speech synthesis system capable of expressing emotions. Instead, I argue in favour of emotion dimensions as a simple means of capturing basic properties of the emotional state in a gradual way. The emotion dimensions generally agreed to be the most basic are "activation" (or "arousal", i.e. the readiness to act in some way) and "evaluation" (or "valence", "pleasure", in terms of positive/negative, liking/disliking). In social interaction settings, a third dimension, "power" (or "control", "dominance", i.e. the social status), has been shown to be useful.
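As a minimal sketch of the dimensional idea, assuming Python and an illustrative scale of [-1, 1] for each dimension (neither is prescribed by the text), an emotional state can be stored as a point with three coordinates, and a gradual change from one state to another becomes a simple linear interpolation. The names `EmotionState` and `blend` are hypothetical and not part of MARY.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    """A point in a three-dimensional emotion space (hypothetical type).

    Each dimension is assumed to lie in [-1.0, 1.0]; this scale is an
    illustrative choice, not one fixed by the text.
    """
    activation: float   # arousal: readiness to act
    evaluation: float   # valence: negative (-1) to positive (+1)
    power: float = 0.0  # control/dominance; mainly relevant in social settings

    def __post_init__(self) -> None:
        # Reject values outside the assumed [-1, 1] scale.
        for name in ("activation", "evaluation", "power"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [-1, 1]")


def blend(start: EmotionState, end: EmotionState, t: float) -> EmotionState:
    """Linearly interpolate between two states; t=0 gives start, t=1 gives end."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * t

    return EmotionState(
        activation=lerp(start.activation, end.activation),
        evaluation=lerp(start.evaluation, end.evaluation),
        power=lerp(start.power, end.power),
    )
```

For example, with `calm = EmotionState(-0.5, 0.3)` and `excited = EmotionState(0.8, 0.6)`, `blend(calm, excited, 0.5)` gives a state about halfway between the two (activation roughly 0.15, evaluation roughly 0.45). Stepping `t` from 0 to 1 over an utterance is one simple way to express an emotion that shifts gradually, which a fixed category label cannot do.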
FEELTRACE, a labelling tool for two emotion dimensions, was developed at Queen's University Belfast by Roddy Cowie and co-workers. It allows a perceived emotional state to be tracked continuously over time on the two main emotion dimensions, activation and evaluation. You can read a paper presenting FEELTRACE, which describes the main design ideas and explains how the tool can be used for database annotation.
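The kind of data such continuous tracking produces can be pictured as a time series of (activation, evaluation) samples. The sketch below assumes a hypothetical in-memory format of `(time, activation, evaluation)` tuples sorted by time (this is not FEELTRACE's actual file format) and reads off the traced state at an arbitrary moment by linear interpolation between neighbouring samples. The function name `value_at` is likewise hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# One hypothetical trace sample: (time in seconds, activation, evaluation).
Sample = Tuple[float, float, float]


def value_at(trace: List[Sample], time: float) -> Tuple[float, float]:
    """Return (activation, evaluation) at `time` by linear interpolation.

    `trace` must be non-empty and sorted by time. Times before the first
    sample or after the last one are clamped to the nearest sample.
    """
    if not trace:
        raise ValueError("trace is empty")
    times = [sample[0] for sample in trace]
    if time <= times[0]:
        return trace[0][1], trace[0][2]
    if time >= times[-1]:
        return trace[-1][1], trace[-1][2]
    # bisect_right finds i with times[i - 1] <= time < times[i],
    # so the two neighbouring samples always have distinct times.
    i = bisect_right(times, time)
    t0, a0, e0 = trace[i - 1]
    t1, a1, e1 = trace[i]
    weight = (time - t0) / (t1 - t0)
    return a0 + (a1 - a0) * weight, e0 + (e1 - e0) * weight
```

Interpolating rather than taking the nearest sample keeps the reconstructed trace continuous, which matches the idea of an emotional state that changes gradually over time.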
From 2001 to 2004, I worked in the EU project NECA (IST-2000-28580), which aimed at developing a Net Environment for Embodied, Emotional Conversational Agents. Read a short description of the NECA project from a DFKI perspective, and visit the NECA homepage.
Within the NECA project, I am involved with the Speech group. We have created two new MBROLA voice databases, each recorded at three levels of vocal effort, which makes the voices more emotionally expressive. We have formulated rules for vocal emotion expression and implemented them in the MARY TTS system.
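The rules themselves are not given here. Purely as a hypothetical illustration of how a continuous emotion dimension could select among three recorded vocal-effort levels, the sketch below assumes that the choice is driven by activation alone, names the levels "soft", "modal" and "loud", and uses made-up thresholds; the actual MARY rules may work differently.

```python
from enum import Enum


class VocalEffort(Enum):
    """Three vocal-effort levels (level names are assumed for illustration)."""
    SOFT = "soft"
    MODAL = "modal"
    LOUD = "loud"


def select_vocal_effort(activation: float,
                        low: float = -0.33,
                        high: float = 0.33) -> VocalEffort:
    """Map an activation value in [-1, 1] to a vocal-effort level.

    Hypothetical rule: low activation -> soft, high activation -> loud,
    everything in between -> modal. The thresholds are illustrative only.
    """
    if not -1.0 <= activation <= 1.0:
        raise ValueError(f"activation={activation} is outside [-1, 1]")
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if activation < low:
        return VocalEffort.SOFT
    if activation > high:
        return VocalEffort.LOUD
    return VocalEffort.MODAL
```

A discrete choice like this only picks the recorded voice quality; gradual prosodic changes (for instance in pitch or speech rate) would still have to be handled by separate, continuous rules.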
Another aspect of the NECA project that I am connected with is the development of a Rich Representation Language, by means of which the different NECA modules will communicate. This work involves comparing it with existing markup languages such as the Virtual Human Markup Language (VHML) developed at Curtin University, Australia.