Marc Schröder


Research

(outdated)

Emotional Speech Synthesis

One of my major research interests is the expression of emotions in synthesised speech. I have written a review of existing work in the domain, which led me to the conclusion that current systems and prototypes are not yet mature enough to be of interest for applications. A major shortcoming of existing systems is their inability to represent shades of emotion, or emotions that change over time. The solution I have proposed is to represent emotional states by means of emotion dimensions (see below). A first study has shown reliable correlations between emotion dimensions and acoustic parameters relevant for speech synthesis. In my PhD thesis (currently under review), I have deepened that analysis and proposed an implementation using the MARY text-to-speech system (see below). A demo of emotional speech synthesis using that approach is available online.

Text-to-Speech Synthesis

I am responsible for the development of the text-to-speech system MARY at DFKI. On the MARY web page, you can learn about the system's basic properties and design. You can also synthesise German text online, and even edit intermediate processing results in order to explore the functioning of individual system modules.

In the NECA project (see below), the MARY TTS system is used to express emotions in an audiovisual setting.

Emotion Representation

I believe that representing emotions in terms of "basic" emotion categories, such as "anger", "fear" or "joy", is not the most useful way to obtain a flexible speech synthesis system capable of expressing emotions. Instead, I argue in favour of emotion dimensions as a simple means of capturing basic properties of the emotional state in a gradual way. The emotion dimensions generally agreed upon as being most basic are "activation" (or "arousal", i.e. the readiness to act in some way) and "evaluation" (or "valence", "pleasure", in terms of positive/negative, liking/disliking). In social interaction settings, a third dimension, "power" (or "control", "dominance", the social status), has proven useful.
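To make the idea of a gradual, dimensional representation concrete, here is a minimal sketch in Python. It is only an illustration, not the representation used in MARY or in my thesis: the class name `EmotionState`, the helper `blend`, and the value range of -1 to 1 for each dimension are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    """A point in emotion-dimension space (hypothetical illustration).

    Each dimension is assumed to lie in [-1.0, 1.0]; this range is an
    assumption of the sketch, not something prescribed by the text.
    """
    activation: float        # "arousal": readiness to act
    evaluation: float        # "valence": positive vs. negative
    power: float = 0.0       # "dominance": optional third dimension

    def __post_init__(self):
        # Reject values outside the assumed range.
        for name in ("activation", "evaluation", "power"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")


def blend(a: EmotionState, b: EmotionState, weight: float) -> EmotionState:
    """Linearly interpolate between two states (weight 0 gives a, 1 gives b)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    def mix(x: float, y: float) -> float:
        return x + weight * (y - x)

    return EmotionState(
        activation=mix(a.activation, b.activation),
        evaluation=mix(a.evaluation, b.evaluation),
        power=mix(a.power, b.power),
    )


neutral = EmotionState(activation=0.0, evaluation=0.0)
agitated = EmotionState(activation=1.0, evaluation=-1.0)
halfway = blend(neutral, agitated, 0.5)
```

Unlike a fixed category label, a state such as `halfway` expresses a shade between two emotional states, and stepping the weight from 0 to 1 describes an emotion that changes over time.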

FEELTRACE, a labelling tool for two emotion dimensions, was developed at Queen's University Belfast by Roddy Cowie and co-workers. It allows a perceived emotional state to be tracked continuously over time on the two main emotion dimensions, activation and evaluation. You can read a paper presenting FEELTRACE which describes the main design ideas and explains how the tool can be used for database annotation.
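As a rough illustration of what a continuous annotation on two dimensions might look like as data, the following Python sketch stores a trace as time-stamped samples and averages them over a time window. The names `TraceSample` and `mean_in_window`, the sample layout and the example values are hypothetical; this is not FEELTRACE's actual output format.

```python
from typing import NamedTuple, Sequence, Tuple


class TraceSample(NamedTuple):
    """One annotation sample (hypothetical layout, not FEELTRACE's format)."""
    time_s: float       # time since the start of the recording, in seconds
    activation: float   # perceived activation at that moment
    evaluation: float   # perceived evaluation at that moment


def mean_in_window(
    trace: Sequence[TraceSample], start_s: float, end_s: float
) -> Tuple[float, float]:
    """Average activation and evaluation over samples in [start_s, end_s)."""
    selected = [s for s in trace if start_s <= s.time_s < end_s]
    if not selected:
        raise ValueError("no samples in the requested window")
    n = len(selected)
    mean_activation = sum(s.activation for s in selected) / n
    mean_evaluation = sum(s.evaluation for s in selected) / n
    return mean_activation, mean_evaluation


# Made-up example values, for illustration only.
trace = [
    TraceSample(time_s=0.0, activation=0.0, evaluation=-0.5),
    TraceSample(time_s=1.0, activation=0.5, evaluation=0.5),
    TraceSample(time_s=2.0, activation=1.0, evaluation=0.5),
]
window_mean = mean_in_window(trace, 0.0, 2.0)
```

Summarising a trace per window in this way is one possible route from a continuous annotation to a label for a stretch of a database, for example a single utterance.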

Projects

HUMAINE

The European Network of Excellence on emotion research and human-machine interaction, HUMAINE, aims to lay the foundations for European development of systems that can register, model and/or influence human emotional and emotion-related states and processes - 'emotion-oriented systems'. Such systems may be central to future interfaces, but their conceptual underpinnings are not sufficiently advanced to be sure of their real potential or the best way to develop them. One of the reasons is that relevant knowledge is dispersed across many disciplines. HUMAINE brings together leading experts from the key disciplines in a programme designed to achieve intellectual integration.

In HUMAINE, I fulfil the role of "Area Co-ordinator Spread of Excellence", which, among other things, makes me responsible for the technology and scientific content of the HUMAINE portal, as well as for the summer schools held by HUMAINE once a year.

NECA

During 2001-2004, I worked in the EU project NECA, IST-2000-28580, which aimed to develop a Net Environment for Embodied, Emotional Conversational Agents. Read a short description of the NECA project from a DFKI perspective, and visit the NECA Homepage.

Within the NECA project, I was involved with the Speech group. We created two new MBROLA voice databases recorded with three levels of vocal effort, which makes them more emotionally expressive. We formulated rules for vocal emotion expression and implemented them in the MARY TTS system.

Another aspect of the NECA project that I was connected with was the development of a Rich Representation Language by means of which the different NECA modules communicate. This work involved a comparison with existing markup languages such as the Virtual Human Markup Language developed at Curtin University, Australia.

Open source software

I have contributed several pieces of software to the open source landscape, notably: