DFKI-LT - Emotional speech synthesis for emotionally-rich virtual worlds

Marc Schröder
Proc. of Workshop on emotionally rich virtual worlds with emotion synthesis at the 8th International Conference on 3D Web Technology (Web3D), St. Malo, France, March 2003
This paper gives a brief overview of the current state of the art in emotional speech synthesis with a view to a multi-modal context. After an introduction to the concept of text-to-speech synthesis, two approaches to expressing emotions in speech synthesis are described. The categorical approach models emotions as discrete categories and can provide high-quality emotional speech for a small number of emotion categories; the dimensional approach uses emotion dimensions such as activation and evaluation to model essential emotional properties, yielding expressions that are more flexible but less specific. Architectural requirements for audio-visual integration are outlined, and three example demonstrators illustrate the types of applications we currently envisage. Finally, the question of how to validate a generation system is posed, and a direction for developing possible answers is suggested.
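To make the dimensional approach more concrete, here is a minimal sketch of how a point in an activation/evaluation space might be mapped continuously onto prosodic settings. The abstract does not specify any mapping. The value ranges, the parameter names (`pitch_shift_percent`, `rate_percent`, and so on), and the weights in `WEIGHTS` are all hypothetical. The signs of the activation weights loosely follow the commonly reported tendency for more active emotions to be spoken higher, faster and louder. The evaluation weights are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionState:
    """A point in a two-dimensional emotion space (hypothetical normalisation).

    activation: passive (-1.0) .. active (+1.0)
    evaluation: negative (-1.0) .. positive (+1.0)
    """
    activation: float
    evaluation: float


@dataclass(frozen=True)
class ProsodySettings:
    """Changes relative to a neutral voice (hypothetical parameter set)."""
    pitch_shift_percent: float  # change in mean F0
    pitch_range_percent: float  # change in F0 range
    rate_percent: float         # change in speaking rate
    volume_db: float            # change in loudness


# Hypothetical linear weights (activation_weight, evaluation_weight);
# illustrative values only, not taken from the paper.
WEIGHTS = {
    "pitch_shift_percent": (10.0, 5.0),
    "pitch_range_percent": (20.0, 5.0),
    "rate_percent": (15.0, 0.0),
    "volume_db": (3.0, 0.0),
}


def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Keep a dimension value inside the assumed [-1, 1] range."""
    return max(lo, min(hi, x))


def prosody_for(state: EmotionState) -> ProsodySettings:
    """Map an emotion state to prosody with a weighted sum per parameter."""
    a = clamp(state.activation)
    e = clamp(state.evaluation)
    values = {name: wa * a + we * e for name, (wa, we) in WEIGHTS.items()}
    return ProsodySettings(**values)


if __name__ == "__main__":
    for s in (EmotionState(0.0, 0.0), EmotionState(1.0, 0.0), EmotionState(0.5, -0.5)):
        print(s, "->", prosody_for(s))
```

Because the mapping is continuous, intermediate states such as a mildly active, slightly negative emotion yield intermediate settings without any extra modelling effort. This reflects the flexibility the abstract attributes to the dimensional approach. The price is that nothing in such a mapping singles out a specific emotion. A categorical system would instead provide a dedicated, and typically higher-quality, rendering for each of a few labels.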
