DFKI-LT - Annotating meaning of listener vocalizations for speech synthesis

Sathish Chandra Pammi, Marc Schröder
Annotating meaning of listener vocalizations for speech synthesis
2 Proceedings of ACII 2009, Amsterdam, Netherlands, o.A., 2009
Generation of listener vocalizations is one of the major objectives of emotionally colored conversational speech synthesis. Success in this endeavor depends on the answers to three questions: What kinds of meaning are expressed through listener vocalizations? What form is suitable for a given meaning? And in what context should which listener vocalizations be produced? In this paper, we address the first of these questions. We present a method to record natural and expressive listener vocalizations for synthesis, and describe our approach to identifying a suitable categorical description of the meaning conveyed in the vocalizations. In our data, one actor produces a total of 967 listener vocalizations, in his natural speaking style and in three acted emotion-specific personalities. In an open categorization scheme, we find that eleven categories occur in at least 5% of the vocalizations, and that most vocalizations are better described by two or three categories than by a single one. Furthermore, an annotation of meaning reference, according to Bühler's Organon model, allows us to make interesting observations regarding the listener's own state, his stance towards the interlocutor, and his attitude towards the topic of the conversation.
Files: BibTeX, pammi_schroeder2009b.pdf