Evaluating the meaning of synthesized listener vocalizations

Sathish Pammi, Marc Schröder

In: Proceedings of Interspeech 2011, 12th Conference in the Annual Series of Interspeech Events (INTERSPEECH-2011), August 28-31, Florence, Italy. ISCA, 2011.


Spoken and multimodal dialogue systems are starting to use listener vocalizations for more natural interaction. In a unit selection framework, which draws on a finite set of recorded listener vocalizations, synthesis quality is high but acoustic variability is limited. As a result, many combinations of segmental form and intended meaning cannot be synthesized. This paper presents an algorithm in the unit selection domain for increasing the range of vocalizations that can be synthesized with a given set of recordings. We investigate whether the approach makes the synthesized vocalizations convey a meaning closer to the intended one, using a pairwise comparison perception test. The results partially confirm the hypothesis, indicating that in many cases the algorithm provides alternatives that are more appropriate than the original set of recorded listener vocalizations alone.
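As background to the unit selection framework mentioned in the abstract, the following minimal sketch illustrates the generic selection step: each recorded vocalization carries annotated meaning ratings, and the recording closest to an intended meaning specification is chosen. This is an illustrative example only, not the authors' algorithm; the recordings, meaning dimensions, and ratings are hypothetical.

```python
# Generic unit-selection sketch (illustrative; not the paper's algorithm).
# Each recorded listener vocalization is annotated with hypothetical
# meaning ratings on a 0..1 scale.
RECORDINGS = [
    {"form": "mhm",  "agreement": 0.9, "interest": 0.3},
    {"form": "yeah", "agreement": 0.8, "interest": 0.6},
    {"form": "oh",   "agreement": 0.1, "interest": 0.9},
]

def target_cost(unit, intended):
    """Sum of absolute differences between a unit's meaning ratings
    and the intended meaning specification."""
    return sum(abs(unit[dim] - value) for dim, value in intended.items())

def select_unit(recordings, intended):
    """Return the recorded vocalization with minimal target cost."""
    return min(recordings, key=lambda u: target_cost(u, intended))

# Example: request a surprised, low-agreement vocalization.
best = select_unit(RECORDINGS, {"agreement": 0.2, "interest": 1.0})
print(best["form"])  # -> oh
```

With only a finite inventory, some intended meanings have no close match under this cost; the paper's contribution addresses exactly that limitation by widening the set of vocalizations synthesizable from the same recordings.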


sov-evaluation.pdf (PDF, 384 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)