Evaluating the meaning of synthesized listener vocalizations
Sathish Pammi; Marc Schröder
In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), August 28-31, Florence, Italy. ISCA, 2011.
Spoken and multimodal dialogue systems are starting to use listener vocalizations for more natural interaction. In a unit selection framework, which draws on a finite set of recorded listener vocalizations, synthesis quality is high but acoustic variability is limited. As a result, many combinations of segmental form and intended meaning cannot be synthesized. This paper presents an algorithm in the unit selection domain for increasing the range of vocalizations that can be synthesized from a given set of recordings. Using a pairwise comparison perception test, we investigate whether the approach makes the synthesized vocalizations convey a meaning closer to the intended one. The results partially confirm the hypothesis, indicating that in many cases the algorithm provides more appropriate alternatives than the original set of recorded listener vocalizations alone.
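To make the meaning-driven selection idea concrete, the following is a minimal sketch, not the authors' implementation: each recorded vocalization carries an annotated meaning vector, and selection picks the unit closest to the intended meaning. The inventory entries, meaning dimensions, and distance measure are all illustrative assumptions.

```python
# Hypothetical sketch of meaning-based unit selection for listener
# vocalizations. All unit names, meaning dimensions, and values below
# are invented for illustration, not taken from the paper's data.
from math import dist

# Toy inventory: each recorded vocalization is annotated with a meaning
# vector on two assumed dimensions (e.g. agreement, interest) in [0, 1].
INVENTORY = {
    "mm-hm_01": (0.9, 0.3),
    "oh_02":    (0.2, 0.9),
    "yeah_03":  (0.8, 0.7),
}

def select_unit(intended_meaning):
    """Return the recorded unit whose annotated meaning vector is
    nearest (Euclidean distance) to the intended meaning."""
    return min(INVENTORY, key=lambda u: dist(INVENTORY[u], intended_meaning))

print(select_unit((0.85, 0.6)))  # → yeah_03
```

With only a fixed inventory, intended meanings far from every annotated vector are served poorly; the paper's contribution is to widen the set of meanings that can be covered by a given set of recordings.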