Voice Quality Interpolation for Emotional Text-To-Speech Synthesis

Oytun Türk, Marc Schröder, Baris Bozkurt, Levent M. Arslan

In: Proc. Interspeech 2005, Conference in the Annual Series of Interspeech Events (INTERSPEECH), Lisbon, Portugal, pages 797-800, 2005.


Synthesizing desired emotions with concatenative algorithms requires the collection of large databases. This paper focuses on the development and assessment of a simple algorithm that interpolates the intended vocal effort in existing databases to create new databases with intermediate levels of vocal effort. Three German diphone databases with soft, modal, and loud voice qualities are processed with a spectral interpolation algorithm. A listening test evaluates the intended vocal effort in both the original and the interpolated databases. The results show that, given the original databases, the interpolation algorithm can create the intended intermediate levels of vocal effort, independent of the subjects' language background.
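The abstract does not say how the spectral interpolation is carried out. Purely as an illustration of the idea of an intermediate vocal effort, the sketch below assumes two time-aligned frames of the same diphone, one soft and one loud, each given as a magnitude spectrum on the same frequency bins, and blends them with a weight alpha. The function name `interpolate_log_spectra` and the choice of blending log magnitudes are hypothetical and are not taken from the paper.

```python
import math


def interpolate_log_spectra(soft, loud, alpha):
    """Blend two magnitude spectra sampled on the same frequency bins.

    Hypothetical illustration, not the authors' algorithm:
    alpha = 0.0 yields (approximately) `soft`, alpha = 1.0 yields `loud`,
    and values in between give an intermediate spectrum. Blending is done
    on log magnitudes, so each output bin is the weighted geometric mean
    of the two input bins.
    """
    if len(soft) != len(loud):
        raise ValueError("spectra must have the same number of bins")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    eps = 1e-12  # small offset so that empty (zero) bins do not hit log(0)
    return [
        math.exp((1.0 - alpha) * math.log(s + eps) + alpha * math.log(l + eps))
        for s, l in zip(soft, loud)
    ]


if __name__ == "__main__":
    soft_frame = [1.0, 0.5, 0.1]  # made-up magnitudes, soft voice
    loud_frame = [4.0, 2.0, 1.6]  # made-up magnitudes, loud voice
    # Halfway between the two: approximately [2.0, 1.0, 0.4]
    print(interpolate_log_spectra(soft_frame, loud_frame, 0.5))
```

Interpolating in the log domain is one common way to keep the blend perceptually smooth, since loudness perception is roughly logarithmic; a linear blend of raw magnitudes would be an equally simple alternative under the same assumptions.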


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence