
Publication

A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis

Oytun Türk; Marc Schröder
In: Proc. Interspeech 2008. Conference in the Annual Series of Interspeech Events (INTERSPEECH-2008), 9th, located at SST conference 2008, September 22-26, Brisbane, Queensland, Australia, Pages 2282-2285, International Speech Communication Association (ISCA), 2008.

Abstract

This paper compares methods for transforming the voice quality of neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated with the unit selection system of the MARY TTS platform using a large neutral German database. The output is then modified with voice conversion techniques to match the target expressive styles, with the focus on spectral envelope conversion for transforming the overall voice quality. Several improvements to the state-of-the-art weighted codebook mapping and GMM-based voice conversion frameworks are employed, resulting in three algorithms. Objective evaluation shows that all three methods achieve a comparable reduction in objective distance to the target expressive TTS outputs. In a listening test, however, the weighted frame mapping and GMM-based transformations were perceived as slightly better than the weighted codebook mapping outputs at generating the target expressive style.
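Weighted codebook mapping, one of the frameworks the abstract builds on, converts each source spectral frame into a target frame as a weighted average of paired target codebook entries. The following is a minimal, generic sketch under stated assumptions, not the authors' implementation and without their improvements: paired source/target codebooks of spectral feature vectors (for example line spectral frequencies) are assumed to be already trained, the weights are assumed to be a softmax over negative Euclidean distances scaled by a hypothetical sharpness parameter `gamma`, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def codebook_weights(frame: Sequence[float],
                     source_codebook: Sequence[Sequence[float]],
                     gamma: float) -> Vector:
    """Assumed weighting: softmax of -gamma * distance to each source centroid.

    Closer source centroids receive larger weights; the weights sum to 1.
    """
    distances = [euclidean_distance(frame, c) for c in source_codebook]
    # Shift by the minimum distance so exp() cannot underflow to all zeros.
    d_min = min(distances)
    scores = [math.exp(-gamma * (d - d_min)) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_codebook_map(frame: Sequence[float],
                          source_codebook: Sequence[Sequence[float]],
                          target_codebook: Sequence[Sequence[float]],
                          gamma: float = 1.0) -> Vector:
    """Map one source frame to the target space (hypothetical helper).

    The i-th source entry is assumed to be paired with the i-th target entry.
    """
    if len(source_codebook) != len(target_codebook):
        raise ValueError("source and target codebooks must be paired")
    weights = codebook_weights(frame, source_codebook, gamma)
    dim = len(target_codebook[0])
    # Converted frame = weighted sum of target codebook entries.
    return [sum(w * t[k] for w, t in zip(weights, target_codebook))
            for k in range(dim)]


if __name__ == "__main__":
    src = [[0.0, 0.0], [10.0, 10.0]]
    tgt = [[1.0, 1.0], [20.0, 20.0]]
    # A frame equidistant from both source entries gets equal weights.
    print(weighted_codebook_map([5.0, 5.0], src, tgt))
```

With a large `gamma` the output snaps to the target entry paired with the nearest source centroid; as `gamma` approaches zero the weights become uniform and the output approaches the mean of all target entries.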

Projects