Toward the use of information density based descriptive features in HMM based speech synthesis

Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner

In: 8th International Conference on Speech Prosody (Speech Prosody), May 31-June 3, Boston, MA, United States. ISCA, 2016.


Over the last decades, acoustic modeling for speech synthesis has improved significantly. In most systems, however, the descriptive feature set used to represent annotated text has remained the same for many years. Specifically, the prosody models in most systems are based on low-level information such as syllable stress or word part-of-speech tags. In this paper, we propose to enrich the descriptive feature set with a linguistic measure computed from the predictability of an event, such as the occurrence of a syllable or word. We hypothesize that adding such descriptive features improves prosody modeling. The new feature set is then used to train prosody models for speech synthesis. Results from an evaluation study indicate a preference for the new descriptive feature set over the conventional one.
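
The abstract does not state how the predictability measure is computed. One common formulation of information density is surprisal, the negative log probability of a unit given its context. The following is a minimal sketch under that assumption, using a bigram model with add-one smoothing over words. The function names, the smoothing choice, and the toy corpus are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        vocab.update(tokens)
        # "<s>" marks the sentence start so the first token also has a context.
        padded = ["<s>"] + list(tokens)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def surprisal(prev, cur, context_counts, bigram_counts, vocab_size):
    """Surprisal in bits, -log2 P(cur | prev), with add-one smoothing (an assumption)."""
    prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
    return -math.log2(prob)


def surprisal_features(tokens, context_counts, bigram_counts, vocab_size):
    """Return one surprisal value per token, usable as an extra descriptive feature."""
    padded = ["<s>"] + list(tokens)
    return [
        surprisal(prev, cur, context_counts, bigram_counts, vocab_size)
        for prev, cur in zip(padded, padded[1:])
    ]


# Toy corpus, for illustration only.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
context_counts, bigram_counts, vocab = train_bigram_counts(corpus)
features = surprisal_features(["the", "dog", "ran"], context_counts, bigram_counts, len(vocab))
print(features)
```

In this sketch, a less predictable word receives a higher value, so "dog" after "the" scores higher than "cat" after "the" on the toy corpus. The same scheme could be applied to syllables instead of words by changing the tokenization.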


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence