How to Compare TTS Systems: A New Subjective Evaluation Methodology Focused on Differences

Jonathan Chevelu, Damien Lolive, Sébastien Le Maguer, David Guennec

In: Proceedings of INTERSPEECH 2015 (Annual Conference of the International Speech Communication Association), September 6-10, Dresden, Germany. ISCA, 2015.


Subjective evaluation is a crucial problem in the speech processing community, and especially in the speech synthesis field, no matter what system is used. Indeed, when trying to assess the effectiveness of a proposed method, researchers usually conduct subjective evaluations by randomly choosing a small set of samples, from the same domain, taken from a baseline system and the proposed one. When samples are selected randomly, statistically, samples with almost no differences are evaluated and the global measure is smoothed, which may lead to judging the improvement as not significant. To solve this methodological flaw, we propose to compare speech synthesis systems on thousands of generated samples from various domains and to focus subjective evaluations on the most relevant ones by computing a normalized alignment cost between sample pairs. This process has been successfully applied both in the HTS statistical framework and in the corpus-based approach. We have conducted two perceptual experiments by generating more than 27,000 samples for each system under comparison. A comparison between tests involving the most different samples and randomly chosen samples clearly shows that the proposed approach reveals significant differences between the systems.
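The selection step described above can be sketched as follows. This is an illustrative sketch only: it uses a plain dynamic-time-warping alignment with a Euclidean frame distance, normalized by the alignment path length, to rank sample pairs by dissimilarity. The actual feature representation and alignment cost used in the paper are not specified here, and the function names are hypothetical.

```python
# Hypothetical sketch: rank pairs of synthesized utterances (one per system)
# by a normalized DTW alignment cost, and keep the most different pairs for
# the listening test. Frames are plain lists of floats; the Euclidean frame
# distance is an assumption, not necessarily the cost used in the paper.

def dtw_normalized_cost(a, b):
    """DTW cost between two feature sequences, divided by path length."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = (accumulated cost, path length) for aligning a[:i], b[:j]
    cost = [[(INF, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            best = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1],
                       key=lambda t: t[0])
            cost[i][j] = (best[0] + d, best[1] + 1)
    total, length = cost[n][m]
    return total / length

def most_different_pairs(pairs, k):
    """Return the ids of the k pairs with the highest normalized cost.

    `pairs` is an iterable of (id, seq_a, seq_b) tuples, where seq_a and
    seq_b are the feature sequences produced by the two systems for the
    same sentence.
    """
    scored = [(dtw_normalized_cost(a, b), uid) for uid, a, b in pairs]
    scored.sort(reverse=True)
    return [uid for _, uid in scored[:k]]
```

Pairs whose two renditions are nearly identical get a near-zero normalized cost and are filtered out, so listeners only rate pairs where the systems actually diverge, which is the core of the proposed methodology.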
