DFKI-LT - Agile MaryTTS Architecture for the Blizzard Challenge 2018

Sébastien Le Maguer, Ingmar Steiner, Francesco Tombini, Pradipta Deb, Moitree Basu, Insa Kröger
4 Blizzard Challenge, Hyderabad, India, ISCA, SynSIG, 9/2018
In this paper, we present the MaryTTS entry for the Blizzard Challenge 2018. Our participation is motivated by the use of a new system architecture whose development began three years ago. To this end, we designed a fully modular pipeline that incorporates native modules and distributed processes, including a new grapheme-to-phoneme (G2P) conversion component. The back-end also supports this modularity: the fundamental frequency (F0) is predicted separately, based on a model of its dynamics. A segmental synthesizer then uses the phonetic information and the predicted prosody to produce the final signal. Although our results are disappointing, our participation has shown that the architecture is functional and that we can now further develop interfaces to several open-source back-ends. This will hopefully strengthen the role of MaryTTS as a framework for research in speech synthesis.
