[mary-users] Markup language for MARY TTS
Ingmar Steiner
ingmar.steiner at inria.fr
Wed Feb 15 09:49:19 CET 2012
Dear Nikolai,
On 14.02.2012 19:04, Nikolai Kouznetsov wrote:
> Hello,
>
> 1. What is the markup language that I should use to control the output
> of the MARY TTS?
MaryXML, see http://mary.dfki.de/documentation/maryxml
> 2. Where can I find the API I can use to speak with the server from
> within my own application, not from within the provided client application.
http://mary.dfki.de/javadoc/4.3.0/
More specifically, see e.g.
http://www.dfki.de/pipermail/mary-dev/2011-November/000247.html
But you can really use whatever you want to send the HTTP request, see e.g.
http://mary.opendfki.de/browser/trunk/marytts-assembly/src/release/doc/examples/client
or
http://www.dfki.de/pipermail/mary-users/2010-November/000666.html
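As a rough illustration of sending the HTTP request yourself, here is a minimal Python sketch. It assumes a MARY server running locally on port 59125 and a `process` endpoint that accepts the parameters INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO and LOCALE; please check these against the documentation for your server version. The function names `build_mary_url` and `synthesize` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_mary_url(text, host="localhost", port=59125,
                   input_type="TEXT", locale="en_US"):
    """Build a synthesis request URL (parameter names are assumptions)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": input_type,   # e.g. "TEXT" or "RAWMARYXML"
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return "http://%s:%d/process?%s" % (host, port, urlencode(params))


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio to out_path."""
    url = build_mary_url(text, **kwargs)
    with urlopen(url) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With a server running, `synthesize("Hello world", "hello.wav")` should write a WAV file. To send markup instead of plain text, pass a MaryXML document as the text together with `input_type="RAWMARYXML"`.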
> 3. Is this API aware of a markup?
Yes, see above: MaryXML is sent as the input of the request.
>
>
> Just out of curiosity, how large is the speech material for one voice
> delivering sound of good quality? Please, in terms of minutes/hours and
> not in terms of MB.
For unit selection voices, the audio is essentially uncompressed PCM,
usually sampled at 16 kHz, 16 bit, stored in timeline_waveforms.mry. So
if that file is, say, 110 MB in size, you can (very roughly) estimate
that it contains about one hour of audio:
110 mebioctets ≈ 112500 kibioctets = 115200000 octets
= 57600000 samples (at 2 octets per sample)
= 3600 seconds (at 16000 samples per second) = 60 minutes = 1 hour
The timeline file also contains an index, so by this reckoning you'll be
a little over the actual audio duration, but it's a start.
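The same estimate can be written as a small Python sketch. It assumes mono PCM at 16 kHz and 16 bit (2 octets per sample) and ignores the index stored in the timeline file, so it slightly overestimates; the function name `estimate_audio_seconds` is made up for this example.

```python
def estimate_audio_seconds(file_size_bytes, sample_rate_hz=16000,
                           bytes_per_sample=2):
    """Rough duration of uncompressed mono PCM audio of the given size."""
    return file_size_bytes / (sample_rate_hz * bytes_per_sample)


# The figure from the calculation above: 115200000 octets
seconds = estimate_audio_seconds(115200000)
print(seconds / 3600, "hours")
```

Passing the actual size of timeline_waveforms.mry in octets gives an upper bound on the amount of recorded speech in the voice.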
For the HSMM voices, it all works quite differently, since the voice
does not contain any audio, only models trained on a certain amount of
audio. There's really no way of knowing how much data was used for
training, but for those voices that have both a unit selection and an
HSMM variant (e.g. "dfki-poppy" and "dfki-poppy-hsmm"), the HSMM voice
would have been trained on the data that is included in the unit
selection voice.
Best wishes,
-Ingmar
>
> Best regards,
> Nikolai
>
>
--
Ingmar Steiner
Postdoctoral Researcher
LORIA Speech Group, Nancy, France
National Institute for Research in
Computer Science and Control (INRIA)