[mary-users] Adding other languages to MARY TTS
Marc Schröder
schroed at dfki.de
Mon Feb 20 08:41:44 CET 2006
Hi Noam,
(I hope you don't mind if I copy this reply to the mary-users mailing list.)
Noam Amir wrote:
> tell me - how difficult is it to put together the database needed in order
> to synthesize Hebrew? is Mary modular in that sense?
Mary is very modular, and a number of modules exist in a
language-independent and configurable implementation, but there is still
a fair amount of work left to do.
For Hebrew, and many other languages, you could start with the existing
MBROLA diphone voices:
http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html
You would then need at least the following MARY TTS modules:
* needed: a Tokeniser, cutting the input into sentences and tokens (it
may be possible to re-use de.dfki.lt.mary.modules.JTokeniser for a
number of languages -- whether it would work for Hebrew would need to be
seen)
* optional: a text normalisation module, which expands numbers,
abbreviations etc. into a pronounceable form (but that can be left out
at the beginning)
* optional: a part-of-speech tagger, distinguishing at least between
content words and function words
* crucially needed: a phonemiser, converting the input text into sound
symbols, e.g. in SAMPA. This can be based on rules for some languages
(probably Spanish), but a pronunciation lexicon is required for others
where the link between spelling and pronunciation is less regular. In
that case, the lexicon must also be complemented with "letter-to-sound"
rules for unknown words.
* optional: a prosody assignment module, predicting e.g. ToBI labels
based on part-of-speech and other information.
de.dfki.lt.mary.modules.ProsodyGeneric, written by my student Stephanie
Becker, may be a good place to start.
* needed: a duration assignment module, predicting phone durations. As a
first start, the Klatt rules currently used in the Tibetan language
component (de.dfki.lt.mary.modules.tib.KlattDurationModeller) could be
used, adapted of course to the language-specific phoneme set.
* optional: an intonation contour realisation module. For example, there
is a generic de.dfki.lt.mary.modules.TobiContourGenerator that can be
used for different languages by writing appropriate config files.
* needed: synthesis, e.g. using MBROLA voices.
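To make the list above a bit more concrete, here is a minimal sketch of
the three needed text-processing stages (tokeniser, phonemiser, duration
assignment) chained together, with a hook for the optional ones. It is
written in Python for brevity and is not MARY TTS code: all function
names, the toy lexicon, the letter-to-sound table and the constant
80 ms duration are assumptions made purely for illustration.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Toy data, for illustration only. The symbols are SAMPA-like placeholders,
# not a real lexicon or rule set for any language.
LEXICON: Dict[str, List[str]] = {"mary": ["m", "E", "r", "i"]}
LETTER_TO_SOUND: Dict[str, str] = {"t": "t", "s": "s", "a": "a", "n": "n", "o": "o"}


def tokenise(text: str) -> List[List[str]]:
    """Needed: cut the input into sentences, then into lower-cased word tokens."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [re.findall(r"\w+", s.lower()) for s in sentences]


def phonemise(word: str) -> List[str]:
    """Crucially needed: lexicon lookup, falling back to letter-to-sound rules."""
    if word in LEXICON:
        return list(LEXICON[word])
    # Unknown word: map letter by letter; unmapped letters pass through unchanged.
    return [LETTER_TO_SOUND.get(ch, ch) for ch in word]


def assign_durations(phones: Sequence[str], default_ms: int = 80) -> List[Tuple[str, int]]:
    """Needed: one duration per phone (a constant here, standing in for Klatt rules)."""
    return [(p, default_ms) for p in phones]


def run_pipeline(
    text: str,
    optional_stages: Sequence[Callable[[List[str]], List[str]]] = (),
) -> List[List[Tuple[str, int]]]:
    """Run the needed stages; optional stages (e.g. normalisation) rewrite the tokens."""
    result = []
    for tokens in tokenise(text):
        for stage in optional_stages:
            tokens = stage(tokens)
        phones = [p for word in tokens for p in phonemise(word)]
        result.append(assign_durations(phones))
    return result


if __name__ == "__main__":
    for sentence in run_pipeline("Mary sat. So on!"):
        print(sentence)
```

The point of the sketch is only the shape: each optional module can be
left out at first without breaking the chain, while the phonemiser is
the part that carries the language-specific knowledge.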
So, in summary, for adding a new language, you most crucially need a
phonemiser, and you need to get at least a tokeniser and a duration
assigner to work. All of this assumes that there is already an
acceptable MBROLA voice for your language.
On the bright side, as data representation is based on Unicode, there
should be no problem with non-European scripts.
Cheers,
Marc
--
Dr. Marc Schröder, Senior Researcher
DFKI GmbH, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
http://www.dfki.de/~schroed
Here. Now. Real, first-person experience. Am I there to witness it?