[mary-users] Adding other languages to MARY TTS

Mon Feb 20 08:41:44 CET 2006

Hi Noam,

(I hope you don't mind if I copy this reply to the mary-users mailing list.)

Noam Amir wrote:
> tell me - how difficult is it to put together the database needed in order
> to synthesize Hebrew? is Mary modular in that sense?

Mary is very modular, and a number of modules exist in a 
language-independent and configurable implementation, but adding a new 
language still requires a fair amount of work.

For Hebrew, and many other languages, you could start with the existing 
MBROLA diphone voices: 
http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html

You would then need at least the following MARY TTS modules:

* needed: a Tokeniser, cutting the input into sentences and tokens (it 
may be possible to re-use de.dfki.lt.mary.modules.JTokeniser for a 
number of languages -- whether it would work for Hebrew would need to be 
seen)

* optional: a text normalisation module, which expands numbers, 
abbreviations etc. into a pronounceable form (but that can be left out 
at the beginning)

* optional: a part-of-speech tagger, distinguishing at least between 
content words and function words

* crucially needed: a phonemiser, converting the input text into sound 
symbols, e.g. in SAMPA. This can be based on rules for some languages 
(probably Spanish, for example), but a pronunciation lexicon is required 
for others, where the link between spelling and pronunciation is less 
regular. In that case, the lexicon must also be complemented with 
"letter-to-sound" rules for unknown words.

* optional: a prosody assignment module, predicting e.g. ToBI labels 
based on part-of-speech and other information. 
de.dfki.lt.mary.modules.ProsodyGeneric, written by my student Stephanie 
Becker, may be a good place to start.

* needed: a duration assignment module, predicting phone durations. As a 
first step, the Klatt rules currently used in the Tibetan language 
component, de.dfki.lt.mary.modules.tib.KlattDurationModeller, could be 
used, adapted of course to the language-specific phoneme set.

* optional: an intonation contour realisation module. For example, there 
is a generic de.dfki.lt.mary.modules.TobiContourGenerator that can be 
used for different languages by writing appropriate config files.

* needed: synthesis, e.g. using MBROLA voices.
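
To make the phonemiser item in the list above a bit more concrete, here 
is a rough sketch in Python (not MARY code) of the "lexicon first, 
letter-to-sound rules for unknown words" idea. The lexicon entry, the 
rule table and the function name phonemise are all made up for 
illustration; a real module would need a proper lexicon and much better 
rules.

```python
# Hypothetical toy data: these entries and rules are illustrative only,
# not taken from MARY TTS or from any real pronunciation lexicon.
LEXICON = {
    "hello": ["h", "@", "l", "@U"],
}

# Naive one-letter-to-one-symbol fallback rules (an assumption for this sketch).
LETTER_TO_SOUND = {
    "a": "a", "b": "b", "d": "d", "e": "e", "l": "l",
    "m": "m", "o": "o", "s": "s", "t": "t",
}


def phonemise(word):
    """Return a list of SAMPA-like symbols for a single word."""
    key = word.lower()
    if key in LEXICON:
        # Known word: use the lexicon transcription (copied so callers
        # cannot modify the lexicon entry by accident).
        return list(LEXICON[key])
    # Unknown word: fall back to the letter-to-sound rules,
    # skipping letters for which no rule exists.
    return [LETTER_TO_SOUND[ch] for ch in key if ch in LETTER_TO_SOUND]
```

With this toy data, phonemise("Hello") is answered from the lexicon, 
while phonemise("salt") falls through to the letter rules.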

So, in summary, for adding a new language, you most crucially need a 
phonemiser, and you need to get at least a tokeniser and a duration 
assigner to work, assuming that there is already an acceptable MBROLA 
voice for your language.
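
As a rough illustration of that minimal chain (tokeniser, phonemiser, 
duration assigner, MBROLA input), here is a small Python sketch. It is 
not MARY code: all function names, the fixed 80 ms duration and the 
200 ms pauses are assumptions made up for the example, and the 
phonemiser is simply passed in as a function. The output is meant to 
follow the MBROLA .pho convention of one phone per line with a duration 
in milliseconds, where "_" stands for silence.

```python
import re


def tokenise(text):
    # Crude word tokeniser; a real one also handles sentences and punctuation.
    return re.findall(r"\w+", text)


def assign_durations(phones, default_ms=80):
    # Placeholder for a duration model: every phone gets the same duration.
    return [(phone, default_ms) for phone in phones]


def to_pho(timed_phones, pause_ms=200):
    # One "phone duration" pair per line, with silence at both ends.
    lines = ["_ %d" % pause_ms]
    lines += ["%s %d" % (phone, ms) for phone, ms in timed_phones]
    lines.append("_ %d" % pause_ms)
    return "\n".join(lines)


def synthesis_input(text, phonemise):
    # Chain the three required steps; phonemise maps one word to a phone list.
    phones = []
    for token in tokenise(text):
        phones.extend(phonemise(token))
    return to_pho(assign_durations(phones))
```

For a quick try, a dummy phonemiser such as lambda w: list(w.lower()) 
can be passed in. No pitch targets are written here, since the 
intonation module is optional in the list above.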

On the bright side, as data representation is based on Unicode, there 
should be no problem with non-European scripts.

Cheers,

Marc

-- 
Dr. Marc Schröder, Senior Researcher
DFKI GmbH, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
http://www.dfki.de/~schroed
Here. Now. Real, first-person experience. Am I there to witness it?