[mary-users] Mary-users Digest, Vol 48, Issue 20

Fri Jun 25 09:23:14 CEST 2010

Dear Thorsten,

On 24 Jun 2010, at 21:56, Thorsten Westermann wrote:

> Dear Ingmar,
> 
> By "good prompts" you mean that the sentences should cover all relevant morphemes, don't you?

No, diphones. Morphology doesn't really come into it.
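
To make the distinction concrete, here is a minimal Python sketch of how a phone sequence breaks down into diphones (adjacent phone pairs). The transcription of "Schnitzel" is a rough SAMPA-like one chosen for illustration, and the function names are hypothetical, not part of MARY.

```python
def diphones(phones):
    """Return the adjacent phone pairs of a sequence, padded with silence "_"."""
    padded = ["_"] + list(phones) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_diphone_types(n_symbols):
    """Upper bound on distinct diphones for an inventory of n_symbols
    (phones plus silence); many of these pairs never occur in practice."""
    return n_symbols ** 2


# Illustrative transcription of "Schnitzel" (an assumption, not MARY output)
schnitzel = ["S", "n", "I", "ts", "@", "l"]
print(diphones(schnitzel))
# ['_-S', 'S-n', 'n-I', 'I-ts', 'ts-@', '@-l', 'l-_']
```

The point is that coverage is counted over phone pairs, not over words or morphemes: an inventory of, say, 50 symbols gives at most 2500 diphone types, however many words the language has.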

> 
> This is still mystical to me:
> 
> Let's take the word "Schnitzel". 
> 
> Our sentences would have to contain "schnit" as the first word, in the middle (to cover this sound with a mid-range F0), and at the end of a sentence (to cover it with a low F0), right?
> 
> Or in other words:
> The corpus would have to include 
> 
> 1) schnit.
> 2) schnit?
> 3) schnit...
> 
> The number of sentences needed seems astronomical to me.
> Even if we used pitch manipulation to get the F0 right, we would have to include (just as an example)
> 
> scha
> schb
> schd
> sche
> schf
> schg
> schi
> schj
> (...)
> schni
> schnit
> schnet
> schnat
> schnut
> schl
> 
> Plus all these short words that cannot be split into syllables or morphemes, like "schlecht", "gut", "schnell", "schlicht"... They would all have to be in the corpus.
> 
> Am I wrong? I don't know the sentences in the bits 3 corpus, but I cannot imagine that they really cover all these morphemes, because that would take such a huge number of sentences...

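You've identified the problem of data sparsity: no corpus of practical size contains every unit in every context. This is why we use a target cost function with many different weighted discrete feature components, so that the closest available unit can be selected when no exact match exists, and Classification and Regression Trees (CARTs) to predict continuous target feature values.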
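
As a rough illustration of the target cost idea, the following Python sketch scores candidate units by a weighted sum of discrete feature mismatches and picks the cheapest one. The feature names, weights, and candidates are invented for the example and do not reflect MARY's actual feature set or code; a real unit selection system would also add a join cost between neighbouring units, which is omitted here.

```python
def target_cost(target, candidate, weights):
    """Sum the weights of all features on which candidate and target differ."""
    return sum(
        weight
        for feature, weight in weights.items()
        if target.get(feature) != candidate.get(feature)
    )


def best_candidate(target, candidates, weights):
    """Return the candidate unit with the lowest target cost."""
    return min(candidates, key=lambda c: target_cost(target, c, weights))


# Hypothetical features and weights, for illustration only
weights = {"next_phone": 3.0, "stressed": 2.0, "position_in_phrase": 1.0}
target = {"next_phone": "n", "stressed": True, "position_in_phrase": "final"}
candidates = [
    {"id": 1, "next_phone": "n", "stressed": True, "position_in_phrase": "medial"},
    {"id": 2, "next_phone": "l", "stressed": True, "position_in_phrase": "final"},
    {"id": 3, "next_phone": "n", "stressed": False, "position_in_phrase": "final"},
]

best = best_candidate(target, candidates, weights)
print(best["id"])  # 1: only the lowest-weighted feature mismatches
```

None of the three candidates matches the target exactly, yet one of them is still usable. That is how the weighting copes with sparse data: the corpus does not need the unit in every context, only something close enough on the features that matter most.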

> And thanks for your patience with all my questions.
> Speech synthesis has always fascinated me, and Mary is really interesting.

I warmly recommend reading an introductory textbook, which will provide all of the background, e.g.

@book{Dutoit1997ITTS,
  author = {Thierry Dutoit},
  title = {An Introduction to Text-To-Speech Synthesis},
  publisher = {Springer},
  year = {1997},
  isbn = {978-0-7923-4498-8}
}

or the more recent

@book{Taylor2009TTS,
  author = {Paul Taylor},
  title = {Text-to-Speech Synthesis},
  publisher = {Cambridge University Press},
  year = {2009},
  isbn = {978-0-521-89927-7}
}

> 
> Kind regards,
> Westermann

Best wishes,

/**
 * Ingmar Steiner
 * Researcher, Language Technology
 * German Research Center for Artificial Intelligence
 *
 * Campus D3 1 +1.18
 * D-66123 Saarbrücken
 * Germany
 * Phone: ++49-681-857-75-5263 (NEW!)
 * Email: ingmar.steiner at dfki.de
 */