[mary-users] Mary-users Digest, Vol 48, Issue 20
Ingmar Steiner
ingmar.steiner at dfki.de
Fri Jun 25 09:23:14 CEST 2010
Dear Thorsten,
On 24 Jun 2010, at 21:56, Thorsten Westermann wrote:
> Dear Ingmar,
>
> By "good prompts" you mean that the sentences should cover all relevant morphemes, don't you?
No, diphones. Morphology doesn't really come into it.
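To make "diphone" concrete, here is a minimal Python sketch. The phone sequence for "Schnitzel" is only my assumed SAMPA-like transcription, and `to_diphones` and `max_diphones` are hypothetical helpers for illustration, not part of MARY.

```python
def to_diphones(phones, boundary="_"):
    """Return the diphones (pairs of adjacent phones) of a phone sequence.

    The sequence is padded with a boundary (silence) symbol on both sides,
    so the transitions into and out of the word are included as well.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_diphones(n_phones):
    """Upper bound on the diphone inventory for n_phones symbols
    (boundary symbol included): every ordered pair of symbols."""
    return n_phones * n_phones


# Assumed SAMPA-like transcription of "Schnitzel"; illustrative only.
schnitzel = ["S", "n", "I", "ts", "@", "l"]
print(to_diphones(schnitzel))
# ['_-S', 'S-n', 'n-I', 'I-ts', 'ts-@', '@-l', 'l-_']
```

The point is that the inventory is bounded by the square of the number of phones, not by the number of words or morphemes: a symbol set of, say, 50 phones gives at most 2500 diphones, and many of those never occur in the language.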
>
> This is still a mystery to me:
>
> Let's take the word "Schnitzel".
>
> Our sentences would have to contain "schnit" as the first word, in the middle (to cover this sound with a mid-range F0), and at the end of a sentence (to cover it with a low F0), right?
>
> Or in other words:
> The corpus would have to include
>
> 1) schnit.
> 2) schnit?
> 3) schnit...
>
> The number of sentences needed seems astronomical to me.
> Even if we used pitch manipulation to get the F0 right, we would still have to include (just as an example)
>
> scha
> schb
> schd
> sche
> schf
> schg
> schi
> schj
> (...)
> schni
> schnit
> schnet
> schnat
> schnut
> schl
>
> Plus all the short words that cannot be split into syllables or morphemes, like "schlecht", "gut", "schnell", "schlicht"... They would all have to be in the corpus.
>
> Am I wrong? I don't know the sentences in the bits 3 corpus, but I cannot imagine that they really cover all these morphemes, because that would require such a huge number of sentences...
You've identified the problem of data sparsity: no corpus can contain every unit in every possible context. This is why we use a target cost function with many different weighted discrete feature components to rank the units that are available, and Classification and Regression Trees (CARTs) to predict continuous target feature values.
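As a rough illustration of both ideas, here is a minimal Python sketch. The feature names, weights, tree splits, and duration values are all hypothetical and chosen only for this example; they are not MARY's actual features or numbers, and join costs between neighbouring units are left out entirely.

```python
# Hypothetical discrete features and weights (not MARY's real ones).
WEIGHTS = {
    "stressed": 2.0,
    "phrase_position": 1.0,
    "next_phone": 3.0,
}


def target_cost(target, candidate, weights=WEIGHTS):
    """Weighted sum of feature mismatches; 0.0 means a perfect match."""
    return sum(
        weight
        for name, weight in weights.items()
        if target.get(name) != candidate.get(name)
    )


def best_candidate(target, candidates, weights=WEIGHTS):
    """Pick the corpus unit whose features are closest to the target."""
    return min(candidates, key=lambda c: target_cost(target, c, weights))


# A tiny hand-written binary regression tree (hypothetical splits/values).
# Inner node: (feature, value, subtree_if_equal, subtree_if_not); leaf: float.
DURATION_TREE = (
    "stressed", True,
    ("phrase_position", "final", 140.0, 110.0),
    80.0,
)


def predict(tree, features):
    """Walk the tree until a leaf (a continuous value) is reached."""
    while isinstance(tree, tuple):
        feature, value, if_equal, if_not = tree
        tree = if_equal if features.get(feature) == value else if_not
    return tree


# The unit we would like to have: stressed, phrase-final, followed by "I".
target = {"stressed": True, "phrase_position": "final", "next_phone": "I"}

# The units of the same diphone that the corpus actually contains.
candidates = [
    {"id": "a", "stressed": True, "phrase_position": "medial", "next_phone": "I"},
    {"id": "b", "stressed": False, "phrase_position": "final", "next_phone": "E"},
]

print(best_candidate(target, candidates)["id"])  # a (cost 1.0 vs. 5.0)
print(predict(DURATION_TREE, target))            # 140.0
```

Neither candidate matches the target exactly, but the cost function still ranks them, so the corpus only needs to contain each diphone in a reasonable variety of contexts rather than in every conceivable one.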
> And thanks for your patience with all my questions.
> Speech synthesis has always fascinated me, and Mary is really interesting.
I warmly recommend reading an introductory textbook, which will provide all of the background, e.g.
@book{Dutoit1997ITTS,
  author    = {Thierry Dutoit},
  title     = {An Introduction to Text-To-Speech Synthesis},
  publisher = {Springer},
  year      = {1997},
  isbn      = {978-0-792-34498-8}
}
or the more recent
@book{Taylor2009TTS,
  author    = {Paul Taylor},
  title     = {Text-to-Speech Synthesis},
  publisher = {Cambridge University Press},
  year      = {2009},
  isbn      = {978-0-521-89927-7}
}
>
> Kind regards,
> Westermann
Best wishes,
/**
* Ingmar Steiner
* Researcher, Language Technology
* German Research Center for Artificial Intelligence
*
* Campus D3 1 +1.18
* D-66123 Saarbrücken
* Germany
* Phone: ++49-681-857-75-5263 (NEW!)
* Email: ingmar.steiner at dfki.de
*/