[mary-users] PRAAT_TEXTGRID - first test and problem with times
Ingmar Steiner
ingmar.steiner at dfki.de
Tue Sep 7 13:07:37 CEST 2010
Dear Brigitte,
> I put the glottal stop in to emphasize that very specific kangaroo.
I'm sorry, this is very confusing. The canonical pronunciation for the German word "Känguru" should not contain a glottal stop. In fact, glottal stops have a very specific distribution in German phonotactics, and do not occur clustered with other stops in the same syllable. If you want emphasis, injecting phones into a token's pronunciation is certainly an aberration, and seems to be the source of your troubles. I strongly suggest you remove it. Instead you should rely on any of several available mechanisms for prosody control, such as boundaries, duration, and pitch specifications.
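To make the suggestion concrete, here is a minimal sketch of a RAWMARYXML request that marks emphasis with a short pause and a prosody change, leaving the phone sequence to the lexicon. It is an illustration under assumptions, not a tested recipe: the function name, the version string, and the pitch, rate, and pause values are all hypothetical, and the element and attribute names should be checked against the MaryXML schema of your installation.

```python
import xml.etree.ElementTree as ET

MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_emphasis_request(before, word, pitch="+10%", rate="-15%", pause_ms=150):
    """Build a MaryXML string that emphasizes `word` without touching its phones.

    All default values are illustrative assumptions, not recommended settings.
    """
    ET.register_namespace("", MARYXML_NS)
    root = ET.Element(
        f"{{{MARYXML_NS}}}maryxml",
        {"version": "0.5", XML_LANG: "de"},  # version string is an assumption
    )
    root.text = before + " "
    # A short pause before the emphasized word.
    boundary = ET.SubElement(
        root, f"{{{MARYXML_NS}}}boundary", {"duration": str(pause_ms)}
    )
    boundary.tail = " "
    # Raise pitch and slow down the word instead of inserting a glottal stop.
    prosody = ET.SubElement(
        root, f"{{{MARYXML_NS}}}prosody", {"pitch": pitch, "rate": rate}
    )
    prosody.text = word
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_emphasis_request("vom", "Känguru"))
```

The point of the design is that the token keeps its canonical pronunciation, so duration prediction and the resulting TextGrid intervals are not disturbed by an extra phone.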
> Your last question sounds as if you wanted to know how the XML file was written. Very simple answer again: using SAMPA German and oxygen or TextWrangler.
No, I wanted to know if perhaps the phone sequence had been generated by a modified lexicon or faulty LTS rules. But you've already answered that.
Best wishes,
/**
* Ingmar Steiner
* Researcher, Language Technology
* German Research Center for Artificial Intelligence
*
* Campus D3 1 +1.18
* D-66123 Saarbrücken
* Germany
* Phone: ++49-681-857-75-5263 (NEW!)
* Email: ingmar.steiner at dfki.de
*
* Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
* Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
* Geschäftsführung:
* Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
* Dr. Walter Olthoff
* Vorsitzender des Aufsichtsrats:
* Prof. Dr. h.c. Hans A. Aukes
* Amtsgericht Kaiserslautern, HRB 2313
*/
>
> Cheers again!
> Brigitte
>
>
> On 07.09.2010 at 12:04, Ingmar Steiner wrote:
>
>> Dear Brigitte,
>>
>>>> To me it looks like the glottal stop is realized as laryngealized, not voiceless; glottal pulses are clearly visible. But that's just one realization in the voice data. What I'd like to know is how the glottal stop got between the words "vom" and "Känguru" in the first place. Did you manually insert it?
>>>
>>> First, I am sending TakeKaenguru.xml, which will show you how the glottal stop came in.
>>
>> Sorry, maybe my question was ambiguous. I'm aware that you did not simply insert a "?" interval into the TextGrid, but that it was synthesized from a requested phone sequence. This sequence includes the value of the "ph" attribute in the
>>
>>> <t ph = "'?kEN-gU-Ru:" >Känguru</t>
>>
>>
>> snippet of the XML file you attached. My question is, why is there a glottal stop in the phones for the token "Känguru"? How was this XML file created?
>>
>> Best wishes,
>>
>>> If you listen to the isolated de7 glottal stop in the sample, using the text grid, you too will doubt that it is annotated correctly.
>>>
>>>>
>>>>> In my view, it comes closer to reality. The sample is again the kangaroo phrase. I am sending just the PNG, with the glottal stop highlighted. This should suffice to convince you.
>>>>
>>>> Sorry, I've lost track of what the issue is. Are you asking about the Praat TextGrid export, or the duration prediction for glottal stops, or the labeling of individual units in the pavoque voice data?
>>> My issue is a clear annotation of sound, phones and larger units. The intervals should correspond to sound reality.
>>>
>>>>
>>>>> So, after your first answer, my question becomes: are there new female German voices in Mary TTS with better transparency?
>>>>
>>>> As far as I am aware, a female German HMM voice is in preparation. In the meantime, I would like to encourage you to try the (male) bits3-hsmm voice, as the quality should be superior to diphone synthesis.
>>>
>>> Good that a new female German voice is in preparation. As male intonation differs quite a lot from female intonation (sorry for the truism!), the male bits3-hsmm voice is unfortunately only of limited help.
>>>
>>> Cheers
>>>
>>> Brigitte
>>> <TakeKaenguru.xml>
>>>
>>>>
>>>> Best wishes,
>>>>
>>>>>
>>>>> Cheers from Hannover
>>>>>
>>>>> Brigitte
>>>>>
>>>>> <kaengurupavoque.png>
>>>>>
>>>>>
>>>>>
>>>>> On 06.09.2010 at 10:08, Ingmar Steiner wrote:
>>>>>
>>>>>> Dear Brigitte,
>>>>>>
>>>>>> the PRAAT_TEXTGRID output type is essentially a conversion from REALISED_ACOUSTPARAMS to a format convenient for import into Praat. Specifically, the duration information present in MaryXML is formatted as a TextGrid with one or more IntervalTiers. As mentioned in previous messages, the Praat TextGrid support should still be considered experimental.
>>>>>>
>>>>>> You do not mention which voice you (or the anonymous "curious user") used, but your example sounds very much like the de7 female MBROLA voice. MBROLA is a diphone synthesizer, and furthermore does not permit close inspection of its internal processing. The MaryXML is converted to MBROLA format, and passed to the MBROLA binary, which uses the requested voice to generate AUDIO. It is not unlikely that the durations specified in MaryXML (which form the basis of the Praat TextGrid format, as explained above) do not match the phone boundaries in the waveform generated by MBROLA. If you discover that there are systematic mismatches reproducible under certain conditions, it may be a problem with the MBROLA voice data, or possibly a bug in the PraatTextGridGenerator code. Please provide all of the details to me once you have determined that the problem is indeed not with the voice.
>>>>>>
>>>>>> The second and third tiers in the three-tier TextGrid format about which you inquire contain information particular to unit-selection synthesis, viz. the diphone unit boundaries, and the intervals of consecutive units from the same source recording, respectively. These tiers are useful for the analysis of unit-selection itself and debugging. They are only generated by the UnitSelectionSynthesizer, i.e. when using a unit-selection voice.
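The conversion described in the quoted message (durations from MaryXML written out as an IntervalTier) can be sketched roughly as follows. This is not the PraatTextGridGenerator code; it is a minimal stand-in that assumes phone durations are available in milliseconds and writes a one-tier TextGrid in Praat's long text format. The function names and the example durations are hypothetical.

```python
def cumulative_intervals(phones):
    """Turn (label, duration_ms) pairs into (start_s, end_s, label) triples.

    Boundaries are accumulated in integer milliseconds so that rounding
    errors do not build up across a long utterance.
    """
    intervals = []
    elapsed_ms = 0
    for label, duration_ms in phones:
        start_ms = elapsed_ms
        elapsed_ms += duration_ms
        intervals.append((start_ms / 1000, elapsed_ms / 1000, label))
    return intervals


def to_textgrid(intervals, tier_name="phones"):
    """Serialize intervals as a one-tier TextGrid (Praat long text format)."""
    xmax = intervals[-1][1] if intervals else 0.0
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, start=1):
        escaped = label.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up durations for the phones of "vom", for illustration only.
    example = [("f", 80), ("O", 60), ("m", 120)]
    print(to_textgrid(cumulative_intervals(example)))
```

Note that the interval boundaries here come purely from the requested durations. If the synthesizer realizes the phones with different timing, as may happen with MBROLA, a grid built this way will not line up with the waveform.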
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> On 5 Sep 2010, at 07:53, Brigitte Endres-Niggemeyer wrote:
>>>>>>
>>>>>>> Dear all and dear Ingmar,
>>>>>>>
>>>>>>> the very curious user of the text grid has tested the new Mary TTS version, specifically the PRAAT_TEXTGRID output, and found that phone boundaries may be misplaced.
>>>>>>> In my very simple test phrase, the glottal stop starts too early. I am sending the PNG, the text grid, and the sound file, hoping that this is enough for a check.
>>>>>>> My obvious question is how the start of this glottal stop can be adjusted. Editing by hand cannot be the final solution!
>>>>>>>
>>>>>>> Next question: the Mary web demo now produces a three-layer grid with phones, units, and sources. Great, but for which applications do you intend this version? Please explain this to me and to others!
>>>>>>>
>>>>>>> Cheers from Hannover
>>>>>>>
>>>>>>> Brigitte
>>>>>>>
>>>>>>>
>>>>>>> <Kaenguru.zip>
>>>>>>>
>>>>
>>>
>>> Brigitte Endres-Niggemeyer, Prof. Dr. phil. habil.
>>> FH Hannover
>>> Fakultaet III - Medien, Information und Design
>>> Expo Plaza 12
>>> 30539 Hannover
>>> Tel. +49 511 92 96 2641
>>> Home +49 511 84 41 690
>>> Mobile 015154726114
>>> "spiritus flat ubi vult"
>>> (The spirit blows where it wills.)
>>> Brigitte.Endres-Niggemeyer at fh-hannover.de
>>> brigitteen at googlemail.com
>>> http://endres-niggemeyer.fh-hannover.de/