[mary-users] Mary me

Klemens Bobenhausen bobenhausen at googlemail.com
Wed Apr 11 16:23:18 CEST 2012


Dear Ingmar,

thank you again for your answers. I'm not very familiar with speech
synthesis, and it's very helpful to have you at this moment. Give me a few
months and I will turn all your hints and tips into knowledge! For now I
have to say that you are right: the problem with "abfallen" depends on
the voice. I was using bits1-hsmm; the other voices do this better, and
now I am slowly beginning to understand why. "boundary duration" is easy
for us to use, thank you for that tip.
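
A minimal sketch of how such a pause could be written into a RAWMARYXML
document from Python. The helper name maryxml_with_pause and the sample
phrases are made up for illustration; the root element with version "0.5"
and the MaryXML namespace are assumptions taken from typical MaryXML
examples. As noted in the quoted reply below, it depends on the voice
whether the requested duration is actually realized.

```python
from xml.sax.saxutils import escape

# Namespace as used in typical MaryXML examples (assumption, not checked
# against a specific MaryTTS release).
MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"


def maryxml_with_pause(before, after, pause_ms=500, lang="de"):
    """Return a RAWMARYXML string with a <boundary> between two text chunks."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<maryxml version="0.5" xmlns="{MARYXML_NS}" xml:lang="{lang}">\n'
        f"{escape(before)}\n"
        f'<boundary duration="{int(pause_ms)}"/>\n'
        f"{escape(after)}\n"
        "</maryxml>\n"
    )


# Hypothetical input: two words from this thread with a 500 ms pause between.
doc = maryxml_with_pause("abfallen", "Kirschbaum", pause_ms=500)
print(doc)
```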

About the detection of secondary stress: please have a quick look at
www.metricalizer.de. Take any word, maybe "abfallen", paste it into
the "Metrikanalyse" (metrical analysis) field and click on "Gedicht
analysieren" (analyze poem). Wait a few seconds and click on
"Akzentdissimilat." If your example was "abfallen", you will see that
syllable 1 has received more stress than syllable 2 or syllable 3. If you
take "Kirschbaum", a compound with secondary stress on the second
syllable, "Akzentdissimilat." fails, but you can see the primary stress
marked with + on the first syllable before clicking on
"Akzentdissimilat." Try any nonsense word; it's not a lexicon, it's an
algorithm.

All the best

Klemens





On 11.04.2012 15:18, Ingmar Steiner wrote:
> Dear Klemens,
>
> On 03.04.2012 15:40, Klemens Bobenhausen wrote:
>> Dear Ingmar,
>>
>> thanks a lot for your quick response. No, we did not fail; in general
>> things work fine, but we have some smaller problems:
>>
>> a) Secondary stress ^^ OK, I will follow your hints and links first and
>> will come back to this point if I don't understand them. The lexicon
>> part of the problem is not that big, because we should be able to detect
>> and distinguish both kinds of stress automatically.
>
> Good luck with that!
>
>>
>> b) Is there a way to define something like a "PAUSE" in MaryXML?
>
> Taking your question literally: yes, absolutely. By inserting a 
> <boundary duration="500" /> into your MaryXML, you request a pause of 
> 500 ms duration.
>
> However, it depends on the respective voice how this is realized, if 
> at all. Older voices may use the CartDurationModeller module, which 
> mostly works as expected. Newer voices will use the more flexible 
> AcousticModeller module, which allows the voice to specify arbitrary 
> models for the prediction of acoustic features such as F0, duration, 
> and boundaries.
>
> For newer unit selection voices, this also works as expected, but for 
> HSMM voices, the requested boundary duration seems to be ignored. 
> There are related open issues
>
> https://github.com/marytts/marytts/issues/5
>
> and
>
> https://github.com/marytts/marytts/issues/7
>
> and it is not unlikely that the root of the problem is the same.
>
>>
>> c) In a word like "abfallen", Mary does not correctly pronounce the
>> stress on the first syllable. To me it seems that Mary is trying to
>> detect the root syllable and put the stress on that syllable. If I mark
>> the first syllable as stressed (' ? ap), Mary still tries to keep the
>> stress on the root syllable. Is this also part of the secondary-stress
>> problem, or is there a way to stop this?
>
> First of all, the problem is not at the symbolic level. Mary correctly 
> marks the first syllable as stressed, as you can see when you select 
> PHONEMES or ALLOPHONES as the output type.
>
> That said, think about why you perceive the pronunciation as 
> "incorrect". Stress itself is an abstract phonological feature, and 
> you are expecting the *acoustic correlates* to be realized in a 
> different way than what you are getting from the synthesis. Perhaps it 
> is only the F0 contour which is higher on the second (unstressed) 
> syllable than on the first, imbuing it with greater perceptual 
> prominence. This is probably the cause of the issue, and it depends on 
> how the prosodic models for the voice in question were trained. 
> Apparently they make inappropriate predictions for this target 
> utterance, while other voices (whose models were trained on different 
> data) may pronounce it as expected.
>
> Therefore, are you sure that this issue systematically occurs with all 
> German voices? My impression is that only e.g. bits1-hsmm exhibits 
> this behavior.
>
> In any case, you can either specify prosody directly in MaryXML to 
> override the predicted prosody, or you might experiment with hacking 
> the voice config to use the prosody model(s) from another voice instead.
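
A rough sketch of the first option, wrapping a word in a MaryXML prosody
element from Python. The helper prosody_span is hypothetical, and the
contour attribute with its "(position%,offsetHz)" value syntax is an
assumption borrowed from SSML-style prosody markup; which attributes a
given voice honours would need to be checked against the MaryXML
documentation and tested per voice.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_span(text, **attrs):
    """Wrap text in a <prosody> element with the given attributes."""
    # Sort attributes so the generated markup is deterministic.
    attr_str = "".join(
        f" {name}={quoteattr(value)}" for name, value in sorted(attrs.items())
    )
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


# Hypothetical contour: start higher, end lower, so that the first syllable
# of "abfallen" is not overshadowed by a pitch rise on the second one.
snippet = prosody_span("abfallen", contour="(0%,+20Hz)(100%,-10Hz)")
print(snippet)
```

The resulting fragment would go inside the maryxml root element of a
RAWMARYXML document in place of the bare word.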
>
> Best wishes,
>
> -Ingmar
>
>>
>> Thank you for your great work and help
>>
>> Klemens
>>
>>
>>
>> On 02.04.2012 09:47, Ingmar Steiner wrote:
>>> Dear Klemens,
>>>
>>> Assuming your PHONEMES input is valid (i.e., doesn't contain any
>>> unknown allophones), there should be no problem in running the
>>> synthesis. (If you do this on a local, multi-processor machine, you
>>> should be able to synthesize in parallel, saving some time.)
>>>
>>> From your message, I'm not sure what your question is; have you tried
>>> and failed? If so, which step seems to be the problem?
>>>
>>> Best wishes,
>>>
>>> -Ingmar
>>>
>>> P.S. Note that things will be somewhat more complicated if you plan to
>>> support secondary stress; see for instance this thread:
>>>
>>> http://www.dfki.de/pipermail/mary-users/2012-February/001106.html
>>>
>>> On 01.04.2012 07:45, Klemens Bobenhausen wrote:
>>>> Hello,
>>>>
>>>> we want to use Mary to synthesize poems that we produced
>>>> algorithmically beforehand. These poems have a metrical pattern. What
>>>> we are trying to do is import our own PHONEMES into MaryXML format and
>>>> shift the stress at those positions of the poem where the "prosody of
>>>> normal discourse" does not fit the metrical pattern. The PHONEMES
>>>> themselves are finished. Our lexicon has about 300,000 words, is
>>>> segmented into syllables, knows the right stress positions and has
>>>> been transformed into marySAMPA. We know the metrical stress pattern of
>>>> the poem and the positions in the text where the poem generates
>>>> metrical complexity (between the stress pattern of the poem and the
>>>> stress of normal discourse).
>>>>
>>>> With this data we want to convert PHONEMES to AUDIO.
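
The stress switching described above can be sketched roughly as follows.
This assumes the transcription string convention seen in Mary's
ALLOPHONES output: phones separated by spaces, syllables by " - ",
primary stress marked with "'" and secondary stress with ",". The
function name ph_with_stress and the syllabification of "abfallen" are
illustrative only, not taken from the lexicon mentioned in the message.

```python
def ph_with_stress(syllables, primary, secondary=None):
    """Join syllables (lists of phone symbols) into one transcription string.

    The syllable at index `primary` gets the primary stress marker "'",
    the one at index `secondary` (if given) the secondary marker ",".
    """
    if not 0 <= primary < len(syllables):
        raise IndexError("primary stress index out of range")
    parts = []
    for i, phones in enumerate(syllables):
        if i == primary:
            marker = "' "
        elif i == secondary:
            marker = ", "
        else:
            marker = ""
        parts.append(marker + " ".join(phones))
    return " - ".join(parts)


# Illustrative syllabification of "abfallen" in SAMPA-like symbols.
abfallen = [["?", "a", "p"], ["f", "a"], ["l", "@", "n"]]

lexical = ph_with_stress(abfallen, primary=0)   # "' ? a p - f a - l @ n"
metrical = ph_with_stress(abfallen, primary=1)  # "? a p - ' f a - l @ n"
print(lexical)
print(metrical)
```

Keeping the syllables as data and generating the string per poem position
means the lexical stress and the metrically shifted stress come from the
same lexicon entry.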
>>>>
>>>> Is there anyone out there who could help us a little with that step?
>>>>
>>>> Best and thanks in advance
>>>>
>>>> Klemens
>>>>
>>>> -
>>>
>>
>> _______________________________________________
>> Mary-users mailing list
>> Mary-users at dfki.de
>> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>

-- 
Klemens Bobenhausen
Erwinstr. 76
79102 Freiburg
0761-808905
0174-3327784
Fax: 0761/88530910


