[mary-users] Mary me

Ingmar Steiner ingmar.steiner at inria.fr
Wed Apr 11 15:18:42 CEST 2012


Dear Klemens,

On 03.04.2012 15:40, Klemens Bobenhausen wrote:
> Dear Ingmar,
>
> thanks a lot for your quick response. No, we did not fail; in general
> things work fine, but we have some smaller problems:
>
> a) Secondary stress: OK, I will follow your hints and links first and
> will come back to this point if I don't understand them. The lexicon
> part of the problem is not that big, because we should be able to detect
> and distinguish both kinds of stress automatically.

Good luck with that!

>
> b) Is there a way to define something like "PAUSE" in MaryXML?

Taking your question literally: yes, absolutely. By inserting a 
<boundary duration="500" /> into your MaryXML, you request a pause of 
500 ms duration.
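
As a minimal sketch of what such a document could look like, the following 
Python snippet builds a raw MaryXML string with a boundary element between 
two pieces of text. The function name maryxml_with_pause and the sample 
text are made up for illustration; the namespace and version attribute are 
the ones commonly seen in MaryXML examples, so check them against your 
installation.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by MaryXML documents (verify against your MARY version).
MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def maryxml_with_pause(before, after, pause_ms, lang="de"):
    """Return a MaryXML string with a <boundary duration="..."/> between two texts.

    Hypothetical helper for illustration; it is not part of MARY itself.
    """
    ET.register_namespace("", MARYXML_NS)
    root = ET.Element(
        f"{{{MARYXML_NS}}}maryxml",
        {"version": "0.5", f"{{{XML_NS}}}lang": lang},
    )
    root.text = before
    # The boundary element requests a pause of pause_ms milliseconds.
    boundary = ET.SubElement(
        root, f"{{{MARYXML_NS}}}boundary", {"duration": str(pause_ms)}
    )
    boundary.tail = after
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(maryxml_with_pause("Erster Vers.", "Zweiter Vers.", 500))
```

The resulting string would be submitted with RAWMARYXML as the input type.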

However, it depends on the respective voice how this is realized, if at 
all. Older voices may use the CartDurationModeller module, which mostly 
works as expected. Newer voices will use the more flexible 
AcousticModeller module, which allows the voice to specify arbitrary 
models for the prediction of acoustic features such as F0, duration, and 
boundaries.

For newer unit selection voices, this also works as expected, but for 
HSMM voices, the requested boundary duration seems to be ignored. There 
are related open issues

https://github.com/marytts/marytts/issues/5

and

https://github.com/marytts/marytts/issues/7

and it is not unlikely that the root of the problem is the same.

>
> c) In a word like "abfallen", Mary does not correctly pronounce the
> stress on the first syllable. It seems to me that Mary is trying to
> detect the root syllable and put the stress on that syllable. If I mark the
> first syllable as stressed (' ? ap), Mary still keeps the stress
> on the root syllable. Is this also part of the secondary-stress problem,
> or is there a way to stop this?

First of all, the problem is not at the symbolic level. Mary correctly 
marks the first syllable as stressed, as you can see when you select 
PHONEMES or ALLOPHONES as the output type.
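
If you prefer to check this from a script rather than the GUI client, here 
is a rough sketch that builds a request URL for a locally running MARY 
server. It assumes the HTTP interface with its usual defaults (localhost, 
port 59125, the /process endpoint, and the INPUT_TEXT, INPUT_TYPE, 
OUTPUT_TYPE and LOCALE parameters); the helper name mary_process_url is 
made up. Adjust all of this to your setup.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def mary_process_url(text, output_type, locale="de",
                     host="localhost", port=59125):
    """Build a /process URL asking MARY to convert TEXT to output_type.

    Hypothetical helper; host, port and parameter names are assumptions
    based on the default MARY HTTP server configuration.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": output_type,  # e.g. "PHONEMES" or "ALLOPHONES"
        "LOCALE": locale,
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL only; fetching it requires a running server.
    print(mary_process_url("abfallen", "ALLOPHONES"))
```

Fetching that URL should return the MaryXML document in which you can 
inspect the stress marking on each syllable.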

That said, think about why you perceive the pronunciation as 
"incorrect". Stress itself is an abstract phonological feature, and you 
are expecting the *acoustic correlates* to be realized in a different 
way than what you are getting from the synthesis. Perhaps it is only the 
F0 contour which is higher on the second (unstressed) syllable than on 
the first, imbuing it with greater perceptual prominence. This is 
probably the cause of the issue, and it depends on how the prosodic 
models for the voice in question were trained. Apparently they make 
inappropriate predictions for this target utterance, while other voices 
(whose models were trained on different data) may pronounce it as expected.

Therefore, are you sure that this issue systematically occurs with all 
German voices? My impression is that only e.g. bits1-hsmm exhibits this 
behavior.

In any case, you can either specify prosody directly in MaryXML to 
override the predicted prosody, or you might experiment with hacking the 
voice config to use the prosody model(s) from another voice instead.
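
To illustrate the first option, here is a small sketch that wraps a word in 
a MaryXML prosody element with a contour attribute. The helper name 
wrap_with_contour and the contour values are purely illustrative 
assumptions, not tuned settings, and whether a given voice honors the 
attribute is something you would have to verify by listening.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by MaryXML documents (verify against your MARY version).
MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def wrap_with_contour(text, contour, lang="de"):
    """Return MaryXML in which text is enclosed by <prosody contour="...">.

    Hypothetical helper for illustration; it is not part of MARY itself.
    """
    ET.register_namespace("", MARYXML_NS)
    root = ET.Element(
        f"{{{MARYXML_NS}}}maryxml",
        {"version": "0.5", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(
        root, f"{{{MARYXML_NS}}}prosody", {"contour": contour}
    )
    prosody.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative contour: raised F0 at the start, falling towards the end.
    print(wrap_with_contour("abfallen", "(0%,+15%)(50%,+0%)(100%,-10%)"))
```

The idea is to push F0 up on the first syllable and down afterwards, so 
that the perceived prominence matches the phonological stress.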

Best wishes,

-Ingmar

>
> Thank you for your great work and help
>
> Klemens
>
>
>
> On 02.04.2012 09:47, Ingmar Steiner wrote:
>> Dear Klemens,
>>
>> Assuming your PHONEMES input is valid (i.e., doesn't contain any
>> unknown allophones), there should be no problem in running the
>> synthesis. (If you do this on a local, multi-processor machine, you
>> should be able to synthesize in parallel, saving some time.)
>>
>> From your message, I'm not sure what your question is; have you tried
>> and failed? If so, which step seems to be the problem?
>>
>> Best wishes,
>>
>> -Ingmar
>>
>> P.S. Note that things will be somewhat more complicated if you plan to
>> support secondary stress; see for instance this thread:
>>
>> http://www.dfki.de/pipermail/mary-users/2012-February/001106.html
>>
>> On 01.04.2012 07:45, Klemens Bobenhausen wrote:
>>> Hello,
>>>
>>> we want to use Mary to synthesize poems that we produced
>>> algorithmically. These poems have a metrical pattern. What we are trying
>>> to do is to import our own PHONEMES into MaryXML format and shift the
>>> stress at those positions of the poem where the "prosody of normal
>>> discourse" does not fit the metrical pattern. The PHONEMES themselves are
>>> finished. Our lexicon has about 300,000 words, is segmented into
>>> syllables, knows the right stress positions, and has been transformed
>>> into marySAMPA. We know the metrical stress pattern of the poem and the
>>> positions in the text where the poem generates metrical complexity
>>> (between the stress pattern of the poem and the stress of normal
>>> discourse).
>>>
>>> With this data we want to convert PHONEMES to AUDIO.
>>>
>>> Is there anyone out there who could help us a little with that step?
>>>
>>> Best and thanks in advance
>>>
>>> Klemens
>>>
>>> -
>>
>

-- 
Ingmar Steiner
Postdoctoral Researcher

LORIA Speech Group, Nancy, France
National Institute for Research in
Computer Science and Control (INRIA)

