[mary-users] Mary me
Ingmar Steiner
ingmar.steiner at inria.fr
Wed Apr 11 15:18:42 CEST 2012
Dear Klemens,
On 03.04.2012 15:40, Klemens Bobenhausen wrote:
> Dear Ingmar,
>
> thanks a lot for your quick response. No - we did not fail, in general
> things work fine, but we have some smaller problems:
>
> a) Secondary stress: OK, I will follow your hints and links first and
> will come back to this point if I don't understand them. The lexicon
> part of the problem is not that big, because we should be able to detect
> and distinguish both kinds of stress automatically.
Good luck with that!
>
> b) Are there possibilities to define something like "PAUSE" in maryXML?
Taking your question literally: yes, absolutely. By inserting a
<boundary duration="500" /> into your MaryXML, you request a pause of
500 ms duration.
However, it depends on the respective voice how this is realized, if at
all. Older voices may use the CartDurationModeller module, which mostly
works as expected. Newer voices will use the more flexible
AcousticModeller module, which allows the voice to specify arbitrary
models for the prediction of acoustic features such as F0, duration, and
boundaries.
For newer unit selection voices, this also works as expected, but for
HSMM voices, the requested boundary duration seems to be ignored. There
are related open issues
https://github.com/marytts/marytts/issues/5
and
https://github.com/marytts/marytts/issues/7
and it is not unlikely that the root of the problem is the same.
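
To make the boundary request described above concrete, here is a minimal Python sketch that builds such a MaryXML document. The namespace URI and the version="0.5" value are assumptions based on typical MaryXML documents and may differ for your installation; the function name and the sample text are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed MaryXML namespace; compare with a document produced by your own server.
MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def maryxml_with_pause(before, after, pause_ms=500, lang="de", version="0.5"):
    """Return MaryXML with a <boundary duration="..."/> between two text chunks.

    Hypothetical helper for illustration; not part of MaryTTS itself.
    """
    # Serialize the MaryXML namespace as the default (prefix-less) namespace.
    ET.register_namespace("", MARYXML_NS)
    root = ET.Element(
        f"{{{MARYXML_NS}}}maryxml",
        {"version": version, f"{{{XML_NS}}}lang": lang},
    )
    root.text = before + " "
    # The requested pause, in milliseconds, goes into the duration attribute.
    boundary = ET.SubElement(
        root, f"{{{MARYXML_NS}}}boundary", {"duration": str(pause_ms)}
    )
    boundary.tail = " " + after
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(maryxml_with_pause("Erste Zeile.", "Zweite Zeile."))
```

The resulting string would be sent with RAWMARYXML as the input type. Whether the pause is actually realized still depends on the voice, as explained above.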
>
> c) In a word like "abfallen", Mary does not correctly pronounce the
> stress on the first syllable. It seems to me that Mary is trying to
> detect the root syllable and put the stress on that syllable. If I mark the
> first syllable as stressed (' ? ap), Mary still tries to keep the stress
> on the root syllable. Is this also part of the secondary-stress problem,
> or is there a way to stop this?
First of all, the problem is not at the symbolic level. Mary correctly
marks the first syllable as stressed, as you can see when you select
PHONEMES or ALLOPHONES as the output type.
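
One way to inspect the symbolic output without the GUI client is to ask a running Mary server for ALLOPHONES over HTTP. The sketch below only builds the request URL; it assumes a local server on the default port 59125 with the /process endpoint and the parameter names shown, so please verify these against your version. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def mary_process_url(text, output_type="ALLOPHONES", locale="de",
                     host="localhost", port=59125):
    """Build a URL asking a Mary server to process plain text.

    Host, port, endpoint, and parameter names are assumptions about a
    default local installation.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": output_type,  # e.g. PHONEMES or ALLOPHONES
        "LOCALE": locale,
    }
    return f"http://{host}:{port}/process?{urlencode(params)}"


if __name__ == "__main__":
    # Open the printed URL in a browser, or fetch it with urllib.request.
    print(mary_process_url("abfallen"))
```

In the returned XML, the stress marking of each syllable can then be checked directly.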
That said, think about why you perceive the pronunciation as
"incorrect". Stress itself is an abstract phonological feature, and you
are expecting the *acoustic correlates* to be realized in a different
way than what you are getting from the synthesis. Perhaps it is only the
F0 contour which is higher on the second (unstressed) syllable than on
the first, imbuing it with greater perceptual prominence. This is
probably the cause of the issue, and it depends on how the prosodic
models for the voice in question were trained. Apparently they make
inappropriate predictions for this target utterance, while other voices
(whose models were trained on different data) may pronounce it as expected.
Therefore, are you sure that this issue systematically occurs with all
German voices? My impression is that only e.g. bits1-hsmm exhibits this
behavior.
In any case, you can either specify prosody directly in MaryXML to
override the predicted prosody, or you might experiment with hacking the
voice config to use the prosody model(s) from another voice instead.
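
As a rough illustration of the first option, the following Python sketch wraps a word in a MaryXML prosody element with an explicit F0 contour. The SSML-style contour syntax, the namespace URI, and the version value are assumptions, and the contour values are arbitrary placeholders rather than tuned settings; whether a given voice honors them would need to be tested.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed MaryXML namespace; compare with a document produced by your own server.
MARYXML_NS = "http://mary.dfki.de/2002/MaryXML"


def maryxml_with_contour(word, contour, lang="de", version="0.5"):
    """Return MaryXML that wraps `word` in <prosody contour="...">.

    Hypothetical helper for illustration; not part of MaryTTS itself.
    """
    return (
        f"<maryxml version={quoteattr(version)} "
        f'xmlns="{MARYXML_NS}" xml:lang={quoteattr(lang)}>'
        f"<prosody contour={quoteattr(contour)}>{escape(word)}</prosody>"
        "</maryxml>"
    )


if __name__ == "__main__":
    # Placeholder contour: raised F0 at the start of the word, lowered at the end.
    print(maryxml_with_contour("abfallen", "(0%,+20%)(100%,-10%)"))
```

The idea is simply to raise F0 on the first syllable relative to the second, so that the stressed syllable regains its perceptual prominence.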
Best wishes,
-Ingmar
>
> Thank you for your great work and help
>
> Klemens
>
>
>
> On 02.04.2012 09:47, Ingmar Steiner wrote:
>> Dear Klemens,
>>
>> Assuming your PHONEMES input is valid (i.e., doesn't contain any
>> unknown allophones), there should be no problem in running the
>> synthesis. (If you do this on a local, multi-processor machine, you
>> should be able to synthesize in parallel, saving some time.)
>>
>> From your message, I'm not sure what your question is; have you tried
>> and failed? If so, which step seems to be the problem?
>>
>> Best wishes,
>>
>> -Ingmar
>>
>> P.S. Note that things will be somewhat more complicated if you plan to
>> support secondary stress; see for instance this thread:
>>
>> http://www.dfki.de/pipermail/mary-users/2012-February/001106.html
>>
>> On 01.04.2012 07:45, Klemens Bobenhausen wrote:
>>> Hello,
>>>
>>> we want to use Mary to synthesize poems that we produced
>>> algorithmically beforehand. These poems have a metrical pattern. What
>>> we are trying to do is to import our own PHONEMES into maryXML format
>>> and shift the stress at those positions of the poem where the "prosody
>>> of normal discourse" does not fit the metrical pattern. The PHONEMES
>>> themselves are finished. Our lexicon has about 300,000 words, is
>>> segmented into syllables, knows the correct stress positions, and has
>>> been transformed into marySAMPA. We know the metrical stress pattern
>>> of the poem and the positions in the text where the poem generates
>>> metrical complexity (between the stress pattern of the poem and the
>>> stress of normal discourse).
>>>
>>> With this data we want to convert PHONEMES to AUDIO.
>>>
>>> Is there anyone out there who could help us a little with that step?
>>>
>>> Best and thanks in advance
>>>
>>> Klemens
>>>
>>> -
>>
>
--
Ingmar Steiner
Postdoctoral Researcher
LORIA Speech Group, Nancy, France
National Institute for Research in
Computer Science and Control (INRIA)