[mary-dev] AcousticModeller

fxavier at ircam.fr
Tue Jul 26 15:45:49 CEST 2011


Dear Ingmar,

Thanks a lot for your answers. I actually have a few other questions.

If I understand correctly, one can use a CART for F0 and HMMs for durations,
for example, independently of the type of synthesis (HMM-based or unit
selection). So we can choose whichever we want from the three algorithms:
CART, HMM and SoP. What do you think is the best choice? Which one is the
most successful? Does it depend on the type of synthesis?

What is SoP exactly? I have never heard of it, and I could find nothing
about it on the internet.
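The thread never answers this question. One plausible reading, offered here only as an assumption that Ingmar does not confirm, is that SoP stands for "sum of products", the model family van Santen used for segmental duration. A prediction is a sum of terms, and each term is the product of per-factor scale values looked up from the segment's features. The minimal Java sketch below shows only that computation. The class names, factor names and scale values are invented for illustration and are not taken from MARY.

```java
import java.util.List;
import java.util.Map;

/** One product term: feature name -> (feature value -> scale). Hypothetical structure. */
final class SopTerm {
    private final Map<String, Map<String, Double>> scales;

    SopTerm(Map<String, Map<String, Double>> scales) {
        this.scales = scales;
    }

    /** Multiplies the scale values of all factors in this term. */
    double evaluate(Map<String, String> features) {
        double product = 1.0;
        for (Map.Entry<String, Map<String, Double>> factor : scales.entrySet()) {
            String value = features.get(factor.getKey());
            Double scale = (value == null) ? null : factor.getValue().get(value);
            if (scale == null) {
                throw new IllegalArgumentException(
                        "No scale for " + factor.getKey() + "=" + value);
            }
            product *= scale;
        }
        return product;
    }
}

final class SumOfProducts {
    /** Prediction = sum over all terms of each term's product of scales. */
    static double predict(List<SopTerm> terms, Map<String, String> features) {
        double sum = 0.0;
        for (SopTerm term : terms) {
            sum += term.evaluate(features);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Invented example: duration (ms) = phoneScale * stressScale + boundaryScale
        SopTerm core = new SopTerm(Map.of(
                "phone", Map.of("a", 80.0, "t", 50.0),
                "stress", Map.of("0", 1.0, "1", 1.3)));
        SopTerm boundary = new SopTerm(Map.of(
                "phraseFinal", Map.of("no", 0.0, "yes", 40.0)));
        double ms = predict(List.of(core, boundary),
                Map.of("phone", "a", "stress", "1", "phraseFinal", "yes"));
        System.out.println(ms); // 80 * 1.3 + 40
    }
}
```

In this form each factor contributes multiplicatively within a term and additively across terms. That is what distinguishes a sum of products from a plain additive linear model.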

I think with those answers I will have enough information on the
AcousticModeller.

Thanks in advance,


Florent



> Dear Florent,
>
> Have a look at the HMMModel in modules.acoustic, but if you want to
> understand its inner workings, you'll eventually have to dive into the
> htsengine package. From what I understand, that in turn goes back to the
> original HTS engine (pre-API for Mary 4.x), which Marcela ported. Sorry,
> but I'm a little out of my depth there.
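For the duration half of the HMM question, the standard HTS recipe gives each HMM state a Gaussian duration pdf with mean mu_k and variance sigma_k^2. State durations are then set to d_k = mu_k + rho * sigma_k^2, where rho = (T - sum mu_k) / sum sigma_k^2 stretches the phone to a target length of T frames. With rho = 0, the means are used directly. Whether Mary's htsengine port does exactly this has not been checked here. The Java sketch below is a standalone illustration of the textbook formula with simple per-state rounding, so a rounded total can differ slightly from T. The model numbers are invented. F0 generation from HMMs, which uses parameter generation over the whole utterance, is not covered.

```java
import java.util.Arrays;

/** Sketch of HTS-style state duration assignment from Gaussian duration pdfs. */
final class StateDurations {

    /** Speaking-rate factor that makes the unrounded durations sum to totalFrames. */
    static double rho(double[] means, double[] variances, double totalFrames) {
        double sumMeans = 0.0;
        double sumVars = 0.0;
        for (int k = 0; k < means.length; k++) {
            sumMeans += means[k];
            sumVars += variances[k];
        }
        return (totalFrames - sumMeans) / sumVars;
    }

    /** d_k = mu_k + rho * sigma_k^2, rounded to whole frames, at least 1 frame per state. */
    static int[] durations(double[] means, double[] variances, double rho) {
        int[] frames = new int[means.length];
        for (int k = 0; k < means.length; k++) {
            long d = Math.round(means[k] + rho * variances[k]);
            frames[k] = (int) Math.max(1L, d);
        }
        return frames;
    }

    public static void main(String[] args) {
        // Invented 5-state phone model: means and variances in frames
        double[] mu = {3.0, 5.0, 8.0, 5.0, 3.0};
        double[] var = {1.0, 2.0, 4.0, 2.0, 1.0};
        int[] natural = durations(mu, var, 0.0);              // rho = 0: just the means
        int[] slower = durations(mu, var, rho(mu, var, 34.0)); // stretched to 34 frames
        System.out.println(Arrays.toString(natural));
        System.out.println(Arrays.toString(slower));
    }
}
```

States with larger variance absorb more of the stretching. That is the design rationale for weighting rho by sigma_k^2 rather than scaling all states uniformly.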
>
> But if you'd rather focus on the deprecated HMMDurationF0Modeller, be my
> guest... =)
>
> Best wishes,
>
> -Ingmar
>
> On 25.07.2011 18:08, fxavier at ircam.fr wrote:
>> Dear Ingmar,
>>
>> What would be really interesting for me, as I'm trying to build an HMM
>> voice, is to know how the deprecated module HMMDurationF0Modeller works;
>> if I understand correctly, this module is "included" in the new
>> AcousticModeller. I saw that Marcela coded it. Any information about how
>> F0 and durations are processed using HMMs would be helpful (this is for
>> my master's thesis, in which I'm explaining the generic modules, such as
>> the AcousticModeller).
>> Thanks a lot,
>>
>>
>> Florent
>>
>>> Dear Florent,
>>>
>>> On 25.07.2011 13:35, fxavier at ircam.fr wrote:
>>>> Dear Ingmar,
>>>>
>>>> I can see you are the one who coded this module. I know that it
>>>> predicts F0 and durations using CARTs, according to the symbolic
>>>> prosody defined by ProsodyGeneric and PronunciationModel.
>>>
>>> Yes, if you're referring to things like syllable stress, ToBI accents,
>>> and boundary placement.
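To make the CART case concrete: a regression tree asks yes/no questions about the symbolic features of a segment, such as stress, accent and position before a boundary, and returns the value stored at the leaf it reaches. The tiny hand-built tree below is purely illustrative. The class names, feature names and millisecond values are invented, and real Mary CARTs are trained from data rather than written by hand.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Minimal regression tree over symbolic prosody features (illustrative only). */
abstract class CartNode {
    abstract double predict(Map<String, String> features);
}

/** Leaf: returns the stored target value (e.g. a duration in ms). */
final class CartLeaf extends CartNode {
    private final double value;

    CartLeaf(double value) { this.value = value; }

    @Override
    double predict(Map<String, String> features) { return value; }
}

/** Inner node: one yes/no question about the features, then descend. */
final class CartQuestion extends CartNode {
    private final Predicate<Map<String, String>> question;
    private final CartNode yes;
    private final CartNode no;

    CartQuestion(Predicate<Map<String, String>> question, CartNode yes, CartNode no) {
        this.question = question;
        this.yes = yes;
        this.no = no;
    }

    @Override
    double predict(Map<String, String> features) {
        return question.test(features) ? yes.predict(features) : no.predict(features);
    }
}

final class CartDemo {
    /** Invented duration tree, leaves in milliseconds. */
    static CartNode durationTree() {
        return new CartQuestion(f -> "1".equals(f.get("stressed")),
                new CartQuestion(f -> f.containsKey("accent"), new CartLeaf(130.0), new CartLeaf(105.0)),
                new CartQuestion(f -> "yes".equals(f.get("beforeBoundary")), new CartLeaf(95.0), new CartLeaf(60.0)));
    }

    public static void main(String[] args) {
        Map<String, String> segment = Map.of("stressed", "1", "accent", "H*");
        System.out.println(durationTree().predict(segment));
    }
}
```

An F0 tree has the same shape and differs only in what its leaves store.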
>>>
>>>>
>>>> Can you tell me more about it? What's the theory behind it, i.e. how
>>>> does it work?
>>>
>>> I hope you won't be disappointed to learn that the AcousticModeller is
>>> first and foremost a unified replacement for a handful of deprecated
>>> modules (viz. DummyAllophones2AcoustParams, CARTDurationModeller,
>>> CARTF0Modeller, HMMDurationF0Modeller), which essentially duplicated
>>> very similar high-level processing with very different low-level code.
>>>
>>> The AcousticModeller has one purpose, which is to take an utterance and
>>> generate target values for acoustic parameters such as duration and F0
>>> for all segments and boundaries. How it does this depends on the voice
>>> configuration. A unit-selection voice will typically predict those
>>> continuous features using CARTs, while an HMM-based voice will use HMMs.
>>>
>>> The AcousticModeller introduced three major improvements over the old
>>> design:
>>>
>>> 1) A *unified interface*: the AcousticModeller applies Models from the
>>> modules.acoustic package, which can be thought of as wrappers around the
>>> specific algorithms used for parameter prediction. At the moment, we
>>> have CARTs, HMMs, and SoP-based models.
>>>
>>> 2) By the same token, different *types of models can be mixed* within
>>> the same voice; i.e. nothing prevents you from using a CART for
>>> duration and HMMs for F0 prediction in a voice if you want.
>>>
>>> 3) The prosodic parameters are *extensible* by adding custom generic
>>> continuous ones which might be relevant for your application. For
>>> example, we experimented with voice quality based parameters for
>>> expressive unit selection using this feature, but it could be anything,
>>> even motion trajectories for audiovisual synthesis!
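The three points above can be illustrated together: one interface for every prediction algorithm, a per-parameter table so that a voice can mix model types, and room for extra continuous parameters beyond duration and F0. Everything in this sketch is hypothetical. The AcousticModel interface, the constant models standing in for real CART or HMM predictors, and the "breathiness" voice-quality parameter only mirror the design described here and are not the actual API of the modules.acoustic package.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical common interface: one wrapper per prediction algorithm. */
interface AcousticModel {
    String algorithm();                            // e.g. "CART", "HMM", "SoP"
    double predict(Map<String, String> features);  // target value for one segment
}

/** Stand-in returning a fixed value; a real model would consult a CART, HMM, etc. */
final class ConstantModel implements AcousticModel {
    private final String algorithm;
    private final double value;

    ConstantModel(String algorithm, double value) {
        this.algorithm = algorithm;
        this.value = value;
    }

    public String algorithm() { return algorithm; }

    public double predict(Map<String, String> features) { return value; }
}

/** Applies whichever model is configured for each parameter, in configuration order. */
final class ModellerSketch {
    private final Map<String, AcousticModel> modelsByParameter = new LinkedHashMap<>();

    void configure(String parameter, AcousticModel model) {
        modelsByParameter.put(parameter, model);
    }

    Map<String, Double> predictAll(Map<String, String> segmentFeatures) {
        Map<String, Double> targets = new LinkedHashMap<>();
        for (Map.Entry<String, AcousticModel> e : modelsByParameter.entrySet()) {
            targets.put(e.getKey(), e.getValue().predict(segmentFeatures));
        }
        return targets;
    }

    /** A voice mixing model types and adding one custom parameter. */
    static ModellerSketch mixedVoice() {
        ModellerSketch voice = new ModellerSketch();
        voice.configure("duration", new ConstantModel("CART", 85.0));   // CART for duration
        voice.configure("F0", new ConstantModel("HMM", 120.0));         // HMM for F0
        voice.configure("breathiness", new ConstantModel("CART", 0.3)); // custom extra parameter
        return voice;
    }

    public static void main(String[] args) {
        System.out.println(mixedVoice().predictAll(Map.of("phone", "a")));
    }
}
```

Because the modeller only iterates over the configured table, adding a parameter or swapping an algorithm is a configuration change, not new modeller code.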
>>>
>>> Having said all that, the AcousticModeller itself is implemented in a
>>> somewhat baroque way, because the newfound flexibility and elegance in
>>> design are tethered by the constraint of backward compatibility within
>>> the 4.x generation; there are several hard-coded assumptions and a few
>>> hacks in the code. But unless you devise a completely different way of
>>> transporting predicted prosody to a given waveform synthesizer, it
>>> should serve its purpose reasonably well for now.
>>>
>>>> Regarding this paper:
>>>>
>>>>    "Three Method of Intonation Modelling"
>>>> http://www.cs.cmu.edu/~awb/papers/ESCA98_3int.pdf
>>>>
>>>> does the AcousticModeller use the first method?
>>>
>>> That depends entirely on the type of Models configured for the voice in
>>> question. If you have a conventional unit-selection voice with CARTs
>>> for duration and F0 prediction, these parameters are predicted in Mary
>>> as absolute values. You could implement something like Tilt if you
>>> wanted, but it makes no difference to the AcousticModeller, which simply
>>> assigns and passes on the attribute values.
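As an example of what "something like Tilt" would involve: in Taylor's Tilt model an intonation event is described by a total amplitude A, a total duration D and a single tilt value in [-1, 1]. The rise and fall parts are recovered as A_rise = A(1 + tilt)/2, A_fall = A(1 - tilt)/2, and likewise for durations. A Tilt-based Model could perform this conversion internally and still hand absolute values to the AcousticModeller. The Java sketch below covers only the decomposition step, not full contour synthesis. The class name and numbers are invented for illustration.

```java
/** Tilt event decomposition: amplitude, duration and tilt -> rise and fall parts. */
final class TiltEvent {
    private final double amplitudeHz; // total F0 excursion (rise + fall), in Hz
    private final double durationSec; // total event duration, in seconds
    private final double tilt;        // -1 = pure fall, 0 = equal rise and fall, +1 = pure rise

    TiltEvent(double amplitudeHz, double durationSec, double tilt) {
        if (tilt < -1.0 || tilt > 1.0) {
            throw new IllegalArgumentException("tilt must lie in [-1, 1]");
        }
        this.amplitudeHz = amplitudeHz;
        this.durationSec = durationSec;
        this.tilt = tilt;
    }

    double riseAmplitude() { return amplitudeHz * (1.0 + tilt) / 2.0; }
    double fallAmplitude() { return amplitudeHz * (1.0 - tilt) / 2.0; } // size of the fall
    double riseDuration()  { return durationSec * (1.0 + tilt) / 2.0; }
    double fallDuration()  { return durationSec * (1.0 - tilt) / 2.0; }

    /** Absolute F0 at the peak, given the F0 where the rise starts. */
    double peakF0(double startF0Hz) { return startF0Hz + riseAmplitude(); }

    /** Absolute F0 at the end of the fall. */
    double endF0(double startF0Hz) { return peakF0(startF0Hz) - fallAmplitude(); }

    public static void main(String[] args) {
        TiltEvent accent = new TiltEvent(40.0, 0.2, 0.5); // mostly rising accent
        System.out.println(accent.peakF0(100.0) + " Hz peak, " + accent.endF0(100.0) + " Hz end");
    }
}
```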
>>>
>>>>
>>>> I can see at
>>>> http://mary.dfki.de/documentation/publications/schroeder_trouvain2003.pdf
>>>> (page 19) that the output of this module is not MaryXML, but
>>>> ACOUSTPARAMS is MaryXML, isn't it?
>>>
>>> I'm not sure what you're referring to here. The IJST article predates
>>> by a number of years the architecture for which we designed the
>>> AcousticModeller. But you're correct in that the output of the
>>> AcousticModeller is MaryXML at the ACOUSTPARAMS stage.
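Since the output is MaryXML, "assigning attribute values" amounts to writing the predicted targets onto segment elements. The sketch below uses only the JDK's DOM and transform APIs to set a duration and an F0 target on a ph element and serialize it. The element and attribute names (ph, p, d, f0) and the "(position%,Hz)" F0 notation are assumptions based on what ACOUSTPARAMS MaryXML usually looks like. They should be checked against the MaryXML schema rather than taken as a specification, and the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AcoustParamsSketch {

    /** Writes predicted targets onto a segment element (attribute names are assumptions). */
    static void assignTargets(Element ph, long durationMs, int f0PositionPercent, int f0Hz) {
        ph.setAttribute("d", Long.toString(durationMs));
        ph.setAttribute("f0", "(" + f0PositionPercent + "," + f0Hz + ")");
    }

    /** Serializes the DOM without an XML declaration. */
    static String toXml(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element ph = doc.createElement("ph");
            ph.setAttribute("p", "a");      // phone symbol
            doc.appendChild(ph);
            assignTargets(ph, 85, 50, 120); // 85 ms, 120 Hz at 50% of the segment
            return toXml(doc);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```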
>>>
>>>>
>>>> Finally, will you put this module into the NLP components, or into the
>>>> synthesis ones? To my mind, it is more a synthesis component than an
>>>> NLP one.
>>>
>>> Sorry, I don't understand the question. The AcousticModeller is in
>>> place, and occupies a crucial point in the synthesis pipeline. As it
>>> handles the prediction of acoustic parameters, it is certainly beyond
>>> the scope of what most people would refer to as NLP.
>>>
>>> In case you're wondering which artifact in mavenized Mary contains the
>>> AcousticModeller and the Models, they're in marytts-common.
>>>
>>> Best wishes,
>>>
>>> -Ingmar
>>>
>>>>
>>>> Thanks in advance for your answers,
>>>>
>>>>
>>>> Florent
>>>
>>> --
>>> Ingmar Steiner
>>> Postdoctoral Researcher
>>>
>>> LORIA Speech Group, Nancy, France
>>> National Institute for Research in
>>> Computer Science and Control (INRIA)
>>>
>>>
>>
>


