[mary-dev] AcousticModeller

Ingmar Steiner ingmar.steiner at inria.fr
Tue Jul 26 10:57:36 CEST 2011

Dear Florent,

Have a look at the HMMModel in modules.acoustic, but if you want to 
understand its inner workings, you'll eventually have to dive into the 
htsengine package. From what I understand, that in turn goes back to the 
original HTS engine (pre-API for Mary 4.x), which Marcela ported. Sorry, 
but I'm a little out of my depth there.

But if you'd rather focus on the deprecated HMMDurationF0Modeller, be my 
guest... =)

Best wishes,


On 25.07.2011 18:08, fxavier at ircam.fr wrote:
> Dear Ingmar,
> What would be really interesting, as I'm trying to build an HMM voice, is
> to know how the deprecated module HMMDurationF0Modeller works; if I
> understand correctly, this module is now "included" in the new
> AcousticModeller. I saw that Marcela coded it. Any information about how
> the F0 and the durations are processed using HMMs would be helpful (this
> is for my master's thesis, in which I'm explaining the generic modules,
> such as AcousticModeller).
> Thanks a lot,
> Florent
>> Dear Florent,
>> On 25.07.2011 13:35, fxavier at ircam.fr wrote:
>>> Dear Ingmar,
>>> I can see you are the one who coded this module. I know that it
>>> predicts the F0 and durations using CARTs, according to the symbolic
>>> prosody defined by ProsodyGeneric and PronunciationModel.
>> Yes, if you're referring to things like syllable stress, ToBI accents,
>> and boundary placement.
>>> Can you tell me more about it? What's the theory behind it, i.e. how
>>> does it work?
>> I hope you won't be disappointed to learn that the AcousticModeller is
>> first and foremost a unified replacement for a handful of deprecated
>> modules (viz. DummyAllophones2AcoustParams, CARTDurationModeller,
>> CARTF0Modeller, HMMDurationF0Modeller) which essentially duplicated very
>> similar high-level processing with very different low-level code.
>> The AcousticModeller has one purpose, which is to take an utterance and
>> generate target values for acoustic parameters such as duration and F0
>> for all segments and boundaries. How it does this depends on the voice
>> configuration. A unit-selection voice will typically predict those
>> continuous features using CARTs, while an HMM-based voice will use HMMs.
>> The AcousticModeller introduced three major improvements over the old
>> design:
>> 1) A *unified interface*: the AcousticModeller applies Models from the
>> modules.acoustic package, which can be thought of as wrappers around the
>> specific algorithms used for parameter prediction. At the moment, we
>> have CARTs, HMMs, and SoP-based models.
>> 2) By the same token, different *types of models can be mixed* within
>> the same voice; i.e. nothing prevents you from using a CART for duration
>> and HMMs for F0-prediction in a voice if you want.
>> 3) The prosodic parameters are *extensible* by adding custom generic
>> continuous ones which might be relevant for your application. For
>> example, we experimented with voice quality based parameters for
>> expressive unit selection using this feature, but it could be anything,
>> even motion trajectories for audiovisual synthesis!
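
To illustrate the three points above, here is a minimal Java sketch. It is not the MaryTTS code: the names Segment, ParameterModel, ToyTreeDurationModel, ToyConstantF0Model and AcousticSketch, the attribute names, and all numeric values are invented for illustration, and the toy models only stand in for real CARTs or HMMs. It shows one common interface, different model types mixed in one configuration, and an extra custom parameter added alongside duration and F0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a segment; MaryTTS itself works on MaryXML elements.
final class Segment {
    final String phone;
    // Predicted target values, keyed by attribute name.
    final Map<String, Double> attributes = new LinkedHashMap<>();

    Segment(String phone) {
        this.phone = phone;
    }
}

// Hypothetical unified interface: one method, whatever algorithm sits behind it.
interface ParameterModel {
    double predict(Segment segment);
}

// Toy tree-like duration model: a single hard-coded split on vowel vs. non-vowel.
final class ToyTreeDurationModel implements ParameterModel {
    private static final Set<String> VOWELS =
            new HashSet<>(Arrays.asList("a", "e", "i", "o", "u"));

    @Override
    public double predict(Segment segment) {
        return VOWELS.contains(segment.phone) ? 0.12 : 0.06; // invented values
    }
}

// Toy F0 model standing in for a statistical model: always returns one fixed value.
final class ToyConstantF0Model implements ParameterModel {
    private final double hertz;

    ToyConstantF0Model(double hertz) {
        this.hertz = hertz;
    }

    @Override
    public double predict(Segment segment) {
        return hertz;
    }
}

final class AcousticSketch {
    // Applies every configured model to every segment and stores each result
    // under the attribute name the model was registered with.
    static void applyModels(Map<String, ParameterModel> models, List<Segment> segments) {
        for (Segment segment : segments) {
            for (Map.Entry<String, ParameterModel> entry : models.entrySet()) {
                segment.attributes.put(entry.getKey(), entry.getValue().predict(segment));
            }
        }
    }

    // Builds a mixed configuration and runs it over two toy segments.
    static List<Segment> demo() {
        Map<String, ParameterModel> models = new LinkedHashMap<>();
        models.put("d", new ToyTreeDurationModel());      // tree-like model for duration
        models.put("f0", new ToyConstantF0Model(120.0));  // different model type for F0
        models.put("vq", segment -> 0.5);                 // hypothetical custom parameter
        List<Segment> segments =
                new ArrayList<>(Arrays.asList(new Segment("h"), new Segment("a")));
        applyModels(models, segments);
        return segments;
    }

    public static void main(String[] args) {
        for (Segment segment : demo()) {
            System.out.println(segment.phone + " " + segment.attributes);
        }
    }
}
```

The loop in applyModels never needs to know which kind of model it is calling, which is the point of the wrapper idea: swapping the model behind "d" or adding a new attribute only changes the configuration map.
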
>> Having said all that, the AcousticModeller itself is implemented in a
>> somewhat baroque way, because the newfound flexibility and elegance in
>> design is tethered by the constraint of backward compatibility within
>> the 4.x generation; there are several hard-coded assumptions and a few
>> hacks in the code. But unless you devise a completely different way of
>> transporting predicted prosody to a given waveform synthesizer, it
>> should serve its purpose reasonably well for now.
>>> Regarding this paper:
>>>    "Three Methods of Intonation Modelling"
>>> http://www.cs.cmu.edu/~awb/papers/ESCA98_3int.pdf
>>> does the AcousticModeller use the first method?
>> That depends entirely on the type of Models configured for the voice in
>> question. If you have a conventional unit-selection voice with CARTs for
>> duration and F0 prediction, in Mary, these parameters are predicted in
>> absolute values. You could implement something like Tilt if you wanted,
>> but it makes no difference for the AcousticModeller, which simply
>> assigns and passes out the attribute values.
>>> I can see in
>>> http://mary.dfki.de/documentation/publications/schroeder_trouvain2003.pdf
>>> (page 19) that the output of this module is not MaryXML, but it
>>> is MaryXML, isn't it?
>> I'm not sure what you're referring to here. The IJST article predates by
>> a number of years the architecture for which we designed the
>> AcousticModeller. But you're correct in that the output of the
>> AcousticModeller is MaryXML at the ACOUSTPARAMS stage.
>>> Finally, would you put this module among the NLP components or among
>>> the synthesis ones? To my mind, it is more a synthesis component than
>>> an NLP one.
>> Sorry, I don't understand the question. The AcousticModeller is in
>> place, and occupies a crucial point in the synthesis pipeline. As it
>> handles the prediction of acoustic parameters, it is certainly beyond
>> the scope of what most people would refer to as NLP.
>> In case you're wondering which artifact in mavenized Mary contains the
>> AcousticModeller and the Models, they're in marytts-common.
>> Best wishes,
>> -Ingmar
>>> Thanks in advance for your answers,
>>> Florent
>> --
>> Ingmar Steiner
>> Postdoctoral Researcher
>> LORIA Speech Group, Nancy, France
>> National Institute for Research in
>> Computer Science and Control (INRIA)

Ingmar Steiner
Postdoctoral Researcher

LORIA Speech Group, Nancy, France
National Institute for Research in
Computer Science and Control (INRIA)
