[mary-dev] AcousticModeller

Mon Jul 25 18:08:21 CEST 2011

Dear Ingmar,

What would be really interesting, as I'm trying to build an HMM voice, is
to know how the deprecated module HMMDurationF0Modeller works; if I
understand correctly, this module is now "included" in the new
AcousticModeller. I saw that Marcela coded it. Any information about how
the F0 and the durations are processed using HMMs would be helpful (this is
for my master's thesis, in which I'm explaining the generic modules, such
as the AcousticModeller).
Thanks a lot,

Florent

> Dear Florent,
>
> On 25.07.2011 13:35, fxavier at ircam.fr wrote:
>> Dear Ingmar,
>>
>> I can see you are the one who coded this module. I know that it predicts
>> the F0 and durations using CARTs, according to the symbolic prosody
>> defined by ProsodyGeneric and PronunciationModel.
>
> Yes, if you're referring to things like syllable stress, ToBI accents,
> and boundary placement.
>
>>
>> Can you tell me more about it? What's the theory behind it, i.e. how does
>> it work?
>
> I hope you won't be disappointed to learn that the AcousticModeller is
> first and foremost a unified replacement for a handful of deprecated
> modules (viz. DummyAllophones2AcoustParams, CARTDurationModeller,
> CARTF0Modeller, HMMDurationF0Modeller) which essentially duplicated very
> similar high-level processing with very different low-level code.
>
> The AcousticModeller has one purpose, which is to take an utterance and
> generate target values for acoustic parameters such as duration and F0
> for all segments and boundaries. How it does this depends on the voice
> configuration. A unit-selection voice will typically predict those
> continuous features using CARTs, while an HMM-based voice will use HMMs.
>
> The AcousticModeller introduced three major improvements over the old
> design:
>
> 1) A *unified interface*: the AcousticModeller applies Models from the
> modules.acoustic package, which can be thought of as wrappers around the
> specific algorithms used for parameter prediction. At the moment, we
> have CARTs, HMMs, and SoP-based models.
>
> 2) By the same token, different *types of models can be mixed* within
> the same voice; i.e. nothing prevents you from using a CART for duration
> and HMMs for F0-prediction in a voice if you want.
>
> 3) The prosodic parameters are *extensible* by adding custom generic
> continuous ones which might be relevant for your application. For
> example, we experimented with voice quality based parameters for
> expressive unit selection using this feature, but it could be anything,
> even motion trajectories for audiovisual synthesis!
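
A minimal Java sketch of the three points above, under simplifying assumptions. The names Segment, Model, ToyTreeModel, ToyMeanModel and applyModels are hypothetical and are not the actual classes of the modules.acoustic package. The toy tree only stands in for a CART and the mean lookup only stands in for an HMM-based predictor; neither implements the real algorithm. The attribute names "d", "f0" and "tension" are likewise chosen for illustration. The sketch shows only how one interface lets different model types be mixed within a voice, and how an extra continuous parameter can be added without changing the loop that applies the models.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AcousticModellerSketch {

    /** One segment: symbolic input features plus the predicted attributes. */
    static class Segment {
        final String phone;
        final boolean stressed;
        final Map<String, String> attributes = new LinkedHashMap<>();

        Segment(String phone, boolean stressed) {
            this.phone = phone;
            this.stressed = stressed;
        }
    }

    /** Hypothetical unified interface: each model predicts one continuous parameter. */
    interface Model {
        String targetAttribute();
        double predict(Segment segment);
    }

    /** Toy stand-in for a CART: a single yes/no question with constant leaves. */
    static class ToyTreeModel implements Model {
        private final String attribute;
        private final double ifStressed;
        private final double otherwise;

        ToyTreeModel(String attribute, double ifStressed, double otherwise) {
            this.attribute = attribute;
            this.ifStressed = ifStressed;
            this.otherwise = otherwise;
        }

        public String targetAttribute() { return attribute; }

        public double predict(Segment segment) {
            return segment.stressed ? ifStressed : otherwise;
        }
    }

    /** Toy stand-in for an HMM-based predictor: a per-phone mean lookup. */
    static class ToyMeanModel implements Model {
        private final String attribute;
        private final Map<String, Double> means;
        private final double fallback;

        ToyMeanModel(String attribute, Map<String, Double> means, double fallback) {
            this.attribute = attribute;
            this.means = means;
            this.fallback = fallback;
        }

        public String targetAttribute() { return attribute; }

        public double predict(Segment segment) {
            return means.getOrDefault(segment.phone, fallback);
        }
    }

    /** Applies every configured model to every segment, regardless of model type. */
    static void applyModels(List<Segment> segments, List<Model> models) {
        for (Segment segment : segments) {
            for (Model model : models) {
                long rounded = Math.round(model.predict(segment));
                segment.attributes.put(model.targetAttribute(), String.valueOf(rounded));
            }
        }
    }

    public static void main(String[] args) {
        List<Segment> utterance = new ArrayList<>();
        utterance.add(new Segment("a", true));
        utterance.add(new Segment("t", false));

        Map<String, Double> f0Means = new LinkedHashMap<>();
        f0Means.put("a", 120.0);

        List<Model> models = new ArrayList<>();
        models.add(new ToyTreeModel("d", 110.0, 70.0));       // duration from the tree
        models.add(new ToyMeanModel("f0", f0Means, 100.0));   // F0 from the mean lookup
        models.add(new ToyTreeModel("tension", 1.0, 0.0));    // made-up custom parameter

        applyModels(utterance, models);
        for (Segment segment : utterance) {
            System.out.println(segment.phone + " " + segment.attributes);
        }
    }
}
```

The loop in applyModels never inspects the concrete model type, which is what allows a tree for duration and a different predictor for F0 to coexist in this sketch.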
>
> Having said all that, the AcousticModeller itself is implemented in a
> somewhat baroque way, because the newfound flexibility and elegance in
> design are tethered by the constraint of backward compatibility within
> the 4.x generation; there are several hard-coded assumptions and a few
> hacks in the code. But unless you devise a completely different way of
> transporting predicted prosody to a given waveform synthesizer, it
> should serve its purpose reasonably well for now.
>
>> Regarding this paper:
>>
>>   "Three Methods of Intonation Modeling"
>> http://www.cs.cmu.edu/~awb/papers/ESCA98_3int.pdf
>>
>> does the AcousticModeller use the first method?
>
> That depends entirely on the type of Models configured for the voice in
> question. If you have a conventional unit-selection voice with CARTs for
> duration and F0 prediction, in Mary, these parameters are predicted in
> absolute values. You could implement something like Tilt if you wanted,
> but it makes no difference for the AcousticModeller, which simply
> assigns and passes out the attribute values.
>
>>
>> I can see in
>> http://mary.dfki.de/documentation/publications/schroeder_trouvain2003.pdf
>> (page 19) that the output of this module is not MaryXML, but ACOUSTPARAMS
>> is MaryXML, isn't it?
>
> I'm not sure what you're referring to here. The IJST article predates by
> a number of years the architecture for which we designed the
> AcousticModeller. But you're correct in that the output of the
> AcousticModeller is MaryXML at the ACOUSTPARAMS stage.
>
>>
>> Finally, will you put this module into the NLP components, or into the
>> synthesis ones? To my mind, it is more a synthesis component than an
>> NLP one.
>
> Sorry, I don't understand the question. The AcousticModeller is in
> place, and occupies a crucial point in the synthesis pipeline. As it
> handles the prediction of acoustic parameters, it is certainly beyond
> the scope of what most people would refer to as NLP.
>
> In case you're wondering which artifact in mavenized Mary contains the
> AcousticModeller and the Models, they're in marytts-common.
>
> Best wishes,
>
> -Ingmar
>
>>
>> Thanks in advance for your answers,
>>
>>
>> Florent
>
> --
> Ingmar Steiner
> Postdoctoral Researcher
>
> LORIA Speech Group, Nancy, France
> National Institute for Research in
> Computer Science and Control (INRIA)
>
>