[mary-dev] AcousticModeller
Ingmar Steiner
ingmar.steiner at inria.fr
Mon Jul 25 15:48:34 CEST 2011
Dear Florent,
On 25.07.2011 13:35, fxavier at ircam.fr wrote:
> Dear Ingmar,
>
> I can see you are the one that coded this module. I know that it predicts
> the F0 and durations using CART, according to the symbolic prosody defined
> by ProsodyGeneric and PronunciationModel.
Yes, if you're referring to things like syllable stress, ToBI accents,
and boundary placement.
>
> Can you tell me more about it? What's the theory behind it, i.e. how does
> it work?
I hope you won't be disappointed to learn that the AcousticModeller is
first and foremost a unified replacement for a handful of deprecated
modules (viz. DummyAllophones2AcoustParams, CARTDurationModeller,
CARTF0Modeller, HMMDurationF0Modeller) which essentially duplicated very
similar high-level processing with very different low-level code.
The AcousticModeller has one purpose, which is to take an utterance and
generate target values for acoustic parameters such as duration and F0
for all segments and boundaries. How it does this depends on the voice
configuration. A unit-selection voice will typically predict those
continuous features using CARTs, while an HMM-based voice will use HMMs.
The AcousticModeller introduced three major improvements over the old
design:
1) A *unified interface*: the AcousticModeller applies Models from the
modules.acoustic package, which can be thought of as wrappers around the
specific algorithms used for parameter prediction. At the moment, we
have CARTs, HMMs, and SoP-based (sum-of-products) models.
2) By the same token, different *types of models can be mixed* within
the same voice; i.e. nothing prevents you from using a CART for duration
and HMMs for F0-prediction in a voice if you want.
3) The prosodic parameters are *extensible* by adding custom generic
continuous ones which might be relevant for your application. For
example, we experimented with voice quality based parameters for
expressive unit selection using this feature, but it could be anything,
even motion trajectories for audiovisual synthesis!
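To make the three points concrete, here is a minimal sketch of the idea in
Java. It is not the actual Mary code: the names Target, Model,
ToyCartDurationModel, ToyHmmF0Model and ToyVoiceQualityModel, the attribute
names "d", "f0" and "vq", and all predicted values are assumptions made up
for illustration. The toy models return fixed values where a real one would
query a CART or an HMM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AcousticModellerSketch {

    /** Hypothetical stand-in for a segment or boundary in the utterance. */
    static class Target {
        final String phone;
        final Map<String, String> attributes = new LinkedHashMap<>();

        Target(String phone) {
            this.phone = phone;
        }
    }

    /** Hypothetical unified interface: each model predicts one named parameter. */
    interface Model {
        String attributeName();

        String predict(Target target);
    }

    /** Placeholder for a CART-backed duration model (a fixed rule replaces the tree). */
    static class ToyCartDurationModel implements Model {
        private static final Set<String> VOWELS =
                new HashSet<>(Arrays.asList("a", "e", "i", "o", "u"));

        public String attributeName() {
            return "d";
        }

        public String predict(Target target) {
            // Assumed unit: milliseconds.
            return VOWELS.contains(target.phone) ? "120" : "70";
        }
    }

    /** Placeholder for an HMM-backed F0 model (a constant replaces the HMM). */
    static class ToyHmmF0Model implements Model {
        public String attributeName() {
            return "f0";
        }

        public String predict(Target target) {
            // Assumed unit: Hz.
            return "110";
        }
    }

    /** Placeholder for a custom continuous parameter, e.g. a voice quality measure. */
    static class ToyVoiceQualityModel implements Model {
        public String attributeName() {
            return "vq";
        }

        public String predict(Target target) {
            return "0.5";
        }
    }

    /** Applies every configured model to every target, whatever algorithm is behind it. */
    static void applyModels(List<Model> models, List<Target> targets) {
        for (Model model : models) {
            for (Target target : targets) {
                target.attributes.put(model.attributeName(), model.predict(target));
            }
        }
    }

    /** Builds two targets and runs a mixed set of models over them. */
    static List<Target> demo() {
        List<Target> targets = new ArrayList<>(Arrays.asList(new Target("h"), new Target("a")));
        List<Model> models = Arrays.asList(
                new ToyCartDurationModel(), new ToyHmmF0Model(), new ToyVoiceQualityModel());
        applyModels(models, targets);
        return targets;
    }

    public static void main(String[] args) {
        for (Target target : demo()) {
            System.out.println(target.phone + " " + target.attributes);
        }
    }
}
```

The loop in applyModels never asks which algorithm sits behind a Model,
which is what allows a CART, an HMM and a custom parameter to coexist in one
voice in this sketch.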
Having said all that, the AcousticModeller itself is implemented in a
somewhat baroque way, because the newfound flexibility and elegance in
design are tethered by the constraint of backward compatibility within
the 4.x generation; there are several hard-coded assumptions and a few
hacks in the code. But unless you devise a completely different way of
transporting predicted prosody to a given waveform synthesizer, it
should serve its purpose reasonably well for now.
> Regarding this paper:
>
> "Three Methods of Intonation Modelling"
> http://www.cs.cmu.edu/~awb/papers/ESCA98_3int.pdf
>
> does the AcousticModeller use the first method?
That depends entirely on the type of Models configured for the voice in
question. If you have a conventional unit-selection voice with CARTs for
duration and F0 prediction, in Mary, these parameters are predicted in
absolute values. You could implement something like Tilt if you wanted,
but it makes no difference for the AcousticModeller, which simply
assigns and passes out the attribute values.
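As a rough illustration of "assigning attribute values", the following Java
sketch writes absolute targets onto XML elements using the standard DOM API.
The element names "syllable" and "ph", the attribute names "p", "d" and
"f0", the plain-number value format, and both toy predictors are
assumptions for this example; they are not taken from the Mary sources.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AbsoluteTargetsSketch {

    /** Toy predictor standing in for a duration CART; returns milliseconds. */
    static int predictDurationMs(String phone) {
        return "a".equals(phone) ? 120 : 70;
    }

    /** Toy predictor standing in for an F0 CART; returns an absolute value in Hz. */
    static int predictF0Hz(String phone) {
        return "a".equals(phone) ? 115 : 105;
    }

    /** Builds a tiny document and annotates each segment with absolute targets. */
    static Document annotate() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Assumed input: one syllable containing two segments.
        Element syllable = doc.createElement("syllable");
        doc.appendChild(syllable);
        for (String phone : new String[] {"h", "a"}) {
            Element ph = doc.createElement("ph");
            ph.setAttribute("p", phone);
            syllable.appendChild(ph);
        }

        // The modeller's part in this sketch: predict, then store as attributes.
        NodeList segments = doc.getElementsByTagName("ph");
        for (int i = 0; i < segments.getLength(); i++) {
            Element ph = (Element) segments.item(i);
            String phone = ph.getAttribute("p");
            ph.setAttribute("d", String.valueOf(predictDurationMs(phone)));
            ph.setAttribute("f0", String.valueOf(predictF0Hz(phone)));
        }
        return doc;
    }

    public static void main(String[] args) {
        NodeList segments = annotate().getElementsByTagName("ph");
        for (int i = 0; i < segments.getLength(); i++) {
            Element ph = (Element) segments.item(i);
            System.out.println(ph.getAttribute("p") + ": d=" + ph.getAttribute("d")
                    + " f0=" + ph.getAttribute("f0"));
        }
    }
}
```

A Tilt-style model would only change what the predictors compute; the
attribute-writing loop would stay the same.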
>
> I can see there
> http://mary.dfki.de/documentation/publications/schroeder_trouvain2003.pdf
> page 19, that the output of this module is not a MaryXML, but ACOUSTPARAMS
> is a MaryXML, isn't it?
I'm not sure what you're referring to here. The IJST article predates by
a number of years the architecture for which we designed the
AcousticModeller. But you're correct in that the output of the
AcousticModeller is MaryXML at the ACOUSTPARAMS stage.
>
> Finally, will you put this module into the NLP components, or into the
> Synthesis ones? To my mind, it is more a synthesis component than an
> NLP one.
Sorry, I don't understand the question. The AcousticModeller is in
place, and occupies a crucial point in the synthesis pipeline. As it
handles the prediction of acoustic parameters, it is certainly beyond
the scope of what most people would refer to as NLP.
In case you're wondering which artifact in mavenized Mary contains the
AcousticModeller and the Models, they're in marytts-common.
Best wishes,
-Ingmar
>
> Thanks in advance for your answers,
>
>
> Florent
--
Ingmar Steiner
Postdoctoral Researcher
LORIA Speech Group, Nancy, France
National Institute for Research in
Computer Science and Control (INRIA)