[mary-users] Phonemes Times issues with MaryTTS.

Thu Apr 25 21:17:34 CEST 2013

Dear Ricardo,

thanks for such a detailed report! Here's some feedback.

First of all, what you've listed in this message is not ACOUSTPARAMS, 
but REALISED_ACOUSTPARAMS. The former is the processed input text, with 
all of the predicted pronunciation and prosodic parameters (duration, 
f0). Then comes the unit selection process, which finds the optimal 
sequence of units, given that prediction, from the selected voice 
database. The result is REALISED_ACOUSTPARAMS.
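To make the distinction concrete, here is a minimal sketch of how the two output types could be requested from a MaryTTS server over HTTP. It assumes a server running locally on the default port 59125 and the usual `/process` query parameters (INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, LOCALE, AUDIO). The helper name `mary_request_url` is made up for illustration, and the sketch only builds the URLs without sending anything.

```python
from urllib.parse import urlencode


def mary_request_url(text, output_type, host="localhost", port=59125,
                     locale="en_US"):
    """Build a /process URL for a MaryTTS server (assumed local, default port)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": output_type,
        "LOCALE": locale,
    }
    if output_type == "AUDIO":
        # Audio requests additionally need a container format.
        params["AUDIO"] = "WAVE_FILE"
    return "http://%s:%d/process?%s" % (host, port, urlencode(params))


text = "Achilles picked up his sword and shield, and wore his armour."
# Timing request: ask for the realised values, not the predicted ones.
timing_url = mary_request_url(text, "REALISED_ACOUSTPARAMS")
audio_url = mary_request_url(text, "AUDIO")
```

The point is only that the timing request should name REALISED_ACOUSTPARAMS as its output type, so that the durations describe the units that were actually selected.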

In a perfect world, the selected units would match 100% with the 
prediction, but reality of course bites. So the durations of the 
phones and syllables are adjusted according to the selected units, and 
that's why there are differences. ACOUSTPARAMS will never match up with 
the acoustic output of the unit selection, and the mismatch will 
accumulate with longer input. Use REALISED_ACOUSTPARAMS instead, and you 
will get the actual durations. Incidentally, REALISED_DURATIONS is just 
a shortcut to get those durations in XWaves .lab format.
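As a rough illustration of reading REALISED_DURATIONS, the sketch below parses XWaves-style label text. It assumes the common layout of a header terminated by a line containing only "#", followed by lines of the form "end_time number label". The two sample lines are not real server output; they are made up from the end values in the quoted XML. The function names are hypothetical.

```python
def parse_lab(text):
    """Return (end_time_seconds, label) pairs from XWaves-style label text."""
    entries = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            # Everything up to and including the "#" line is header.
            if line == "#":
                in_body = True
            continue
        fields = line.split(None, 2)
        if len(fields) == 3:
            entries.append((float(fields[0]), fields[2]))
    return entries


def durations(entries):
    """Turn cumulative end times into (label, duration_seconds) pairs."""
    result = []
    previous_end = 0.0
    for end, label in entries:
        result.append((label, end - previous_end))
        previous_end = end
    return result


# Illustrative sample only, based on end="0.11131249" and end="0.16131249".
SAMPLE_LAB = """#
0.111312 125 {
0.161312 125 f
"""

lab_entries = parse_lab(SAMPLE_LAB)
lab_durations = durations(lab_entries)
```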

Intonational boundaries such as those predicted from some punctuation 
tokens currently have a fixed duration of 400 ms.
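For synchronising across sentences, one possible approach is to build a single running timeline from the REALISED_ACOUSTPARAMS XML by adding up the `d` attribute of every `<ph>` and the `duration` attribute of every `<boundary>` (both in milliseconds in the quoted sample). The sketch below is an assumption-laden illustration, not MaryTTS code: the function names are made up, the sample is a cut-down fragment of the quoted XML, and the boundary duration is read from the document rather than hardcoded. Since `d` looks rounded to whole milliseconds, small rounding differences against the audio are possible.

```python
import xml.etree.ElementTree as ET

# Cut-down fragment of the quoted output: end of one sentence, start of the next.
SAMPLE_XML = """<maryxml xmlns="http://mary.dfki.de/2002/MaryXML">
  <s><phrase>
    <t>armour<syllable><ph d="92" end="3.2703755" p="r="/></syllable></t>
    <boundary breakindex="5" duration="200" tone="L-L%"/>
  </phrase></s>
  <s><phrase>
    <t>Afterwards<syllable>
      <ph d="111" end="0.11131249" p="{"/>
      <ph d="50" end="0.16131249" p="f"/>
    </syllable></t>
  </phrase></s>
</maryxml>"""


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def global_timeline(xml_text):
    """Return (label, end_time_seconds) pairs on one continuous timeline."""
    timeline = []
    elapsed_ms = 0.0
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name == "ph" and "d" in elem.attrib:
            elapsed_ms += float(elem.attrib["d"])
            timeline.append((elem.attrib.get("p", "?"), elapsed_ms / 1000.0))
        elif name == "boundary" and "duration" in elem.attrib:
            # Pauses count too; "_" is used here as a label for silence.
            elapsed_ms += float(elem.attrib["duration"])
            timeline.append(("_", elapsed_ms / 1000.0))
    return timeline


timeline = global_timeline(SAMPLE_XML)
```

In this fragment the `end` values start again near zero in the second sentence, whereas the running total carries on across the boundary, which is the offset that would otherwise have to be added by hand.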

What confuses me is that you describe the issue in a way that makes me 
believe you're requesting ACOUSTPARAMS instead of 
REALISED_ACOUSTPARAMS, but the XML you pasted indicates that you are 
requesting the latter.

Still, I hope this helps answer your questions.

Best wishes,

-Ingmar

On 4/25/13 4:45 PM, Ricardo Duarte wrote:
> Hi all,
>
> I use MaryTTS with a coarticulation solution, which works fine unless I
> have paragraphs in the input sentences.
>
> I am using this input text from another application to MaryTTS:
> Achilles picked up his sword and shield, and wore his armour.
> Afterwards, he left his tent, heading for the battlefield. There, he
> attacked Hector and killed him. Achilles picked up Hector’s body,
> returned to the Greek camp and dropped Hector’s body. He picked up
> Patroklus’ body and walked to the beach. There, Achilles prepared for
> Patroclus’ funeral, and then performed it. He returned to his tent,
> where Priam was waiting and, after a discussion, Achilles allowed Priam
> to take Hector’s body.
>
> The input works fine if I remove the " . " (period sign) from the
> sentences, but if I include the period signs, a few issues occur.
>
> - My application currently sends two requests to handle such input, an
> acoustparams request and an audio request. A sample of the acoustparams
> output is included below:
>
>                          <t accent="!H*" g2p_method="lexicon" ph="' A -
> r m r=" pos="NN">
>                              armour
>                              <syllable accent="!H*" ph="A" stress="1">
>                                  <ph d="79" end="3.1326256"
> f0="(0,98)(50,102)(100,84)" p="A"
>                                      units="A_L w0241 38263 0.033; A_R
> w0241 38264 0.0464375" />
>                              </syllable>
>                              <syllable ph="r m r=">
>                                  <ph d="23" end="3.1557505" f0="(0,84)"
> p="r"
>                                      units="r_L w0241 38265 0.01225; r_R
> w0354 51308 0.010875" />
>                                  <ph d="23" end="3.178313" p="m"
>                                      units="m_L w0354 51309 0.0111875;
> m_R w0354 51310 0.011375" />
>                                  <ph d="92" end="3.2703755"
> f0="(50,103)(100,103)" p="r="
>                                      units="r=_L w0354 51311 0.0120625;
> r=_R w0471 62522 0.08" />
>                              </syllable>
>                          </t>
>                          <t pos=".">
>                              .
>                          </t>
>                          <boundary breakindex="5" duration="200" tone="L-L%"
>                              units="__L w0471 62523 0.2" />
>                      </phrase>
>                  </prosody>
>              </s>
>              <s>
>                  <prosody pitch="+3%" range="+15%">
>                      <phrase>
>                          <t g2p_method="lexicon" ph="' { f - t r= - w r=
> d z" pos="RB">
>                              Afterwards
>                              <syllable ph="{ f" stress="1">
>                                  <ph d="111" end="0.11131249"
> f0="(0,96)(50,119)(100,112)" p="{"
>                                      units="{_L w0569 71473 0.0515625;
> {_R w0569 71474 0.05975" />
>                                  <ph d="50" end="0.16131249" p="f"
>                                      units="f_L w0569 71475 0.02; f_R
> w0569 71476 0.03" />
>                              </syllable>
>
> Once a new sentence is created, MaryTTS does not give a pause time
> between the last phoneme of the sentence " r= " and the final
> " <t> . </t> ", so when I attempt to add the end time of the phoneme
> " r= " to the phoneme " { " it goes slightly out of sync, and this sync
> issue grows with the number of paragraphs in my input text.
>
>
> - I have had a look at realised_durations to get the end result, but
> strangely realised_durations gives different phoneme durations and end
> times for the same input text.
>
>
> My questions are:
> - Is there a fixed duration for tokens with " . " or ", " in the XML for
> realised_acoustparams? If not, how can I know how long the pause will be
> between each new <s> tag (sentence)?
>
> - Why is the output from realised_durations different from the XML? Is
> this expected? Won't it cause the timing to be slightly off-sync when I
> use the audio?
>
> - Which format provides the best sync times with audio,
> realised_acoustparams or realised_durations?
>
> The full outputs are at these links:
>
> xml: http://ricardoduarte.co.uk/marytts/output/output_with_periods.xml
>
> realised_durations:
> http://ricardoduarte.co.uk/marytts/output/output_realised_durations.txt
>
> Sorry for the long post but this is really bugging me.
>
> Regards
>
> Ricardo

-- 
/**
  * Dr. Ingmar Steiner
  *
  * Head of Independent Research Group
  * Multimodal Speech Processing
  * Cluster of Excellence MMCI
  *
  * Senior Researcher
  * Language Technology Lab
  * German Research Center for
  * Artificial Intelligence (DFKI GmbH)
  *
  * Adjunct Assistant Professor
  * Department of Computer Science
  * Saarland University
  *
  * Campus C7.4, Room 3.01
  * D-66123 Saarbrücken
  * @tel: +49-681-302-70028
  * @fax: +49-681-302-4317
  * @web: http://coli.uni-saarland.de/~steiner/
  */