[mary-users] Phoneme timing issues with MaryTTS

Fri Apr 26 00:30:19 CEST 2013

Dear Ricardo,

glad to hear things are working now.

As I mentioned, REALISED_DURATIONS produces output in the XWaves .lab file
format. The second column has historical significance, but is meaningless
for current applications. The format is

HEADER
#
END JUNK LABEL
...

HEADER may be empty and is terminated by a line containing only #.
Each of the following lines represents one phonetic segment and has 
three fields: END time in seconds, a JUNK number (with historical 
significance), and the LABEL.
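The layout above can be read with a few lines of code. The following is a minimal Python sketch, assuming whitespace-separated fields and the usual XWaves convention that each segment starts where the previous one ends, so a segment's duration comes from the difference of consecutive END values rather than from the JUNK column. The helper name `parse_lab` is hypothetical.

```python
def parse_lab(text):
    """Parse XWaves .lab text into (start_s, end_s, label) tuples.

    Everything up to and including the first line consisting of "#" is
    treated as the header and skipped. Each remaining line is expected to
    be "END JUNK LABEL"; the JUNK column is ignored. Each segment is
    assumed to start where the previous one ended (the first at 0.0).
    """
    lines = text.splitlines()

    # Locate the header terminator; if there is none, treat all lines as body.
    body_start = 0
    for i, line in enumerate(lines):
        if line.strip() == "#":
            body_start = i + 1
            break

    segments = []
    start = 0.0
    for line in lines[body_start:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        end = float(fields[0])
        segments.append((start, end, fields[2].strip()))
        start = end
    return segments
```

Applied to a REALISED_DURATIONS response, `parse_lab(text)` yields one `(start, end, label)` tuple per phone or pause, and `end - start` gives each segment's duration in seconds.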

Best wishes,

-Ingmar

On 4/26/13 12:12 AM, Ricardo Duarte wrote:
> Dear Ingmar,
>
> Thanks again for such a quick reply. I made a mistake in my original
> message, as you pointed out: I use REALISED_ACOUSTPARAMS, not ACOUSTPARAMS.
>
> Thanks for confirming that I made the right choice in parsing
> REALISED_ACOUSTPARAMS.
> What is the second column of REALISED_DURATIONS? Isn't it supposed to
> be the phoneme duration? If so, why is it always 125 ms?
>
> I just realised that there is a boundary element at the end which gives
> the exact duration of the pause.
>
> Kind Regards
>
> Ricardo
>
>
> On 25 April 2013 20:17, Ingmar Steiner <ingmar.steiner at dfki.de> wrote:
>
>     Dear Ricardo,
>
>     thanks for such a detailed report! Here's some feedback.
>
>     First of all, what you've listed in this message is not
>     ACOUSTPARAMS, but REALISED_ACOUSTPARAMS. The former is the processed
>     input text, with all of the predicted pronunciation and prosodic
>     parameters (duration, f0). Then comes the unit selection process,
>     which finds the optimal sequence of units, given that prediction,
>     from the selected voice database. The result is REALISED_ACOUSTPARAMS.
>
>     In a perfect world, the selected units would match 100% with the
>     prediction, but reality of course bites. So the durations of the
>     phones and syllables are adjusted according to the selected units,
>     and that's why there are differences. ACOUSTPARAMS will never match
>     up with the acoustic output of the unit selection, and the mismatch
>     will accumulate with longer input. Use REALISED_ACOUSTPARAMS
>     instead, and you will get the actual durations. Incidentally,
>     REALISED_DURATIONS is just a shortcut to get those durations in
>     XWaves .lab format.
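For reference, here is a minimal Python sketch of requesting REALISED_DURATIONS and AUDIO for the same input text over HTTP. It assumes a MaryTTS 5.x server on its default port 59125 and that server's `/process` parameters (`INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `LOCALE`, `AUDIO`); other versions or setups may differ. The helper names `mary_url` and `fetch` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed defaults for a local MaryTTS 5.x server; adjust to your setup.
MARY_HOST = "localhost"
MARY_PORT = 59125


def mary_url(text, output_type, locale="en_US", host=MARY_HOST, port=MARY_PORT):
    """Build a /process request URL for the given OUTPUT_TYPE."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": output_type,
        "LOCALE": locale,
    }
    if output_type == "AUDIO":
        # Audio requests also need a container format.
        params["AUDIO"] = "WAVE_FILE"
    return f"http://{host}:{port}/process?{urlencode(params)}"


def fetch(text, output_type, **kwargs):
    """Send the request and return the raw response bytes."""
    with urlopen(mary_url(text, output_type, **kwargs)) as resp:
        return resp.read()
```

With a server running, `fetch(text, "REALISED_DURATIONS").decode("utf-8")` returns the .lab text and `fetch(text, "AUDIO")` the WAV bytes; building both URLs from one helper keeps the input text and locale identical across the two requests.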
>
>     Intonational boundaries such as those predicted from some
>     punctuation tokens currently have a fixed duration of 400 ms.
>
>     What confuses me is that you explain the issue you observe in such a
>     way that I believe you're requesting ACOUSTPARAMS instead of
>     REALISED_ACOUSTPARAMS, but the XML you paste indicates that you are
>     requesting the latter.
>
>     I still hope that perhaps this helps answer your questions.
>
>     Best wishes,
>
>     -Ingmar
>
>
>     On 4/25/13 4:45 PM, Ricardo Duarte wrote:
>
>         Hi all,
>
>         I use MaryTTS with a coarticulation solution, which works fine
>         unless I have paragraphs in the input text.
>
>         I am sending this input text from another application to MaryTTS:
>         Achilles picked up his sword and shield, and wore his armour.
>         Afterwards, he left his tent, heading for the battlefield. There, he
>         attacked Hector and killed him. Achilles picked up Hector’s body,
>         returned to the Greek camp and dropped Hector’s body. He picked up
>         Patroklus’ body and walked to the beach. There, Achilles prepared for
>         Patroclus’ funeral, and then performed it. He returned to his tent,
>         where Priam was waiting and, after a discussion, Achilles allowed Priam
>         to take Hector’s body.
>
>         The input works fine if I remove the periods (".") from the
>         sentences, but if I include them, a few issues occur.
>
>         - My application currently sends two requests to handle such input:
>         an ACOUSTPARAMS request and an AUDIO request. Here is a sample from
>         the ACOUSTPARAMS output:
>
>                     <t accent="!H*" g2p_method="lexicon" ph="' A - r m r=" pos="NN">
>                         armour
>                         <syllable accent="!H*" ph="A" stress="1">
>                             <ph d="79" end="3.1326256" f0="(0,98)(50,102)(100,84)" p="A"
>                                 units="A_L w0241 38263 0.033; A_R w0241 38264 0.0464375" />
>                         </syllable>
>                         <syllable ph="r m r=">
>                             <ph d="23" end="3.1557505" f0="(0,84)" p="r"
>                                 units="r_L w0241 38265 0.01225; r_R w0354 51308 0.010875" />
>                             <ph d="23" end="3.178313" p="m"
>                                 units="m_L w0354 51309 0.0111875; m_R w0354 51310 0.011375" />
>                             <ph d="92" end="3.2703755" f0="(50,103)(100,103)" p="r="
>                                 units="r=_L w0354 51311 0.0120625; r=_R w0471 62522 0.08" />
>                         </syllable>
>                     </t>
>                     <t pos=".">
>                         .
>                     </t>
>                     <boundary breakindex="5" duration="200" tone="L-L%"
>                         units="__L w0471 62523 0.2" />
>                 </phrase>
>             </prosody>
>         </s>
>         <s>
>             <prosody pitch="+3%" range="+15%">
>                 <phrase>
>                     <t g2p_method="lexicon" ph="' { f - t r= - w r= d z" pos="RB">
>                         Afterwards
>                         <syllable ph="{ f" stress="1">
>                             <ph d="111" end="0.11131249" f0="(0,96)(50,119)(100,112)" p="{"
>                                 units="{_L w0569 71473 0.0515625; {_R w0569 71474 0.05975" />
>                             <ph d="50" end="0.16131249" p="f"
>                                 units="f_L w0569 71475 0.02; f_R w0569 71476 0.03" />
>                         </syllable>
>
>         When a new sentence starts, MaryTTS does not give a pause time
>         between the last phoneme of the sentence ("r=") and the final
>         "<t> . </t>", so when I add the end time of the phoneme "r=" to the
>         phoneme "{", the timing goes slightly out of sync, and this offset
>         grows with the number of paragraphs in my input text.
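One way to avoid this drift is to place every sentence on a single global time axis. Below is a minimal Python sketch, assuming (as the excerpt suggests, where the `end` values restart near 0 s at "Afterwards") that each `<s>` restarts its `end` clock, that `end` is in seconds and `boundary` `duration` in milliseconds, and that a boundary pause begins where the preceding phone ends. The function names and the `_` pause label are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a namespace such as "{http://...}ph" down to "ph"."""
    return tag.rsplit("}", 1)[-1]


def global_timeline(maryxml):
    """Return (label, start_s, end_s) tuples on one absolute time axis.

    Assumes each <s> restarts its ph/@end clock at 0, that ph/@end is in
    seconds, and that boundary/@duration is in milliseconds.
    """
    root = ET.fromstring(maryxml)
    timeline = []
    offset = 0.0  # absolute start time of the current sentence
    for sentence in root.iter():
        if local_name(sentence.tag) != "s":
            continue
        local = 0.0  # sentence-local time reached so far
        for el in sentence.iter():
            name = local_name(el.tag)
            if name == "ph" and el.get("end") is not None:
                end = float(el.get("end"))
                timeline.append((el.get("p"), offset + local, offset + end))
                local = end
            elif name == "boundary" and el.get("duration") is not None:
                # Hypothetical "_" label for the pause segment.
                end = local + float(el.get("duration")) / 1000.0
                timeline.append(("_", offset + local, offset + end))
                local = end
        offset += local  # next sentence starts after this one's last segment
    return timeline
```

With the values from the excerpt, "r=" ends at 3.2703755 s, the 200 ms boundary extends the first sentence to about 3.47 s, and "{" is then placed at roughly 3.47 to 3.58 s instead of 0 to 0.11 s.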
>
>
>         - I have had a look at REALISED_DURATIONS to get the end result, but
>         strangely, REALISED_DURATIONS gives different phoneme durations and
>         end times for the same input text.
>
>
>         My questions are:
>         - Is there a fixed duration for tokens with "." or "," in the XML
>         for REALISED_ACOUSTPARAMS? If not, how can I know how long the pause
>         will be between each new <s> tag (sentence)?
>
>         - Why is the output from REALISED_DURATIONS different from the XML?
>         Is this expected? Won't it make the timing slightly out of sync when
>         I use the audio?
>
>         - Which format provides the best sync times with the audio,
>         REALISED_ACOUSTPARAMS or REALISED_DURATIONS?
>
>         The full outputs are at these links:
>
>         XML:
>         http://ricardoduarte.co.uk/marytts/output/output_with_periods.xml
>
>         REALISED_DURATIONS:
>         http://ricardoduarte.co.uk/marytts/output/output_realised_durations.txt
>
>         Sorry for the long post but this is really bugging me.
>
>         Regards
>
>         Ricardo
>
>
>         _________________________________________________
>         Mary-users mailing list
>         Mary-users at dfki.de
>         http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>
>
>     --
>     /**
>       * Dr. Ingmar Steiner
>       *
>       * Head of Independent Research Group
>       * Multimodal Speech Processing
>       * Cluster of Excellence MMCI
>       *
>       * Senior Researcher
>       * Language Technology Lab
>       * German Research Center for
>       * Artificial Intelligence (DFKI GmbH)
>       *
>       * Adjunct Assistant Professor
>       * Department of Computer Science
>       * Saarland University
>       *
>       * Campus C7.4, Room 3.01
>       * D-66123 Saarbrücken
>       * @tel: +49-681-302-70028
>       * @fax: +49-681-302-4317
>       * @web: http://coli.uni-saarland.de/~steiner/
>       */