[mary-users] Phonemes Times issues with MaryTTS.

Ricardo Duarte ricardo.lpd at gmail.com
Fri Apr 26 00:12:05 CEST 2013


Dear Ingmar,

Thanks again for such a quick reply. I made a mistake in my original message:
as you pointed out, I use REALISED_ACOUSTPARAMS, not ACOUSTPARAMS.

Thanks for letting me know that I made the right choice in parsing
REALISED_ACOUSTPARAMS.
What is the second column of the REALISED_DURATIONS output? Isn't it supposed
to be the duration of the phoneme? If so, why is it always 125 ms?

I just realised that there is a boundary element at the end of the sentence
which gives the exact duration of the pause.
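
For anyone hitting the same problem, here is a minimal Python sketch of how
the pause can be folded into a running timeline. It assumes that each <ph>
carries its duration in the d attribute and each <boundary> carries its pause
in the duration attribute, both in milliseconds, as in the sample quoted
below. The function name global_timeline, the SAMPLE string and the "_" pause
label are only illustrative, and since d looks rounded to whole milliseconds,
a small drift against the audio is still possible.

```python
import xml.etree.ElementTree as ET


def global_timeline(maryxml: str):
    """Return (label, start_s, end_s) for every phone and pause, in document order."""
    root = ET.fromstring(maryxml)
    t_ms = 0.0  # running position in the whole utterance, in milliseconds
    timeline = []
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]  # drop an XML namespace prefix, if any
        if tag == "ph" and el.get("d") is not None:
            dur, label = float(el.get("d")), el.get("p")
        elif tag == "boundary" and el.get("duration") is not None:
            # Assumption: the boundary duration is the pause length in ms.
            dur, label = float(el.get("duration")), "_"
        else:
            continue
        timeline.append((label, t_ms / 1000.0, (t_ms + dur) / 1000.0))
        t_ms += dur
    return timeline


# Hypothetical, heavily trimmed input modelled on the quoted sample.
SAMPLE = """<maryxml>
  <s><phrase>
    <t><syllable><ph d="23" p="m"/><ph d="92" p="r="/></syllable></t>
    <boundary breakindex="5" duration="200" tone="L-L%"/>
  </phrase></s>
  <s><phrase>
    <t><syllable><ph d="111" p="{"/><ph d="50" p="f"/></syllable></t>
  </phrase></s>
</maryxml>"""

timeline = global_timeline(SAMPLE)
for label, start, end in timeline:
    print(f"{label}\t{start:.3f}\t{end:.3f}")
```

With this, the first phone of the second sentence starts after the pause
rather than directly after the last phone of the first sentence.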

Kind Regards

Ricardo


On 25 April 2013 20:17, Ingmar Steiner <ingmar.steiner at dfki.de> wrote:

> Dear Ricardo,
>
> thanks for such a detailed report! Here's some feedback.
>
> First of all, what you've listed in this message is not ACOUSTPARAMS, but
> REALISED_ACOUSTPARAMS. The former is the processed input text, with all of
> the predicted pronunciation and prosodic parameters (duration, f0). Then
> comes the unit selection process, which finds the optimal sequence of
> units, given that prediction, from the selected voice database. The result
> is REALISED_ACOUSTPARAMS.
>
> In a perfect world, the selected units would match 100% with the
> prediction, but reality of course bites. So the durations of the phones
> and syllables are adjusted according to the selected units, and that's why
> there are differences. ACOUSTPARAMS will never match up with the acoustic
> output of the unit selection, and the mismatch will accumulate with longer
> input. Use REALISED_ACOUSTPARAMS instead, and you will get the actual
> durations. Incidentally, REALISED_DURATIONS is just a shortcut to get those
> durations in XWaves .lab format.
>
> Intonational boundaries such as those predicted from some punctuation
> tokens currently have a fixed duration of 400 ms.
>
> What confuses me is that you explain the issue you observe in such a way
> that I believe you're requesting ACOUSTPARAMS instead of
> REALISED_ACOUSTPARAMS, but the XML you paste indicates that you are
> requesting the latter.
>
> I still hope that perhaps this helps answer your questions.
>
> Best wishes,
>
> -Ingmar
>
>
> On 4/25/13 4:45 PM, Ricardo Duarte wrote:
>
>> Hi all,
>>
>> I use MaryTTS with a coarticulation solution, which works fine unless I
>> have paragraphs in the input sentences.
>>
>> I am sending this input text from another application to MaryTTS:
>> Achilles picked up his sword and shield, and wore his armour.
>> Afterwards, he left his tent, heading for the battlefield. There, he
>> attacked Hector and killed him. Achilles picked up Hector’s body,
>> returned to the Greek camp and dropped Hector’s body. He picked up
>> Patroklus’ body and walked to the beach. There, Achilles prepared for
>> Patroclus’ funeral, and then performed it. He returned to his tent,
>> where Priam was waiting and, after a discussion, Achilles allowed Priam
>> to take Hector’s body.
>>
>> The input works fine if I remove the period signs (" . ") from the
>> sentences, but if I include them, a few issues occur.
>>
>> - My application currently sends two requests to handle such input: an
>> ACOUSTPARAMS request and an audio request. A sample of the ACOUSTPARAMS
>> output is included below:
>>
>>       <t accent="!H*" g2p_method="lexicon" ph="' A - r m r=" pos="NN">
>>         armour
>>         <syllable accent="!H*" ph="A" stress="1">
>>           <ph d="79" end="3.1326256" f0="(0,98)(50,102)(100,84)" p="A"
>>               units="A_L w0241 38263 0.033; A_R w0241 38264 0.0464375" />
>>         </syllable>
>>         <syllable ph="r m r=">
>>           <ph d="23" end="3.1557505" f0="(0,84)" p="r"
>>               units="r_L w0241 38265 0.01225; r_R w0354 51308 0.010875" />
>>           <ph d="23" end="3.178313" p="m"
>>               units="m_L w0354 51309 0.0111875; m_R w0354 51310 0.011375" />
>>           <ph d="92" end="3.2703755" f0="(50,103)(100,103)" p="r="
>>               units="r=_L w0354 51311 0.0120625; r=_R w0471 62522 0.08" />
>>         </syllable>
>>       </t>
>>       <t pos=".">
>>         .
>>       </t>
>>       <boundary breakindex="5" duration="200" tone="L-L%"
>>           units="__L w0471 62523 0.2" />
>>     </phrase>
>>   </prosody>
>> </s>
>> <s>
>>   <prosody pitch="+3%" range="+15%">
>>     <phrase>
>>       <t g2p_method="lexicon" ph="' { f - t r= - w r= d z" pos="RB">
>>         Afterwards
>>         <syllable ph="{ f" stress="1">
>>           <ph d="111" end="0.11131249" f0="(0,96)(50,119)(100,112)" p="{"
>>               units="{_L w0569 71473 0.0515625; {_R w0569 71474 0.05975" />
>>           <ph d="50" end="0.16131249" p="f"
>>               units="f_L w0569 71475 0.02; f_R w0569 71476 0.03" />
>>         </syllable>
>>
>> When a new sentence starts, MaryTTS does not give a pause time between
>> the last phoneme of the sentence, " r= ", and the final " <t> . </t> ",
>> so when I add the end time of the phoneme " r= " to the phoneme " { ",
>> the timing goes slightly out of sync, and this drift grows with the
>> number of paragraphs in my input text.
>>
>>
>> - I have had a look at REALISED_DURATIONS to get the end result, but
>> strangely it gives different phoneme durations and end times for the
>> same input text.
>>
>>
>> My questions are:
>> - Is there a fixed duration for tokens with " . " or " , " in the XML for
>> REALISED_ACOUSTPARAMS? If not, how can I know how long the pause will be
>> between each new <s> tag (sentence)?
>>
>> - Why is the output from REALISED_DURATIONS different from the XML? Is
>> this expected? Won't it cause the timing to be slightly out of sync when
>> I use it with the audio?
>>
>> - Which format provides the best sync times with the audio,
>> REALISED_ACOUSTPARAMS or REALISED_DURATIONS?
>>
>> The full output are in this links:
>>
>> xml: http://ricardoduarte.co.uk/marytts/output/output_with_periods.xml
>>
>> realised_durations:
>> http://ricardoduarte.co.uk/marytts/output/output_realised_durations.txt
>>
>> Sorry for the long post but this is really bugging me.
>>
>> Regards
>>
>> Ricardo
>>
>>
>>
> --
> /**
>  * Dr. Ingmar Steiner
>  *
>  * Head of Independent Research Group
>  * Multimodal Speech Processing
>  * Cluster of Excellence MMCI
>  *
>  * Senior Researcher
>  * Language Technology Lab
>  * German Research Center for
>  * Artificial Intelligence (DFKI GmbH)
>  *
>  * Adjunct Assistant Professor
>  * Department of Computer Science
>  * Saarland University
>  *
>  * Campus C7.4, Room 3.01
>  * D-66123 Saarbrücken
>  * @tel: +49-681-302-70028
>  * @fax: +49-681-302-4317
>  * @web: http://coli.uni-saarland.de/~steiner/
>  */
>
>