[mary-users] Phonemes Times issues with MaryTTS.
Ricardo Duarte
ricardo.lpd at gmail.com
Fri Apr 26 00:12:05 CEST 2013
Dear Ingmar,
Thanks again for such a quick reply. As you pointed out, I made a mistake in
my original message: I use REALISED_ACOUSTPARAMS, not ACOUSTPARAMS.
Thanks for letting me know that I made the right choice in parsing
REALISED_ACOUSTPARAMS.
What is the second column of REALISED_DURATIONS? Isn't it supposed to be
the duration of the phoneme? If it is, why is it always 125 ms?
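
For reference, here is a minimal Python sketch of reading REALISED_DURATIONS
output. It assumes the XWaves .lab layout of a "#" header line followed by
lines of three whitespace-separated fields: end time in seconds, a constant
field, and the label. As far as I know that middle field is conventionally a
colour index in xwaves label files rather than a duration, but that is an
assumption and is not confirmed in this thread. The sample text and the name
parse_lab are made up for illustration.

```python
def parse_lab(text):
    """Return (label, start_s, end_s) tuples from XWaves-style .lab text."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Assumption: everything up to and including a lone '#' line is header.
    if "#" in lines:
        lines = lines[lines.index("#") + 1:]
    segments = []
    previous_end = 0.0
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        end = float(fields[0])
        # fields[1] is the constant middle column; it is ignored here.
        segments.append((fields[2], previous_end, end))
        previous_end = end
    return segments


# Hypothetical sample, not real MaryTTS output.
SAMPLE_LAB = """#
0.111 125 {
0.161 125 f
0.561 125 _
"""

for label, start, end in parse_lab(SAMPLE_LAB):
    print(f"{label}\t{round((end - start) * 1000)} ms")
```

Under these assumptions, the duration of each phone is the difference
between consecutive end times, not the value in the second column.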
I just realised that there is a boundary element at the end which gives the
exact duration of the pause.
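
A minimal Python sketch of how that boundary duration can be folded into a
running timeline. It assumes, as in the XML excerpt quoted further down, that
each <ph> carries its duration in milliseconds in "d", that each <boundary>
carries its pause in milliseconds in "duration", and that "end" restarts at
every sentence. The XML string is a cut-down, hypothetical stand-in for real
output, and the function name global_timeline is my own.

```python
import xml.etree.ElementTree as ET


def global_timeline(maryxml):
    """Return (phone, start_s, end_s) with times counted from the start of the audio."""
    timeline = []
    clock_ms = 0.0
    # iter() walks the tree in document order, so phones and boundaries
    # are visited in the order in which they are spoken.
    for elem in ET.fromstring(maryxml).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag == "ph" and elem.get("d"):
            start = clock_ms
            clock_ms += float(elem.get("d"))
            timeline.append((elem.get("p"), start / 1000, clock_ms / 1000))
        elif tag == "boundary" and elem.get("duration"):
            start = clock_ms
            clock_ms += float(elem.get("duration"))
            # "_" is used here as an arbitrary label for the pause.
            timeline.append(("_", start / 1000, clock_ms / 1000))
    return timeline


# Hypothetical, simplified sample; real output has more attributes and a namespace.
SAMPLE_XML = """<maryxml>
  <s><phrase>
    <t>armour<syllable><ph d="92" p="r="/></syllable></t>
    <boundary breakindex="5" duration="200" tone="L-L%"/>
  </phrase></s>
  <s><phrase>
    <t>Afterwards<syllable><ph d="111" p="{"/><ph d="50" p="f"/></syllable></t>
  </phrase></s>
</maryxml>"""

for phone, start, end in global_timeline(SAMPLE_XML):
    print(f"{phone}\t{start:.3f}\t{end:.3f}")
```

Since "d" appears to be rounded to whole milliseconds (d="111" next to
end="0.11131249" in the excerpt), summing it may drift slightly over long
input. Adding a per-sentence offset to the "end" values instead would avoid
that, at the cost of tracking where each sentence starts.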
Kind Regards
Ricardo
On 25 April 2013 20:17, Ingmar Steiner <ingmar.steiner at dfki.de> wrote:
> Dear Ricardo,
>
> thanks for such a detailed report! Here's some feedback.
>
> First of all, what you've listed in this message is not ACOUSTPARAMS, but
> REALISED_ACOUSTPARAMS. The former is the processed input text, with all of
> the predicted pronunciation and prosodic parameters (duration, f0). Then
> comes the unit selection process, which finds the optimal sequence of
> units, given that prediction, from the selected voice database. The result
> is REALISED_ACOUSTPARAMS.
>
> In a perfect world, the selected units would match 100% with the
> prediction, but reality of course bites. So the durations of the phones
> and syllables are adjusted according to the selected units, and that's why
> there are differences. ACOUSTPARAMS will never match up with the acoustic
> output of the unit selection, and the mismatch will accumulate with longer
> input. Use REALISED_ACOUSTPARAMS instead, and you will get the actual
> durations. Incidentally, REALISED_DURATIONS is just a shortcut to get those
> durations in XWaves .lab format.
>
> Intonational boundaries such as those predicted from some punctuation
> tokens currently have a fixed duration of 400 ms.
>
> What confuses me is that you explain the issue you observe in such a way
> that I believe you're requesting ACOUSTPARAMS instead of
> REALISED_ACOUSTPARAMS, but the XML you paste indicates that you are
> requesting with the latter.
>
> I still hope that perhaps this helps answer your questions.
>
> Best wishes,
>
> -Ingmar
>
>
> On 4/25/13 4:45 PM, Ricardo Duarte wrote:
>
>> Hi all,
>>
>> I use MaryTTS with a coarticulation solution, which works fine unless I
>> have paragraphs in the input sentences.
>>
>> I am sending this input text to MaryTTS from another application:
>> Achilles picked up his sword and shield, and wore his armour.
>> Afterwards, he left his tent, heading for the battlefield. There, he
>> attacked Hector and killed him. Achilles picked up Hector’s body,
>> returned to the Greek camp and dropped Hector’s body. He picked up
>> Patroklus’ body and walked to the beach. There, Achilles prepared for
>> Patroclus’ funeral, and then performed it. He returned to his tent,
>> where Priam was waiting and, after a discussion, Achilles allowed Priam
>> to take Hector’s body.
>>
>> The input works fine if I remove the "." (period) from the
>> sentences, but if I include the periods, a few issues occur.
>>
>> - My application currently sends two requests to handle such input, an
>> acoustparams request and an audio request. A sample of the acoustparams
>> output is included below:
>>
>> <t accent="!H*" g2p_method="lexicon" ph="' A - r m r=" pos="NN">
>> armour
>> <syllable accent="!H*" ph="A" stress="1">
>> <ph d="79" end="3.1326256" f0="(0,98)(50,102)(100,84)" p="A"
>>     units="A_L w0241 38263 0.033; A_R w0241 38264 0.0464375" />
>> </syllable>
>> <syllable ph="r m r=">
>> <ph d="23" end="3.1557505" f0="(0,84)" p="r"
>>     units="r_L w0241 38265 0.01225; r_R w0354 51308 0.010875" />
>> <ph d="23" end="3.178313" p="m"
>>     units="m_L w0354 51309 0.0111875; m_R w0354 51310 0.011375" />
>> <ph d="92" end="3.2703755" f0="(50,103)(100,103)" p="r="
>>     units="r=_L w0354 51311 0.0120625; r=_R w0471 62522 0.08" />
>> </syllable>
>> <t pos=".">
>> </t>
>> .
>> </t>
>> <boundary breakindex="5" duration="200" tone="L-L%"
>>     units="__L w0471 62523 0.2" />
>> </phrase>
>> </prosody>
>> </s>
>> <s>
>> <prosody pitch="+3%" range="+15%">
>> <phrase>
>> <t g2p_method="lexicon" ph="' { f - t r= - w r= d z" pos="RB">
>> Afterwards
>> <syllable ph="{ f" stress="1">
>> <ph d="111" end="0.11131249" f0="(0,96)(50,119)(100,112)" p="{"
>>     units="{_L w0569 71473 0.0515625; {_R w0569 71474 0.05975" />
>> <ph d="50" end="0.16131249" p="f"
>>     units="f_L w0569 71475 0.02; f_R w0569 71476 0.03" />
>> </syllable>
>>
>> When a new sentence starts, MaryTTS does not give a pause time between
>> the last phoneme of the sentence, "r=", and the final "<t> . </t>", so
>> when I attempt to add the end time of the phoneme "r=" to the phoneme
>> "{", it goes slightly out of sync, and this sync issue grows with the
>> number of paragraphs in my input text.
>>
>>
>> - I have had a look at realised_durations to get the end result, but
>> strangely realised_durations gives different phoneme durations and end
>> times for the same input text.
>>
>>
>> My questions are:
>> - Is there a fixed duration for tokens with "." or "," in the XML for
>> realised_acoustparams? If not, how can I know how long the pause will be
>> between each new <s> tag (sentence)?
>>
>> - Why is the output from realised_durations different from the XML? Is
>> this expected? Won't it cause the timing to be slightly out of sync when
>> I use the audio?
>>
>> - Which format provides the best sync times with the audio,
>> realised_acoustparams or realised_durations?
>>
>> The full outputs are at these links:
>>
>> xml: http://ricardoduarte.co.uk/marytts/output/output_with_periods.xml
>>
>> realised_durations:
>> http://ricardoduarte.co.uk/marytts/output/output_realised_durations.txt
>>
>> Sorry for the long post but this is really bugging me.
>>
>> Regards
>>
>> Ricardo
>>
>>
> --
> /**
> * Dr. Ingmar Steiner
> *
> * Head of Independent Research Group
> * Multimodal Speech Processing
> * Cluster of Excellence MMCI
> *
> * Senior Researcher
> * Language Technology Lab
> * German Research Center for
> * Artificial Intelligence (DFKI GmbH)
> *
> * Adjunct Assistant Professor
> * Department of Computer Science
> * Saarland University
> *
> * Campus C7.4, Room 3.01
> * D-66123 Saarbrücken
> * @tel: +49-681-302-70028
> * @fax: +49-681-302-4317
> * @web: http://coli.uni-saarland.de/~steiner/
> */
>
>