[mary-users] Phonemes Times issues with MaryTTS.
Ingmar Steiner
ingmar.steiner at dfki.de
Fri Apr 26 00:30:19 CEST 2013
Dear Ricardo,
glad to hear things are working now.
As I mentioned, REALISED_DURATIONS produces output in XWaves .lab file
format. The second column has historical significance, but is meaningless
for current applications. The format is
HEADER
#
END JUNK LABEL
...
Where HEADER may be empty and is terminated by a #.
Each of the following lines represents one phonetic segment and has
three fields: END time in seconds, a JUNK number (with historical
significance), and the LABEL.
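
A minimal sketch of reading this format in Python, assuming the layout described above (an optional header terminated by a line containing only #, then one END JUNK LABEL line per segment). The function name parse_lab and the sample values are made up for illustration. Since each line only carries an end time, the start of a segment is taken to be the end of the previous one, and the first segment is assumed to start at 0.

```python
def parse_lab(text):
    """Parse XWaves .lab text into a list of (start, end, label) tuples."""
    lines = text.splitlines()

    # Skip the (possibly empty) header, which ends at the first line that is just "#".
    # If no such line exists, fall back to treating every line as a segment.
    body_start = 0
    for i, line in enumerate(lines):
        if line.strip() == "#":
            body_start = i + 1
            break

    segments = []
    prev_end = 0.0  # assumption: the first segment starts at time 0
    for line in lines[body_start:]:
        fields = line.split(None, 2)  # END, JUNK, LABEL
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        end = float(fields[0])  # END time in seconds
        label = fields[2].strip()  # the JUNK field (fields[1]) is ignored
        segments.append((prev_end, end, label))
        prev_end = end
    return segments


# Illustrative data only, not real MaryTTS output.
SAMPLE = """#
0.079 125 A
0.102 125 r
0.125 125 m
0.217 125 r=
0.417 125 _
"""

if __name__ == "__main__":
    for start, end, label in parse_lab(SAMPLE):
        print(f"{label}\t{start:.3f}\t{end:.3f}\t{end - start:.3f}")
```

The duration of a segment is then simply end minus start, so the second column is never needed.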
Best wishes,
-Ingmar
On 4/26/13 12:12 AM, Ricardo Duarte wrote:
> Dear Ingmar,
>
> Thanks again for such a quick reply. As you pointed out, I made a mistake
> in my original message: I use REALISED_ACOUSTPARAMS, not ACOUSTPARAMS.
>
> Thanks for letting me know that I made the right choice in parsing
> REALISED_ACOUSTPARAMS.
> What is the second column of REALISED_DURATIONS? Isn't it supposed to
> be the duration of the phoneme? If it is, why is it always 125 ms?
>
> I just realised that there is a boundary element at the end, which gives
> the exact duration of the pause.
>
> Kind Regards
>
> Ricardo
>
>
> On 25 April 2013 20:17, Ingmar Steiner <ingmar.steiner at dfki.de> wrote:
>
> Dear Ricardo,
>
> thanks for such a detailed report! Here's some feedback.
>
> First of all, what you've listed in this message is not
> ACOUSTPARAMS, but REALISED_ACOUSTPARAMS. The former is the processed
> input text, with all of the predicted pronunciation and prosodic
> parameters (duration, f0). Then comes the unit selection process,
> which finds the optimal sequence of units, given that prediction,
> from the selected voice database. The result is REALISED_ACOUSTPARAMS.
>
> In a perfect world, the selected units would match 100% with the
> prediction, but reality of course bites. So the durations of the
> phones and syllables are adjusted according to the selected units,
> and that's why there are differences. ACOUSTPARAMS will never match
> up with the acoustic output of the unit selection, and the mismatch
> will accumulate with longer input. Use REALISED_ACOUSTPARAMS
> instead, and you will get the actual durations. Incidentally,
> REALISED_DURATIONS is just a shortcut to get those durations in
> XWaves .lab format.
>
> Intonational boundaries such as those predicted from some
> punctuation tokens currently have a fixed duration of 400 ms.
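
As a minimal sketch of building one continuous timeline from REALISED_ACOUSTPARAMS, the snippet below walks the XML in document order and accumulates the d attribute of each ph element and the duration attribute of each boundary element, both assumed to be in milliseconds. The pause length is read from the XML rather than hard-coded. The function names, the sample document and its namespace URI are illustrative assumptions, not MaryTTS API; the attribute values are taken from the fragment quoted further down in this thread. Because d appears to be rounded to whole milliseconds, some small drift over long inputs is possible.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def timeline(maryxml_text):
    """Return [(label, start_s, end_s)] running across all sentences."""
    root = ET.fromstring(maryxml_text)
    t_ms = 0.0  # running time in milliseconds since the start of the audio
    out = []
    for elem in root.iter():  # iter() visits elements in document order
        name = local_name(elem.tag)
        if name == "ph":
            dur = float(elem.get("d", 0))  # assumed: phone duration in ms
            label = elem.get("p")
        elif name == "boundary":
            dur = float(elem.get("duration", 0))  # assumed: pause in ms
            label = "_"  # hypothetical label chosen here for pauses
        else:
            continue
        out.append((label, t_ms / 1000.0, (t_ms + dur) / 1000.0))
        t_ms += dur
    return out


# Cut-down illustrative document; the namespace URI is an assumption.
SAMPLE = """<maryxml xmlns="http://mary.dfki.de/2002/MaryXML">
<s><phrase>
<t><syllable><ph d="79" p="A"/></syllable>
<syllable><ph d="23" p="r"/><ph d="23" p="m"/><ph d="92" p="r="/></syllable></t>
<boundary breakindex="5" duration="200"/>
</phrase></s>
<s><phrase>
<t><syllable><ph d="111" p="{"/><ph d="50" p="f"/></syllable></t>
</phrase></s>
</maryxml>"""

if __name__ == "__main__":
    for label, start, end in timeline(SAMPLE):
        print(f"{label}\t{start:.3f}\t{end:.3f}")
```

In this sketch the first phone of the second sentence starts only after the boundary of the first one, so the pause between sentences is accounted for explicitly.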
>
> What confuses me is that you explain the issue you observe in such a
> way that I believe you're requesting ACOUSTPARAMS instead of
> REALISED_ACOUSTPARAMS, but the XML you paste indicates that you are
> requesting with the latter.
>
> I still hope that perhaps this helps answer your questions.
>
> Best wishes,
>
> -Ingmar
>
>
> On 4/25/13 4:45 PM, Ricardo Duarte wrote:
>
> Hi all,
>
> I use MaryTTS with a coarticulation solution, which works fine unless I
> have paragraphs in the input sentences.
>
> I am sending this input text from another application to MaryTTS:
> Achilles picked up his sword and shield, and wore his armour.
> Afterwards, he left his tent, heading for the battlefield. There, he
> attacked Hector and killed him. Achilles picked up Hector’s body,
> returned to the Greek camp and dropped Hector’s body. He picked up
> Patroklus’ body and walked to the beach. There, Achilles prepared for
> Patroclus’ funeral, and then performed it. He returned to his tent,
> where Priam was waiting and, after a discussion, Achilles allowed Priam
> to take Hector’s body.
>
> The input works fine if I remove the "." (period) from the
> sentences, but if I include the periods, a few issues occur.
>
> - My application currently sends two requests to handle such input: an
> ACOUSTPARAMS request and an audio request. A sample of the ACOUSTPARAMS
> output is included below:
>
> <t accent="!H*" g2p_method="lexicon" ph="' A - r m r=" pos="NN">
>   armour
>   <syllable accent="!H*" ph="A" stress="1">
>     <ph d="79" end="3.1326256" f0="(0,98)(50,102)(100,84)" p="A"
>         units="A_L w0241 38263 0.033; A_R w0241 38264 0.0464375" />
>   </syllable>
>   <syllable ph="r m r=">
>     <ph d="23" end="3.1557505" f0="(0,84)" p="r"
>         units="r_L w0241 38265 0.01225; r_R w0354 51308 0.010875" />
>     <ph d="23" end="3.178313" p="m"
>         units="m_L w0354 51309 0.0111875; m_R w0354 51310 0.011375" />
>     <ph d="92" end="3.2703755" f0="(50,103)(100,103)" p="r="
>         units="r=_L w0354 51311 0.0120625; r=_R w0471 62522 0.08" />
>   </syllable>
> </t>
> <t pos=".">
>   .
> </t>
> <boundary breakindex="5" duration="200" tone="L-L%"
>     units="__L w0471 62523 0.2" />
> </phrase>
> </prosody>
> </s>
> <s>
> <prosody pitch="+3%" range="+15%">
> <phrase>
> <t g2p_method="lexicon" ph="' { f - t r= - w r= d z" pos="RB">
>   Afterwards
>   <syllable ph="{ f" stress="1">
>     <ph d="111" end="0.11131249" f0="(0,96)(50,119)(100,112)" p="{"
>         units="{_L w0569 71473 0.0515625; {_R w0569 71474 0.05975" />
>     <ph d="50" end="0.16131249" p="f"
>         units="f_L w0569 71475 0.02; f_R w0569 71476 0.03" />
>   </syllable>
>
> Once a new sentence starts, MaryTTS does not give a pause time
> between the last phoneme of the sentence, "r=", and the final
> "<t> . </t>" token. So when I add the end time of the phoneme "r=" to
> the phoneme "{", it goes slightly out of sync, and this sync issue
> grows with the number of paragraphs in my input text.
>
>
> - I have had a look at REALISED_DURATIONS to get the end result, but
> strangely it gives different phoneme durations and end times for the
> same input text.
>
>
> My questions are:
> - Is there a fixed duration for tokens with "." or "," in the XML for
> REALISED_ACOUSTPARAMS? If not, how can I know how long the pause will
> be between each new <s> tag (sentence)?
>
> - Why is the output from REALISED_DURATIONS different from the XML? Is
> this expected? Won't it cause the timings to be slightly off-sync when
> I use the audio?
>
> - Which format provides the best sync times with audio,
> REALISED_ACOUSTPARAMS or REALISED_DURATIONS?
>
> The full outputs are at these links:
>
> xml:
> http://ricardoduarte.co.uk/marytts/output/output_with_periods.xml
>
> realised_durations:
> http://ricardoduarte.co.uk/marytts/output/output_realised_durations.txt
>
> Sorry for the long post but this is really bugging me.
>
> Regards
>
> Ricardo
>
>
> _________________________________________________
> Mary-users mailing list
> Mary-users at dfki.de
> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>
>
> --
> /**
> * Dr. Ingmar Steiner
> *
> * Head of Independent Research Group
> * Multimodal Speech Processing
> * Cluster of Excellence MMCI
> *
> * Senior Researcher
> * Language Technology Lab
> * German Research Center for
> * Artificial Intelligence (DFKI GmbH)
> *
> * Adjunct Assistant Professor
> * Department of Computer Science
> * Saarland University
> *
> * Campus C7.4, Room 3.01
> * D-66123 Saarbrücken
> * @tel: +49-681-302-70028
> * @fax: +49-681-302-4317
> * @web: http://coli.uni-saarland.de/~steiner/
> */