[mary-users] MaryTTS Viseme data

Ingmar Steiner ingmar.steiner at dfki.de
Mon Apr 24 10:33:33 CEST 2017


Just for the record, the REALISED_DURATIONS output follows the ESPS
Xwaves lab format. This means that each row represents one phonetic
segment; the first field is the *end* time in seconds, the second field
has *no significance*, and the third provides the segment label.
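
For illustration, here is a minimal parsing sketch in plain Java (the
Segment and RealisedDurationsParser classes are hypothetical helpers,
not part of MaryTTS); it converts such a listing into segments with
start and end times by taking each segment's start to be the previous
segment's end:

    import java.util.ArrayList;
    import java.util.List;

    /** One phonetic segment from a REALISED_DURATIONS listing. */
    class Segment {
        final double start, end; // seconds
        final String label;      // phone label, "_" for silence
        Segment(double start, double end, String label) {
            this.start = start; this.end = end; this.label = label;
        }
    }

    class RealisedDurationsParser {
        /** Parses rows of the form "endTime ignoredField label". */
        static List<Segment> parse(String realisedDurations) {
            List<Segment> segments = new ArrayList<>();
            double previousEnd = 0.0;
            for (String line : realisedDurations.split("\\R")) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length < 3) continue;         // header ("#") or blank line
                double end;
                try {
                    end = Double.parseDouble(fields[0]); // first field: *end* time
                } catch (NumberFormatException e) {
                    continue;                            // not a data row
                }
                segments.add(new Segment(previousEnd, end, fields[2])); // third field: label
                previousEnd = end;
            }
            return segments;
        }
    }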

Best wishes,

-Ingmar

On 21.04.17 21:57, Joan Pere Sanchez wrote:
> Hi Idoor,
>
> These values correspond to the allophone features used to represent
> each phone, in SAMPA notation. The first value is the timestamp where
> the sound for that phoneme starts. The second value, here 125, I'm not
> totally sure about, but I believe it is the f0 or central frequency of
> the sound.
> In fact, if you look closely, the sequence of first values is:
> 0.075 - 0.24 - 0.275 - 0.345 - 0.435 - 0.580 - 0.755 - _ (this "_" is a
> final silence)
> That is, the point where each phone starts.  In practice it amounts to
> the same thing; you can do a totally accurate lip-sync with that.
>
> Good luck!
>
>
> 2017-04-21 19:22 GMT+02:00 idoor <idoorlab88 at gmail.com>:
>
>     Hi Joan,
>
>     Thanks for your advice. I got the result back for the text "How are
>     you." (attached), but there are some values I am not sure what they
>     mean. For "h", there are two values, 0.075 and 125: does the value
>     0.075 mean how long it takes to speak "h", in seconds? And 125 is a
>     hardcoded value in the source code; what does it mean for "h"?
>
>     Thanks for your help!
>
>     text: #
>     0.075 125 h
>     0.24000001 125 aU
>     0.275 125 A
>     0.345 125 r
>     0.435 125 j
>     0.58000004 125 u
>     0.75500005 125 _
>
>
>
>
>     On Tue, Apr 18, 2017 at 6:13 AM, Joan Pere Sanchez
>     <kaiserjp at gmail.com> wrote:
>
>         Hello again,
>
>         If you want to obtain phonemes and durations for lip-sync, you
>         must call:
>
>                     mary.setOutputType("REALISED_DURATIONS");
>
>         There you will see each phoneme and its duration. You can
>         also use another output type to see the features of the
>         tokens:
>                     mary.setOutputType("TARGETFEATURES");
>
>         In both lines, 'mary' is an instance of the
>         'LocalMaryInterface' class that manages your input.
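>
>         For example, a minimal end-to-end sketch (the class name below
>         is just for illustration; it assumes the LocalMaryInterface
>         from the marytts runtime and at least one installed voice):
>
>             import marytts.LocalMaryInterface;
>
>             public class RealisedDurationsDemo {
>                 public static void main(String[] args) throws Exception {
>                     // Start the embedded engine (loads installed voices).
>                     LocalMaryInterface mary = new LocalMaryInterface();
>                     // Ask for realised segment durations instead of audio.
>                     mary.setOutputType("REALISED_DURATIONS");
>                     // One line per segment: end time (s), placeholder, label.
>                     System.out.println(mary.generateText("How are you."));
>                 }
>             }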
>
>         Best,
>
>
>
>         2017-04-16 2:41 GMT+02:00 idoor <idoorlab88 at gmail.com>:
>
>             Joan,
>
>             Thanks for your response again!
>             I looked at marytts-txt2wav before; I tested it and got the
>             double array:
>
>             double[] samples = MaryAudioUtils.getSamplesAsDoubleArray(audio);
>
>             but after getting that far, I do not know what to do next to
>             get the phonemes. Is this double[] related to the phonemes?
>             Best regards,
>
>
>             On Sat, Apr 15, 2017 at 7:01 PM, Joan Pere Sanchez
>             <kaiserjp at gmail.com> wrote:
>
>                 Hi Dave,
>
>                 You can take a look at this example to see how to
>                 extract from MaryTTS the duration of each phoneme,
>                 together with the phonemes transcribed in SAMPA
>                 notation:
>
>                 https://github.com/marytts/marytts-txt2wav
>
>                 In MaryTTS you have several input options (text, BML,
>                 SSML, and many others) and also several output options.
>                 You can run the demo build with the server-client setup
>                 and browse the options through the interface (there are
>                 a lot).
>
>                 Best,
>
>
>                 2017-04-15 22:45 GMT+02:00 idoor <idoorlab88 at gmail.com>:
>
>                     Hi Joan,
>
>                     Thanks for your response. Do you have any pointers
>                     or references I can read and study? Does MaryTTS
>                     provide any audio data for analysis of phonemes and
>                     visemes? MaryTTS can generate a .wav file; is it
>                     possible to find a library or tool to analyze the
>                     wave file and get phoneme info? I found this javadoc
>                     http://elckerlyc.sourceforge.net/javadoc/Hmi/hmi/tts/mary/MaryTTSGenerator.html
>                     but I could not find the source code for it. Have
>                     you happened to see the library jar file or source
>                     code for this?
>
>                     Thanks again for sharing some thoughts with me.
>
>
>
>
>                     On Sat, Apr 15, 2017 at 2:05 PM, Joan Pere Sanchez
>                     <kaiserjp at gmail.com> wrote:
>
>                         Hi Dave,
>
>                         This task is the main goal of my PhD thesis. I'm
>                         doing lip-sync from the input text, based on the
>                         duration estimation done while the speech is
>                         generated. You can develop your own strategy for
>                         lip/mouth synchronization, but often this is an
>                         avatar-dependent (or interface-dependent; I'm
>                         using a talking head too) task. So, if you are
>                         using an avatar, it depends on whether you can
>                         use blend shapes to interpolate from the initial
>                         pose to the next one. Most MPEG-4 systems are
>                         able to do that automatically.
>                         On one hand, you have each phoneme and its start
>                         and finish time. On the other hand, you can
>                         define a set of visemes for the basic expressions
>                         (no more than 15 are needed) and then choose the
>                         sequence corresponding to each word you are
>                         generating. It's the most efficient and simplest
>                         way to get effective lip synchronization.
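>
>                         As a rough illustration (the viseme names and
>                         the mapping below are invented for this example,
>                         not any standard set), the phone-to-viseme step
>                         could be sketched like this:
>
>                             import java.util.HashMap;
>                             import java.util.Map;
>
>                             class VisemeMap {
>                                 // Toy SAMPA-phone-to-viseme table; a real
>                                 // one covers the voice's full phone set.
>                                 static final Map<String, String> TABLE = new HashMap<>();
>                                 static {
>                                     TABLE.put("h", "VIS_AH");    // open mouth
>                                     TABLE.put("aU", "VIS_AH");
>                                     TABLE.put("A", "VIS_AH");
>                                     TABLE.put("r", "VIS_ER");
>                                     TABLE.put("j", "VIS_EE");
>                                     TABLE.put("u", "VIS_OO");    // rounded lips
>                                     TABLE.put("_", "VIS_REST");  // silence
>                                 }
>
>                                 // Blend shape to interpolate towards during
>                                 // the [start, end) interval of this phone.
>                                 static String visemeFor(String sampaPhone) {
>                                     return TABLE.getOrDefault(sampaPhone, "VIS_REST");
>                                 }
>                             }
>
>                         Each phone from REALISED_DURATIONS then gives a
>                         target viseme plus the time window over which to
>                         blend towards it.
>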
>                         Don't hesitate to contact me if you want more
>                         info or references about it.
>
>                         Best regards,
>
>
>                         2017-04-15 18:27 GMT+02:00 idoor Du <idoorlab88 at gmail.com>:
>
>                             Hi all,
>
>                             I am new to MaryTTS and tried to call its API via:
>
>                             AudioInputStream audio = mary.generateAudio("testing");
>
>                             Now I want to animate mouth/lip shapes at
>                             runtime based on the audio; how do I achieve
>                             that? Is there any viseme data associated
>                             with the audio?
>
>                             Thanks in advance.
>
>                             Dave
>
>
> --
> *Joan Pere Sànchez Pellicer*
> kaiserjp at gmail.com
> www.chamaleon.net
> +34 625 012 741
>
>
> _______________________________________________
> Mary-users mailing list
> Mary-users at dfki.de
> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>

