[mary-users] MaryTTS Viseme data

idoor idoorlab88 at gmail.com
Sun Apr 16 02:41:56 CEST 2017


Joan,

Thanks again for your response!
I had looked at marytts-txt2wav before; I tested it and got the double
array:

double[] samples = MaryAudioUtils.getSamplesAsDoubleArray(audio);

but beyond that point I do not know what to do next to get the phonemes.
Is this double[] related to the phonemes at all?
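
For reference, here is the full snippet I am working from (a minimal
sketch adapted from the txt2wav example, with a class name of my own; as
far as I can tell the double[] holds only the raw audio samples, with no
phoneme information in it):

import javax.sound.sampled.AudioInputStream;

import marytts.LocalMaryInterface;
import marytts.MaryInterface;
import marytts.util.data.audio.MaryAudioUtils;

public class Txt2SamplesTest {
    public static void main(String[] args) throws Exception {
        MaryInterface mary = new LocalMaryInterface();
        AudioInputStream audio = mary.generateAudio("testing");
        // This looks like just the raw PCM waveform of the synthesized
        // speech:
        double[] samples = MaryAudioUtils.getSamplesAsDoubleArray(audio);
        System.out.println("got " + samples.length + " samples");
    }
}
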
Best regards,


On Sat, Apr 15, 2017 at 7:01 PM, Joan Pere Sanchez <kaiserjp at gmail.com>
wrote:

> Hi Dave,
>
> You can take a look at this example to see how to extract the duration
> of each phoneme from MaryTTS while also getting the phonemes transcribed
> in SAMPA notation:
>
> https://github.com/marytts/marytts-txt2wav
>
> In MaryTTS you have several options for input (plain text, BML, SSML,
> and many others), and there are several output options as well. You can
> run the demo build with the server-client setup and browse the available
> options through the web interface (there are a lot).
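>
> For example, something along these lines should print the realised
> phones with their timings (a minimal sketch, assuming MaryTTS 5.x and
> that the REALISED_DURATIONS output type behaves as in the current
> release; each output line pairs an end time in seconds with a phone
> symbol):
>
> import marytts.LocalMaryInterface;
> import marytts.MaryInterface;
>
> public class PhonemeDurations {
>     public static void main(String[] args) throws Exception {
>         MaryInterface mary = new LocalMaryInterface();
>         // Ask for the realised phone durations as text, not audio:
>         mary.setOutputType("REALISED_DURATIONS");
>         System.out.println(mary.generateText("testing"));
>     }
> }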
>
> Best,
>
>
> 2017-04-15 22:45 GMT+02:00 idoor <idoorlab88 at gmail.com>:
>
>> Hi Joan,
>>
>> Thanks for your response. Do you have any pointers or references I can
>> read and study? Does MaryTTS provide any audio data for the analysis of
>> phonemes and visemes? MaryTTS can generate a .wav file; is it possible
>> to find a library or tool that analyzes the wave file and extracts the
>> phoneme info? I found this javadoc:
>> http://elckerlyc.sourceforge.net/javadoc/Hmi/hmi/tts/mary/MaryTTSGenerator.html
>> but I could not find the source code for it. Have you happened to see
>> the library jar file or source code for this?
>>
>> Thanks again for sharing some thoughts with me.
>>
>>
>>
>>
>> On Sat, Apr 15, 2017 at 2:05 PM, Joan Pere Sanchez <kaiserjp at gmail.com>
>> wrote:
>>
>>> Hi Dave,
>>>
>>> This task is the main goal of my PhD thesis. I'm doing lip-sync from
>>> the input text, based on the time-duration estimates produced while
>>> the speech is generated. You can develop your own strategy for
>>> lip/mouth synchronization, but this is often an avatar-dependent (or
>>> interface-dependent) task -- I'm using a talking head too. So, if you
>>> are using an avatar, it depends on whether you can use blend shapes to
>>> interpolate from the initial pose to the next one. Most MPEG-4 systems
>>> are able to do that automatically.
>>> On one hand, you have each phoneme with its start and end time. On the
>>> other hand, you can define a set of visemes, one per basic mouth shape
>>> (no more than 15 are needed), and then choose the sequence
>>> corresponding to each word you are generating. It's the most efficient
>>> and simplest way to get effective lip synchronization.
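>>>
>>> As a sketch of what I mean (hypothetical names and grouping, not part
>>> of MaryTTS): map each SAMPA phone onto a small set of viseme IDs, then
>>> drive your blend shapes from the (phone, start, end) sequence:
>>>
>>> import java.util.HashMap;
>>> import java.util.Map;
>>>
>>> public class VisemeMap {
>>>     // Illustrative grouping only -- tune it to your avatar's shapes:
>>>     static final Map<String, Integer> VISEME = new HashMap<>();
>>>     static {
>>>         for (String p : new String[] {"p", "b", "m"})
>>>             VISEME.put(p, 1); // bilabials -> closed lips
>>>         for (String p : new String[] {"f", "v"})
>>>             VISEME.put(p, 2); // labiodentals
>>>         for (String p : new String[] {"A", "a", "E"})
>>>             VISEME.put(p, 3); // open vowels -> open jaw
>>>         for (String p : new String[] {"O", "u", "w"})
>>>             VISEME.put(p, 4); // rounded vowels -> rounded lips
>>>         VISEME.put("_", 0);   // silence -> neutral pose
>>>     }
>>>
>>>     // For each (phone, start, end) from MaryTTS, pick the target
>>>     // viseme and interpolate blend-shape weights toward its key pose:
>>>     static int visemeFor(String phone) {
>>>         return VISEME.getOrDefault(phone, 0); // unknown -> neutral
>>>     }
>>> }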
>>> Don't hesitate to contact me if you want more info or references
>>> about this.
>>>
>>> Best regards,
>>>
>>>
>>> 2017-04-15 18:27 GMT+02:00 idoor Du <idoorlab88 at gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>> I am new to MaryTTS and tried to call its API via:
>>>>
>>>> AudioInputStream audio = mary.generateAudio("testing");
>>>>
>>>> Now I want to animate mouth/lip shapes at runtime based on the
>>>> audio. How can I achieve that? Is there any viseme data associated
>>>> with the audio?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Dave
>>>>
>>>> _______________________________________________
>>>> Mary-users mailing list
>>>> Mary-users at dfki.de
>>>> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>>>>
>>>>
>>>
>>>
>>> --
>>> *Joan Pere Sànchez Pellicer*
>>> kaiserjp at gmail.com
>>> www.chamaleon.net
>>> +34 625 012 741
>>>
>>
>>
>
>
> --
> *Joan Pere Sànchez Pellicer*
> kaiserjp at gmail.com
> www.chamaleon.net
> +34 625 012 741
>