[mary-users] Timestamps mapping for generated audio

avi tshuva avi at watchtext.com
Thu Mar 1 15:12:54 CET 2012


Many thanks, Ingmar, for your quick reply. We're trying to incorporate 
your advice into our code.
What do you think about adding API methods that return different types of
information, such as the kind I mentioned, without having to go through
the XML output? Do you think such methods exist, or should exist? After
all, I guess all that information exists internally; it's just a matter
of exposing it through a simple, documented API...?

On Thu 01 Mar 2012 03:59:31 PM IST, Ingmar Steiner wrote:
> Dear Avi,
>
> the easiest way is probably OUTPUT_TYPE=REALISED_DURATIONS, which
> gives you phone end times in XWaves lab format.
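A minimal sketch of reading the REALISED_DURATIONS output, assuming it follows the usual XWaves label layout: optional header lines ending with a line containing only `#`, then one `<end_time> <color> <label>` line per phone. The function names `parse_xwaves_lab` and `with_start_times` are hypothetical, the sample text is illustrative rather than real MARY output, and fetching the output from the server is not shown. Because each line only carries an end time, a phone's start time is taken to be the previous phone's end time.

```python
from __future__ import annotations


def parse_xwaves_lab(text: str) -> list[tuple[float, str]]:
    """Parse XWaves-style label text into (end_time_seconds, label) pairs.

    Assumption: header lines (if any) end with a line containing only '#'.
    """
    lines = text.splitlines()
    body = lines
    # Skip any header up to and including the '#' terminator line, if present.
    for i, line in enumerate(lines):
        if line.strip() == "#":
            body = lines[i + 1:]
            break
    entries = []
    for line in body:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        end_time = float(fields[0])
        label = " ".join(fields[2:])  # fields[1] is the (unused) color column
        entries.append((end_time, label))
    return entries


def with_start_times(entries: list[tuple[float, str]]) -> list[tuple[float, float, str]]:
    """Turn (end, label) pairs into (start, end, label) triples."""
    result = []
    prev_end = 0.0  # assumption: the audio starts at t = 0
    for end_time, label in entries:
        result.append((prev_end, end_time, label))
        prev_end = end_time
    return result


# Illustrative input only (hypothetical values, not real MARY output).
SAMPLE_LAB = """#
0.2 125 _
0.265 125 h
0.33 125 @
"""

if __name__ == "__main__":
    for start, end, phone in with_start_times(parse_xwaves_lab(SAMPLE_LAB)):
        print(f"{phone}\t{start:.3f}\t{end:.3f}")
```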
>
> If you need word/token start times, you'll have to parse the MaryXML
> from OUTPUT_TYPE=REALISED_ACOUSTPARAMS. In that case, look for the
> "end" attribute of the first <ph> element in the first <syllable> of
> each <t> (token) element, and subtract that phone's "d" (duration)
> attribute divided by 1000 ("d" is in milliseconds, "end" in seconds).
> This works for the HSMM voices. (For the unit selection voices, the
> predicted phone duration will not match the realized end times
> exactly, so you'll have to use your own start time variable to avoid
> syncing issues.)
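A minimal sketch of both strategies using Python's standard `xml.etree.ElementTree`. It assumes the REALISED_ACOUSTPARAMS MaryXML nests `<ph>` elements (with `d` in milliseconds and `end` in seconds) inside `<syllable>` elements inside `<t>` elements, and that `<boundary>` elements may carry a pause `duration` in milliseconds. All function names are hypothetical, and the sample document is illustrative, not real MARY output. The "running clock" variant is only one possible reading of "your own start time variable": it starts each token where the previous phone ended (plus any pause in between) instead of subtracting the predicted duration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix so the code works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def _token_text(token: ET.Element) -> str:
    # The token's text precedes its <syllable> children.
    return (token.text or "").strip()


def _first_phone(token: ET.Element) -> ET.Element | None:
    """Return the first <ph> of the token's first <syllable>, if any."""
    for syllable in token:
        if _local(syllable.tag) == "syllable":
            for ph in syllable:
                if _local(ph.tag) == "ph":
                    return ph
            return None
    return None  # e.g. punctuation tokens carry no syllables


def token_starts_from_durations(maryxml: str) -> list[tuple[str, float]]:
    """HSMM-style: start = end of first phone - its duration d (ms) / 1000."""
    starts = []
    for elem in ET.fromstring(maryxml).iter():
        if _local(elem.tag) != "t":
            continue
        ph = _first_phone(elem)
        if ph is not None:
            start = float(ph.get("end")) - float(ph.get("d")) / 1000.0
            starts.append((_token_text(elem), start))
    return starts


def token_starts_running_clock(maryxml: str) -> list[tuple[str, float]]:
    """Running-clock variant: a token starts where the previous phone ended."""
    starts = []
    clock = 0.0  # seconds: end of the last phone, plus any pause since then
    pending = None  # text of a token whose first phone has not been seen yet
    for elem in ET.fromstring(maryxml).iter():  # depth-first, document order
        name = _local(elem.tag)
        if name == "t":
            pending = _token_text(elem)
        elif name == "ph":
            if pending is not None:
                starts.append((pending, clock))
                pending = None
            clock = float(elem.get("end"))
        elif name == "boundary":
            # Assumption: the pause length is given in milliseconds.
            clock += float(elem.get("duration", "0")) / 1000.0
    return starts


# Illustrative input only (hypothetical timings, not real MARY output).
SAMPLE_MARYXML = """<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" version="0.5">
<p><s><phrase>
<t>Hello<syllable><ph d="60" end="0.06" p="h"/><ph d="40" end="0.1" p="@"/></syllable><syllable><ph d="70" end="0.17" p="l"/><ph d="100" end="0.27" p="@U"/></syllable></t>
<t>world<syllable><ph d="80" end="0.35" p="w"/><ph d="100" end="0.45" p="r="/></syllable></t>
<t>.</t>
<boundary breakindex="5" duration="200"/>
</phrase><phrase>
<t>How<syllable><ph d="50" end="0.72" p="h"/><ph d="150" end="0.87" p="aU"/></syllable></t>
</phrase></s></p>
</maryxml>"""

if __name__ == "__main__":
    print(token_starts_from_durations(SAMPLE_MARYXML))
    print(token_starts_running_clock(SAMPLE_MARYXML))
```

Returning a list of (token, start) pairs rather than a dict like the one in the original question keeps repeated words (for example two occurrences of "the") from overwriting each other.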
>
> Hope this helps!
>
> Best wishes,
>
> -Ingmar
>
> On 01.03.2012 14:25, avi tshuva wrote:
>>
>> Hi,
>> How can I get timing information for the generated audio? I need it in
>> order to sync the video I'm generating from the text with the audio.
>>
>> _for example:_
>> *text:* "Hello world. How are you?"
>> *timing map:*
>> {Hello: 0,
>> world: 0.1,
>> How: 0.24
>> ..
>> ...
>> }
>>
>> Thank you
>>
>>
>> -- 
>> /Avi Tshuva
>> VP R&D
>> WatchText /
>>
>>
>> _______________________________________________
>> Mary-users mailing list
>> Mary-users at dfki.de
>> http://www.dfki.de/mailman/cgi-bin/listinfo/mary-users
>



-- 
/Avi Tshuva
VP R&D
WatchText /


