[mary-users] Fwd: Issues on MaryTTS with SSML Prosody Tag

Tue Nov 8 11:03:17 CET 2016

Hi Gavin,

On 08.11.16 01:26, 姚晋 wrote:
> Hello Dr. Steiner,
>
> Thanks for your detailed answer.
>
> Regarding the link to the prosody documentation: I tried at different
> times (last Friday and today), and so did my friends, but the connection
> still fails. Is it possible that you can access it because you are on
> the same intranet as the server, while it behaves differently for
> external visitors?

I've tried from different external networks, with different browsers, 
and the only problem I noticed was a single timeout, which went away 
after refreshing. I suspect that either your browser is too impatient, 
or that it doesn't trust the certificate, but I can't reproduce the issue.

If the problem persists, please contact admin at opendfki.de.

>
> I wanted to check the documentation for how MaryXML prosody values map
> to physical units, for example:
>
> - What is the default value for volume in dB? What is the default value
> for speaking rate in syllables/s? etc.

Because the volume is not processed in any way, it would be whatever was 
in the original recordings.

> - The range for manipulating volume in the Mary GUI is [0.0, 10.0];
> does one step correspond to +6 dB?

If you are referring to the "Audio Effects" in the browser (or Java
client) GUI, that has nothing to do with the synthesis; it is just a
DSP filter applied to the output audio. As you can see in the source
code [^1], the volume there is simply a linear scaling of the sample values.
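
To illustrate how such a linear factor relates to a level change in dB,
here is a minimal Python sketch. It is not the MaryTTS code: the function
names are made up for illustration, and it assumes the GUI value is used
directly as the multiplier on each sample.

```python
import math


def scale_samples(samples, factor):
    """Multiply every sample by the same linear gain factor."""
    return [s * factor for s in samples]


def gain_to_db(factor):
    """Level change in dB for a linear amplitude factor (must be > 0)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 20.0 * math.log10(factor)


if __name__ == "__main__":
    # Compare a few factors against the unscaled signal (factor 1.0).
    for factor in (0.5, 1.0, 2.0, 3.0, 10.0):
        print(f"factor {factor:>4}: {gain_to_db(factor):+.2f} dB")
```

Under that assumption, a factor of 2.0 is roughly +6 dB relative to 1.0,
but equal steps in the factor are not equal steps in dB: going from 2.0 to
3.0 adds only about 3.5 dB more.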

>
> As I still cannot connect to that website, could you share the
> documentation with me as a PDF by email, if you have one?

I've merged that page into the GitHub wiki for convenience:

https://github.com/marytts/marytts/wiki/ProsodySpecificationSupport

Best wishes,

-Ingmar

[^1]: 
https://github.com/marytts/marytts/blob/d7b57e81d0e8e9abbd4f8be4ca66766ff700687c/marytts-runtime/src/main/java/marytts/signalproc/effects/VolumeEffect.java#L54-L64

>
> Best Regards,
> Gavin
>
>
>
>
> On 4 November 2016 at 14:31, Ingmar Steiner <ingmar.steiner at dfki.de> wrote:
>
>     Hi Gavin,
>
>     thanks for your detailed message!
>
>     On 04.11.16 12:57, 姚晋 wrote:
>
>         Hello all,
>
>         I am using MaryTTS in an English prosody study, but have run into
>         the problems listed below. Please help me check whether the
>         problem lies in my SSML or in Mary's SSML handling. Thanks in
>         advance:
>
>         1. The link to the user documentation doesn't work (404 Not Found):
>         http://mary.opendfki.de/trac/wiki/ProsodySpecificationSupport
>         As shown in attachment 1.
>
>
>     The URL works fine for me.
>
>
>         2. I use the MaryTTS GUI directly to synthesize speech from SSML,
>         and find that the "volume" attribute works in the Bing Speech API
>         but does not seem to work in Mary. It does work if I change the
>         volume directly with the "Audio Effects" GUI. Please refer to
>         attachment 2.1 for the SSML and 2.2 for the analysis with Praat.
>
>
>     The most important thing to be aware of is that SSML is a
>     recommendation, not a standard, and it's up to each "vendor" whether
>     and how the various SSML features are implemented (I had an
>     insightful discussion with Paul Bagshaw about this a few weeks ago).
>
>     In MaryTTS, SSML is parsed and transformed into MaryXML using XSLT
>     [^1]. The resulting prosody attributes are then available to the
>     MaryTTS modules for processing.
>
>     As it happens, any volume attribute is completely ignored by the
>     class that handles the prosody element [^2]. In other words,
>     fine-grained volume control is not currently possible with MaryXML.
>
>
>         3. The "pitch" attribute works for the HMM-based voice, but does
>         not work accurately for the unit-selection voice. Please refer to
>         attachment 3.1 for the SSML and 3.2 for the analysis with Praat.
>
>
>     Unit-selection synthesis, by its very nature, does not allow
>     fine-grained prosody control. It concatenates the selected units
>     from a voice database without modification and offers high
>     naturalness by sacrificing flexibility. There have been experiments
>     with signal manipulation after concatenation, but this tends to
>     introduce unacceptable artifacts.
>
>     If you need fine-grained control over prosody, you are better off
>     using statistical parametric synthesis or diphone synthesis (e.g.,
>     MBROLA). Note that the last version of MaryTTS that supported MBROLA
>     was v4.3.1 -- we plan to support MBROLA again in the near future,
>     but that's still work in progress. In the meantime, v4.3.1 should
>     work just fine for you under Windows.
>
>     If you have further questions, please feel free to engage the
>     developers on our issue tracker [^3] and post technical questions or
>     bug reports as appropriate.
>
>     Best wishes,
>
>     -Ingmar
>
>
>         Best Regards,
>         Gavin
>
>
>
>
>     [^1]:
>     https://github.com/marytts/marytts/blob/v5.2/marytts-runtime/src/main/resources/marytts/modules/ssml-to-mary.xsl
>
>     [^2]:
>     https://github.com/marytts/marytts/blob/v5.2/marytts-runtime/src/main/java/marytts/modules/acoustic/ProsodyElementHandler.java
>
>     [^3]: https://github.com/marytts/marytts/issues
>
>     --
>     /**
>      * Dr. Ingmar Steiner
>      *
>      * Head of Independent Research Group
>      * Multimodal Speech Processing
>      * Cluster of Excellence MMCI
>      *
>      * Senior Researcher
>      * Multilingual Technologies Lab
>      * German Research Center for
>      * Artificial Intelligence (DFKI GmbH)
>      *
>      * Principal Investigator
>      * Collaborative Research Center SFB-1102
>      * Information Density and Linguistic Encoding
>      *
>      * Department of Computer Science
>      * Department of Computational Linguistics & Phonetics
>      * Saarland University
>      *
>      * Campus C7.4, Room 2.01
>      * D-66123 Saarbrücken
>      * @tel: +49-681-302-70028
>      * @fax: +49-681-302-4317
>      * @web: http://coli.uni-saarland.de/~steiner/
>      */
>
>