[mary-users] Difference hsmm and unit selection voices

Ingmar Steiner ingmar.steiner at dfki.de
Wed Dec 28 08:07:59 CET 2016


Dear Rijk,

Hidden semi-Markov models can be used for statistical parametric speech 
synthesis to predict the parameters of a vocoder to generate a speech 
waveform "from scratch" based on text input. On the other hand, 
unit-selection synthesis retrieves and concatenates small snippets of 
speech recordings from a database that best match the text input. If 
you're interested in the scientific background, Paul Taylor's textbook 
is a good entry point. [^1]

From a practical perspective, HMM-based synthesis offers high 
flexibility and a low memory footprint at the expense of perceived 
naturalness. Unit selection offers high naturalness with limited 
flexibility, depending on the database and application domain, and its 
memory footprint grows with the size of the database.

In MaryTTS, resources can be loaded from the classpath (conventional 
Java software design) or from the filesystem, based on properties such 
as these:

> voice.cmu-slt.cartFile = jar:/marytts/voice/CmuSlt/cart.mry
> voice.cmu-slt.audioTimelineFile = MARY_BASE/lib/voices/cmu-slt/timeline_waveforms.mry

The `jar:` prefix triggers classpath loading, while `MARY_BASE` is 
expanded to the filesystem path given by the corresponding property. 
Any other valid filesystem path can be used in its place.
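
As an illustration of the two cases, here is a minimal sketch of how a property value could be classified and opened. It is not the actual MaryTTS loader: the class `ResourceLocation`, its methods, and the `maryBase` argument are hypothetical names, and the caller is assumed to supply the base directory from wherever it is configured.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; not part of MaryTTS. */
final class ResourceLocation {
    enum Kind { CLASSPATH, FILESYSTEM }

    private static final String JAR_PREFIX = "jar:";
    private static final String BASE_TOKEN = "MARY_BASE";

    final Kind kind;
    final String location;

    private ResourceLocation(Kind kind, String location) {
        this.kind = kind;
        this.location = location;
    }

    /**
     * Classifies a property value. Values starting with "jar:" are treated
     * as classpath resources; everything else is treated as a filesystem
     * path, with a leading MARY_BASE token replaced by maryBase.
     */
    static ResourceLocation resolve(String value, String maryBase) {
        if (value.startsWith(JAR_PREFIX)) {
            return new ResourceLocation(Kind.CLASSPATH, value.substring(JAR_PREFIX.length()));
        }
        String path = value.startsWith(BASE_TOKEN)
                ? maryBase + value.substring(BASE_TOKEN.length())
                : value;
        return new ResourceLocation(Kind.FILESYSTEM, path);
    }

    /** Opens the resource from the classpath or from disk, depending on its kind. */
    InputStream open() throws IOException {
        if (kind == Kind.CLASSPATH) {
            InputStream in = ResourceLocation.class.getResourceAsStream(location);
            if (in == null) {
                throw new FileNotFoundException("Not on classpath: " + location);
            }
            return in;
        }
        return Files.newInputStream(Paths.get(location));
    }
}
```

With the two properties above and an assumed base directory of `/opt/marytts`, the first value resolves to the classpath resource `/marytts/voice/CmuSlt/cart.mry` and the second to the file `/opt/marytts/lib/voices/cmu-slt/timeline_waveforms.mry`.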

In the unit-selection case, the audio data (particularly the 
`timeline_waveforms.mry` file) is almost always too big to be 
efficiently loaded into memory, so our solution is to locate it on the 
filesystem and read required units directly from disk at runtime.
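
To illustrate the general idea of reading units from disk on demand, here is a minimal sketch using random file access. It does not parse the real `timeline_waveforms.mry` format; the class `DiskUnitReader` is a hypothetical name, and the byte offset and length of each unit are assumed to come from some separate index.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical on-demand reader for illustration only; not part of MaryTTS. */
final class DiskUnitReader implements AutoCloseable {
    private final RandomAccessFile file;

    DiskUnitReader(String path) throws IOException {
        // Open read-only; the file contents stay on disk until requested.
        this.file = new RandomAccessFile(path, "r");
    }

    /**
     * Reads exactly length bytes starting at offset.
     * Offset and length are assumed to come from an external index.
     */
    byte[] read(long offset, int length) throws IOException {
        byte[] buffer = new byte[length];
        file.seek(offset);
        file.readFully(buffer); // throws EOFException if the file is too short
        return buffer;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only the requested byte ranges are held in memory at any time, so memory use depends on the units needed for an utterance rather than on the size of the whole database.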

I hope this helps. If you have further questions, please open issues on 
GitHub as appropriate.

Best wishes,

-Ingmar

P.S. TypeTalk looks very cool. We might be in touch about that. =)

[^1]: https://scholar.google.com/scholar?q=paul+taylor+text+to+speech

On 27.12.16 17:56, Rijk Theodoor Oosterhoff wrote:
> Hello,
>
> I was wondering what is the difference between a hidden semi markov
> model voice and a unit selection voice?
>
> And how do I get these on the classpath? The HSMM voices are simply
> added using a maven or gradle dependency. But a unit selection voice
> needs some other files as well. How do I get these on the classpath?
>
> Btw, we are building a Marytts frontend. You can find our effort at:
> http://typetalk.github.io/TypeTalk/
>
>
> Kind regards,
> Rijk