[mary-dev] Very strict reliability for FeatureMaker

Fabio Tesser fabio.tesser at gmail.com
Thu Nov 25 09:30:49 CET 2010


Hi all,

I have found the reason why these words are labelled as 
"lexicon"-transcribed words:

In java/marytts/language/de/JPhonemiser.java ...

                             // The g2pMethod of the combined beast is
                             // the g2pMethod of the first constituant.

My words are compounds of two tokens separated by a hyphen, and the 
first token (before the hyphen) is indeed transcribed using the lexicon.
So the rule assigns the "lexicon" g2pMethod to these words even if the 
other tokens of the word have been transcribed with rules. I guess this 
was designed for efficiency and simplicity.
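
To illustrate the effect, here is a minimal sketch of such a rule. It is 
not the actual JPhonemiser code: the class name, the method names and the 
toy lexicon are my own assumptions, used only to show why a compound ends 
up labelled "lexicon" when just its first part is in the lexicon.

```java
import java.util.Set;

// Hypothetical illustration only; none of these names exist in MARY TTS.
class CompoundG2pSketch {
    // Toy lexicon (assumption); a real lexicon would be loaded from language resources.
    static final Set<String> LEXICON = Set.of("mini", "al");

    // Label for a single token: "lexicon" if it is found, otherwise "rules".
    static String methodForToken(String token) {
        return LEXICON.contains(token.toLowerCase()) ? "lexicon" : "rules";
    }

    // Mimics the quoted comment: the compound takes the label of its first constituent.
    static String methodForCompound(String word) {
        // limit -1 keeps empty parts, so the array always has at least one element
        String first = word.split("-", -1)[0];
        return methodForToken(first);
    }
}
```

With this toy lexicon, "Mini-DSLAM" gets the label "lexicon" although 
"DSLAM" on its own would be labelled "rules".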

In my case I only need to discard these words during database 
selection, so I decided to change 
marytts.tools.dbselection.FeatureMaker.checkReliability() to fit my purpose.
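
One possible form of such a "very strict" check is sketched below. It 
does not reproduce the real checkReliability() signature or body; the 
class, the method and its parameters are hypothetical. The idea is that a 
hyphenated word is accepted only if every part is itself in the lexicon, 
instead of trusting the label inherited from the first part. Words 
transcribed by the preprocessor are not handled in this sketch.

```java
import java.util.Set;

// Hypothetical sketch; not the actual FeatureMaker code.
class VeryStrictReliabilitySketch {
    // word: the token text; g2pMethod: the label assigned by the phonemiser;
    // lexicon: lower-cased lexicon entries (all three are assumptions).
    static boolean isWordReliable(String word, String g2pMethod, Set<String> lexicon) {
        if (!"lexicon".equals(g2pMethod)) {
            return false; // anything not labelled "lexicon" is rejected here
        }
        if (!word.contains("-")) {
            return true; // simple word: trust the label
        }
        // Compound: require every hyphen-separated part to be a lexicon entry.
        for (String part : word.split("-", -1)) {
            if (part.isEmpty() || !lexicon.contains(part.toLowerCase())) {
                return false;
            }
        }
        return true;
    }
}
```

A sentence would then go into the unreliable set as soon as one of its 
words fails this test.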

Best,
Fabio.



On 11/23/2010 08:09 PM, Fabio Tesser wrote:
> Hello,
>
> I am running the FeatureMaker program (point 5 of 
> http://mary.opendfki.de/wiki/NewLanguageSupport) for Italian.
> I have used the strict reliability option because I would like to 
> select only words inside my lexicon (otherwise I obtain a lot of 
> non-Italian words and acronyms).
> But even with this option I get in my selection some words not located 
> in the lexicon.
> Some examples:
> al-?Azi-z
> MA-31PG
> Mini-DSLAM
> Z-Man
>
> Note that all these words contain the '-' character.
>
> The reliability option description says that:
> "With setting strict, only those sentences that contain words in the 
> lexicon or words that were transcribed by the preprocessor can be 
> selected for the synthesis script;"
>
> So I suppose these words are transcribed by "the preprocessor".
> But if I try to transcribe these words using the maryserver, the result 
> says that they are transcribed by the lexicon (g2p_method), although 
> they are not in the lexicon.
>
> The marytts.tools.dbselection.FeatureMaker.checkReliability() method 
> confirms that.
>
> I have some questions about these words:
> - Which component transcribes these words (the preprocessor)? 
> And how does it work?
> - Is it possible to assign them another g2p_method label? That way it 
> should be possible to have a "very strict reliability" option in 
> checkReliability()...
> - If this is not possible, does anyone have other suggestions on how 
> to put, in the context of FeatureMaker, the sentences that contain 
> these words into the unreliable set?
>
> Thank you,
> Fabio.
>
>
>
>
>