[mary-dev] Very strict reliability for FeatureMaker
Fabio Tesser
fabio.tesser at gmail.com
Thu Nov 25 09:30:49 CET 2010
Hi all,
I have found the reason why these words are labelled as "lexicon"
transcribed words:
in java/marytts/language/de/JPhonemiser.java ...
// The g2pMethod of the combined beast is
// the g2pMethod of the first constituant.
My words are compounds of two tokens separated by a hyphen, and the
first token (the one before the hyphen) is indeed transcribed using the
lexicon. So the rule assigns the "lexicon" g2pMethod to the whole word
even if its other tokens have been transcribed with rules. I guess this
was designed for efficiency and simplicity.
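
To make the effect concrete, here is a minimal sketch of the labelling
behaviour described by that comment. It is not the JPhonemiser code: the
class name, the method names and the stricter variant are all made up for
illustration, and the example assumes the first constituent was found in
the lexicon while the second was transcribed by rules.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only, not taken from the MaryTTS sources.
class CompoundG2pSketch {

    // Behaviour described in the comment: the compound inherits the
    // g2p method of its first constituent, whatever the others are.
    // Assumes the list holds at least one constituent.
    static String combinedMethod(List<String> constituentMethods) {
        return constituentMethods.get(0);
    }

    // Assumed stricter alternative: report "lexicon" only when every
    // constituent came from the lexicon, otherwise the first other method.
    static String strictCombinedMethod(List<String> constituentMethods) {
        for (String method : constituentMethods) {
            if (!"lexicon".equals(method)) {
                return method;
            }
        }
        return "lexicon";
    }

    public static void main(String[] args) {
        // Assumed per-constituent methods for a two-part hyphenated word.
        List<String> methods = Arrays.asList("lexicon", "rules");
        System.out.println(combinedMethod(methods));       // prints "lexicon"
        System.out.println(strictCombinedMethod(methods)); // prints "rules"
    }
}
```

With the first rule, a single lexicon hit at the start of the word is
enough to hide the rule-based transcription of the rest.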
In my case I only need to discard these words during database selection,
so I decided to change
marytts.tools.dbselection.FeatureMaker.checkReliability() to fit my purpose.
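
The idea of such a "very strict" check can be sketched roughly as
follows. This is not the real checkReliability() method and does not use
its signature: the Token class, the isReliable() name and the choice of
accepting only the "lexicon" label are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "very strict" reliability test.
class VeryStrictReliabilitySketch {

    // Minimal stand-in for a token: its text and its g2p method label.
    static final class Token {
        final String text;
        final String g2pMethod;

        Token(String text, String g2pMethod) {
            this.text = text;
            this.g2pMethod = g2pMethod;
        }
    }

    // A sentence is reliable only if every token is labelled "lexicon"
    // AND contains no hyphen, since for hyphenated words the label only
    // reflects the first constituent.
    static boolean isReliable(List<Token> sentence) {
        for (Token token : sentence) {
            if (!"lexicon".equals(token.g2pMethod)) {
                return false;
            }
            if (token.text.contains("-")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed labels; "Z-Man" is labelled "lexicon" as in my selection.
        List<Token> sentence = Arrays.asList(
                new Token("parola", "lexicon"),
                new Token("Z-Man", "lexicon"));
        System.out.println(isReliable(sentence)); // prints "false"
    }
}
```

This discards any sentence containing a hyphenated word, including
compounds whose parts are all in the lexicon, which is acceptable for my
selection task.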
Best,
Fabio.
On 11/23/2010 08:09 PM, Fabio Tesser wrote:
> Hello,
>
> I am running the FeatureMaker program (point 5 of
> http://mary.opendfki.de/wiki/NewLanguageSupport) for Italian.
> I have used the strict reliability option because I would like to
> select only words inside my lexicon (otherwise I obtain a lot of
> non-Italian words and acronyms).
> But even with this option, my selection contains some words that are
> not in the lexicon.
> Some examples:
> al-?Azi-z
> MA-31PG
> Mini-DSLAM
> Z-Man
>
> Notice that all these words contain the '-' character.
>
> The reliability option description says that:
> "With setting strict, only those sentences that contain words in the
> lexicon or words that were transcribed by the preprocessor can be
> selected for the synthesis script;"
>
> So I suppose these words are transcribed by "the preprocessor".
> But if I try to transcribe these words using the maryserver, the result
> is that they are marked as transcribed by the lexicon (g2p_method),
> even though they are not in the lexicon.
>
> The marytts.tools.dbselection.FeatureMaker.checkReliability() method
> confirms that.
>
> I have some questions about these words:
> - Which component transcribes these words (the preprocessor)?
> And how does it work?
> - Is it possible to assign them another g2p_method label? That way it
> should be possible to have a "very strict reliability" option in
> checkReliability()...
> - If this is not possible, does anyone have other suggestions for how
> to assign, in the context of FeatureMaker, the sentences that contain
> these words to the unreliable set?
>
> Thank you,
> Fabio.