<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<span id="result_box" class="short_text" lang="en"><span
style="background-color: rgb(230, 236, 249); color: rgb(0, 0, 0);"
title="">Hi all,<br>
<br>
I have found the reason</span></span> why these words are labelled
as "lexicon"-transcribed words:<br>
<br>
in java/marytts/language/de/JPhonemiser.java ... <br>
<br>
// The g2pMethod of the combined beast is<br>
// the g2pMethod of the first constituant.<br>
<br>
My words are compounds of two tokens separated by a hyphen, and the
first token (the one before the hyphen) is indeed transcribed using the
lexicon.<br>
So the rule assigns the "lexicon" g2pMethod to these words <span
id="result_box" class="" lang="en"><span style="" title="">even if
other tokens of the words have been transcribed with rules</span></span>.
I guess this was designed for efficiency and simplicity reasons.<br>
<br>
<span id="result_box" class="" lang="en"><span style="" title="">In my
case I need to </span></span><span id="result_box" class="short_text"
lang="en"><span style="" title="">discard</span></span><span
id="result_box" class="" lang="en"><span style="" title=""> these
words only during the database selection, </span><span
style="background-color: rgb(230, 236, 249); color: rgb(0, 0, 0);"
title="">so I decided to change </span></span>marytts.tools.dbselection.FeatureMaker.checkReliability()
to fit my purpose.<br>
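<br>
A minimal sketch of the kind of extra check this requires is given below. It is only an illustration: the class name, the method names and the array-based lexicon are hypothetical and not part of MaryTTS, and the real checkReliability() code is not reproduced here. The sketch flags a hyphenated word when any of its parts is missing from the lexicon:<br>
<pre>
```java
class HyphenCheckSketch {

    // Hypothetical stand-in for a real lexicon lookup (case-insensitive).
    static boolean inLexicon(String token, String[] lexicon) {
        for (String entry : lexicon) {
            if (entry.equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }

    // Returns true when the word contains a hyphen and at least one of its
    // hyphen-separated parts is empty or missing from the lexicon.
    static boolean hasUnknownHyphenPart(String word, String[] lexicon) {
        if (word.indexOf('-') == -1) {
            return false; // not a hyphenated word: leave it to the existing check
        }
        // The limit -1 keeps empty parts, e.g. for a word ending in '-'.
        for (String part : word.split("-", -1)) {
            if (part.isEmpty() || !inLexicon(part, lexicon)) {
                return true;
            }
        }
        return false;
    }
}
```
</pre>
A check of this kind could run before the existing g2p_method test, so that sentences containing words such as Mini-DSLAM end up in the unreliable set whenever one of the parts is not in the lexicon.<br>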
<br>
Best,<br>
Fabio.<br>
<br>
<br>
<br>
On 11/23/2010 08:09 PM, Fabio Tesser wrote:
<blockquote cite="mid:4CEC1164.8010404@gmail.com" type="cite">Hello, <br>
<br>
I am running the FeatureMaker program (point 5 of
<a class="moz-txt-link-freetext"
href="http://mary.opendfki.de/wiki/NewLanguageSupport">http://mary.opendfki.de/wiki/NewLanguageSupport</a>)
for Italian. <br>
I have used the strict reliability option because I would like to
select only words inside my lexicon (otherwise I obtain a lot of
non-Italian words and acronyms). <br>
But even with this option, my selection includes some words that are
not in the lexicon. <br>
Some examples: <br>
al-ʿAzīz <br>
MA-31PG <br>
Mini-DSLAM <br>
Z-Man <br>
<br>
Note that all these words contain the '-' character. <br>
<br>
The reliability option description says that: <br>
"With setting strict, only those sentences that contain words in the
lexicon or words that were transcribed by the preprocessor can be
selected for the synthesis script;" <br>
<br>
So I suppose these words are transcribed by "the preprocessor". <br>
But if I try to transcribe these words using the maryserver, the result
is that they are transcribed by the lexicon (g2p_method), although they
are not in the lexicon. <br>
<br>
The marytts.tools.dbselection.FeatureMaker.checkReliability() method
confirms that. <br>
<br>
I have some questions about these words: <br>
- Which component transcribes these words (the preprocessor)? And
how does it work? <br>
- Is it possible to assign them another g2p_method label? That way it
would be possible to have a "very strict reliability" option in
checkReliability()... <br>
- If this is not possible, does anyone have other suggestions for how
to assign, in the context of FeatureMaker, the sentences that contain
these words to the unreliable set? <br>
<br>
Thank you, <br>
Fabio. <br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
</body>
</html>