<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html; charset=ISO-8859-1"

 http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<span id="result_box" class="short_text" lang="en"><span

 style="background-color: rgb(230, 236, 249); color: rgb(0, 0, 0);"

 title="">Hi all,<br>

<br>

I have found the reason</span></span> why these words are labelled

as "lexicon"-transcribed words:<br>

<br>

In java/marytts/language/de/JPhonemiser.java ... <br>

<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // The g2pMethod of the combined beast is<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // the g2pMethod of the first constituant.<br>

<br>

My words are compounds of two tokens separated by a hyphen, and the

first token (before the hyphen) is indeed transcribed using the

lexicon.<br>

So the rule assigns the "lexicon" g2pMethod to these words <span

 id="result_box" class="" lang="en"><span style="" title="">even if

other tokens of the word have been transcribed with rules</span></span>.

I guess this was designed for efficiency and simplicity.<br>

<br>

<span id="result_box" class="" lang="en"><span style="" title="">In my

case I need to </span></span><span id="result_box" class="short_text"

 lang="en"><span style="" title="">discard</span></span><span

 id="result_box" class="" lang="en"><span style="" title=""> these

words only during database selection, </span><span

 style="background-color: rgb(230, 236, 249); color: rgb(0, 0, 0);"

 title="">so I decided to change </span></span>marytts.tools.dbselection.FeatureMaker.checkReliability()

to fit my purpose.<br>
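
<br>

A minimal sketch of such a stricter check, not the actual MaryTTS code: the class name, the isReliable() helper and the toy lexicon are hypothetical, and the signature of the real checkReliability() is not reproduced here. The idea is to stop trusting the "lexicon" label for hyphenated words unless every part is itself a lexicon entry.<br>

<pre>
```java
import java.util.Arrays;
import java.util.Locale;

class VeryStrictReliability {

    // Hypothetical helper (not part of MaryTTS): returns true only if the
    // word was labelled "lexicon" and, for hyphenated words, every part is
    // itself found in the lexicon.
    static boolean isReliable(String word, String g2pMethod, String[] lexicon) {
        if (!"lexicon".equals(g2pMethod)) {
            return false;
        }
        if (!word.contains("-")) {
            // Simple word: trust the label set by the phonemiser.
            return true;
        }
        // Compound word: the label only reflects the first constituent,
        // so look up every part explicitly (case-insensitively here).
        for (String part : word.split("-")) {
            if (part.isEmpty()) {
                return false;
            }
            if (!Arrays.asList(lexicon).contains(part.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] lexicon = { "mini", "casa" };  // toy lexicon, assumption
        System.out.println(isReliable("casa", "lexicon", lexicon));        // true
        System.out.println(isReliable("Mini-DSLAM", "lexicon", lexicon));  // false
    }
}
```
</pre>

With a check like this, a sentence containing Mini-DSLAM would go to the unreliable set even though the word carries the "lexicon" label.<br>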

<br>

Best,<br>

Fabio.<br>

<br>

<br>

<br>

On 11/23/2010 08:09 PM, Fabio Tesser wrote:

<blockquote cite="mid:4CEC1164.8010404@gmail.com" type="cite">Hello, <br>

  <br>

I am running the FeatureMaker program (point 5 of

  <a class="moz-txt-link-freetext"

 href="http://mary.opendfki.de/wiki/NewLanguageSupport">http://mary.opendfki.de/wiki/NewLanguageSupport</a>)

for Italian. <br>

I have used the strict reliability option because I would like to

select only words inside my lexicon (otherwise I obtain a lot of

non-Italian words and acronyms). <br>

But even with this option, my selection includes some words that are

not in the lexicon. <br>

Some examples: <br>

al-&#703;Az&#299;z <br>

MA-31PG <br>

Mini-DSLAM <br>

Z-Man <br>

  <br>

Note that all these words contain the '-' character. <br>

  <br>

The reliability option description says that: <br>

"With setting strict, only those sentences that contain words in the

lexicon or words that were transcribed by the preprocessor can be

selected for the synthesis script;" <br>

  <br>

So I suppose these words are transcribed by "the preprocessor". <br>

But when I transcribe these words with the maryserver, their

g2p_method is "lexicon", even though they are not in the

lexicon. <br>

  <br>

The marytts.tools.dbselection.FeatureMaker.checkReliability() method

confirms that. <br>

  <br>

I have some questions about these words: <br>

- Which component transcribes these words (the preprocessor)? And

how does it work? <br>

- Is it possible to assign them another g2p_method label? That way

it would be possible to have a "very strict reliability" option in

checkReliability()... <br>

- If this is not possible, does anyone have other suggestions for how

to assign, in the context of FeatureMaker, the sentences that contain

these words to the unreliable set? <br>

  <br>

Thank you, <br>

Fabio. <br>

  <br>

  <br>

  <br>

  <br>

  <br>

</blockquote>

</body>

</html>