[mary-dev] Lexicon questions

Thu Oct 28 18:34:10 CEST 2010

Hi,

I have some questions about the lexicon building process.

Looking at the German lexicon file 'de.txt', I assume that '-' is 
the symbol used for syllable separation. Am I right?
I have a lexicon for Italian that includes part-of-speech information. 
Does the openmary lexicon format support that?

The Italian lexicon contains 440000 words. I see that the German file 
'de.txt' has 36000 words, and that not all of the words have a 
transcription.
I suppose this file is used in the Transcription Tool to create the LTS 
rules, and that the fully transcribed lexicon is stored in a Finite State 
Transducer format.

Any suggestions for the Italian case?
I imagine that I could select some (how many?) words from the lexicon in 
order to build the LTS rules 
(http://mary.opendfki.de/wiki/TranscriptionTool) and then use the whole 
file to build the Finite State Transducer lexicon (is there documentation 
for this?).
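
To make the selection step concrete, here is a rough sketch of how I 
would draw a random subset of the lexicon. It is only an illustration: 
the file layout (one entry per line, word and transcription separated by 
whitespace), the helper name sample_lexicon, and the demo entries with 
their placeholder transcriptions are all my assumptions, not something 
taken from the MARY documentation. The right sample size is exactly what 
I am asking about, so k is left as a parameter.

```python
import random


def sample_lexicon(lines, k, seed=0):
    """Return up to k randomly chosen transcribed entries.

    Hypothetical helper: assumes each line is 'word transcription',
    separated by whitespace. Lines without a transcription are skipped.
    """
    # Keep only lines that have both a word and a transcription.
    entries = [ln.strip() for ln in lines if len(ln.split()) >= 2]
    # A fixed seed makes the selection reproducible between runs.
    rng = random.Random(seed)
    # Never ask for more entries than are available.
    return rng.sample(entries, min(k, len(entries)))


if __name__ == "__main__":
    # Placeholder entries; the transcriptions are illustrative only.
    demo = ["casa ka-za", "cane ka-ne", "untranscribed", "gatto gat-to"]
    for entry in sample_lexicon(demo, 2):
        print(entry)
```

The selected entries could then be written to a smaller file for the 
Transcription Tool, while the full lexicon is kept for the later step.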

Thanks in advance.

Best,
Fabio.