[mary-dev] Lexicon questions
Fabio Tesser
fabio.tesser at gmail.com
Thu Oct 28 18:34:10 CEST 2010
Hi,
I have some questions about the lexicon building process.
Looking at the German lexicon file 'de.txt', I assume that '-' is
the symbol used for syllable separation. Am I right?
I have a lexicon for Italian that includes part-of-speech information.
Does the openmary lexicon format support that?
The Italian lexicon contains 440,000 words. I see that the German file
'de.txt' has 36,000 words, and that not all of the words have a
transcription.
I suppose this file is used in the Transcription Tool to create the LTS
rules, and that the fully transcribed lexicon is stored in a Finite
State Transducer format.
Any suggestion for the Italian case?
I imagine that I could select some words (how many?) from the lexicon
in order to build the LTS rules
(http://mary.opendfki.de/wiki/TranscriptionTool) and then use the whole
file to build the Finite State Transducer lexicon (is there
documentation for this?).
Thanks in advance.
Best,
Fabio.