[mary-dev] Basic NLP module for Italian
Fabio Tesser
fabio.tesser at gmail.com
Thu Nov 4 12:35:59 CET 2010
Hello,
I would like to ask you for some information and suggestions.
I have a lexicon with transcriptions in the OpenMary format and with
TranscriptionTool I am able to generate the LTS rules and the lexicon in
FST format.
I also have a lexicon with POS information that could be used as input
for marytts.tools.newlanguage.LexiconCreator for the full conversion.
I think the next step is to obtain a MinimalisticPosTagger for Italian.
Is there something else missing in order to have a first basic NLP module?
I know TranscriptionTool uses the functional word flag to build a POS
tagger. I have a list of functional words for Italian and, as a first
step, I could use these in TranscriptionTool. Is there any way to pass
this information in the input transcription file (e.g. abaco a1-ba-ko
functional)?
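
To illustrate the idea, here is a minimal sketch in Python of how the
function-word list could be merged into the transcription file. It
assumes the three-column layout proposed above (word, transcription,
optional "functional" marker, separated by whitespace); whether
TranscriptionTool actually reads such a third column is exactly the open
question. The function name mark_functional and the sample entries are
hypothetical.

```python
def mark_functional(lines, function_words, marker="functional"):
    """Append `marker` to every entry whose first field is a function word.

    Assumption: each non-empty line is "word transcription", separated by
    whitespace. The third column is the layout proposed in this message,
    not a confirmed TranscriptionTool input format.
    """
    marked = []
    for line in lines:
        fields = line.split()
        if not fields:
            # Keep blank lines so the output lines up with the input.
            marked.append("")
            continue
        word = fields[0]
        # Avoid adding the marker twice if the entry is already flagged.
        if word.lower() in function_words and fields[-1] != marker:
            fields.append(marker)
        marked.append(" ".join(fields))
    return marked


# Made-up sample entries; the transcriptions are placeholders.
entries = ["abaco a1-ba-ko", "di di", "casa ka1-sa"]
marked = mark_functional(entries, {"di", "il", "la"})
for entry in marked:
    print(entry)
```

Keeping the marker in the transcription file itself would leave a single
source for both the transcriptions and the function-word flag.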
This is a first plan, but given that I also have the POS information
ready for the full conversion, would you suggest already building a POS
tagger that can provide other information as well?
Another, more dev-specific question. The Italian lexicon has 400000
entries. I have successfully obtained the LTS rules with 100000 entries
using TranscriptionTool, but I get an out-of-memory error with 400000. I
will try running it again with more memory, using the java -Xmxn flag.
In any case, the question is about the svn commit rules: should I also
commit these large files (it.txt, it_lexicon.dict, it_lexicon.fst), which
will perhaps be replaced by new ones?
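
Regarding the out-of-memory error: if raising the heap size is not
enough, one possible fallback is to train the LTS rules on a random
subset of the lexicon, since 100000 entries were already handled
successfully. The sketch below is only an illustration in Python; the
function name sample_entries and the generated placeholder entries are
hypothetical, and it assumes one lexicon entry per line.

```python
import random


def sample_entries(entries, k, seed=0):
    """Return k randomly chosen entries, kept in their original order.

    A fixed seed makes the subset reproducible between runs.
    """
    if k >= len(entries):
        return list(entries)
    rng = random.Random(seed)
    # Sample positions rather than entries so the original order is preserved.
    chosen = sorted(rng.sample(range(len(entries)), k))
    return [entries[i] for i in chosen]


# Placeholder data standing in for the lines of the full lexicon file.
lexicon = [f"word{i} t{i}" for i in range(400)]
subset = sample_entries(lexicon, 100)
print(len(subset))
```

The full lexicon would still be used for the FST lookup; only the LTS
training input would be reduced in this scenario.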
Best Regards,
Fabio.