[mary-dev] WikipediaProcessor: Japanese Processing Exception

Paulo Levi i30817 at gmail.com
Mon Nov 9 17:31:46 CET 2009


The OOM condition doesn't necessarily occur because of the (unlimited, I
assume) cache, but it certainly seems a prime suspect.
On a more productive note, if you want, you could use the jvisualvm
application (from a console) to find out whether the memory leak comes
from there. The heap dump needs as much disk space as the heap itself, so
remember to delete it after use.
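As a side note, on a HotSpot JVM a heap dump can also be triggered programmatically via the HotSpotDiagnosticMXBean, which is handy when no GUI is available. A minimal sketch (the dump path and class name here are illustrative, not from the MARY code):

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumpDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative dump path; dumpHeap refuses to overwrite an
        // existing file, so make sure the target does not exist yet.
        File dump = File.createTempFile("heapdump", ".hprof");
        dump.delete();

        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // true = dump only live objects (forces a GC first).
        bean.dumpHeap(dump.getAbsolutePath(), true);

        // The resulting .hprof can be opened in jvisualvm. It is roughly
        // as large as the live heap, so delete it after use.
        System.out.println(dump.length() > 0); // prints "true"
        dump.delete();
    }
}
```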

Open jvisualvm, find the application, right-click its tree node and
enable "heap dump on OOM". Then play with it a little. Most of the
memory will probably be byte[] arrays, but there should be structures
that hold them (Strings, most likely), which are in turn held by some
persistent or unbounded data structure.
I don't think that jvisualvm can tell you exactly where in the code the
data comes from, but it should be possible to narrow things down.
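To illustrate the kind of structure to look for: the word list in DBHandler is a map from word Strings to counts, and a map like that retains every distinct word for the whole run. Here is a hypothetical sketch of the pattern, and of one way to bound it by pruning rare words at the cost of approximate counts (this is not the actual MARY code; the threshold is made up):

```java
import java.util.HashMap;
import java.util.Map;

public class WordListSketch {
    // Hypothetical cap on the map size, for illustration only.
    static final int MAX_ENTRIES = 4;

    public static void main(String[] args) {
        // The suspected pattern: an unbounded map that keeps one String
        // (and its backing character array) per distinct word. Over a
        // full Wikipedia dump this alone can exhaust the heap.
        Map<String, Integer> wordList = new HashMap<>();
        for (String w : new String[] {"a", "b", "a", "c", "b", "a", "d", "e"}) {
            wordList.merge(w, 1, Integer::sum);
            // One way to bound the footprint: when the map grows past the
            // cap, drop words seen only once so far.
            if (wordList.size() > MAX_ENTRIES) {
                wordList.values().removeIf(c -> c == 1);
            }
        }
        System.out.println(wordList.get("a")); // prints "3"
    }
}
```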


On Mon, Nov 9, 2009 at 7:52 AM, Marc Schroeder <schroed at dfki.de> wrote:
> Thanks for this update -- good to hear that more memory solves this
> problem. Of course it seems curious that 2GB of RAM should be required
> for running this code; if anyone would like to try and reduce the
> footprint, let me know.
>
> Best,
> Marc
>
> Hind Abdul-Khaleq wrote:
>> The problem was solved by giving the VM "-Xmx2000m", with no other
>> changes to the source.
>> Thanks a lot and all the best.
>>
>>
>>
>>     --- On Wed, 10/28/09, Hind Abdul-Khaleq <habdolkhaleq at yahoo.com>
>>     wrote:
>>
>>
>>         From: Hind Abdul-Khaleq <habdolkhaleq at yahoo.com>
>>         Subject: Re: [mary-dev] WikipediaProcessor: Japanese Processing
>>         Exception
>>         To: mary-dev at dfki.de
>>         Date: Wednesday, October 28, 2009, 11:45 AM
>>
>>         I'm getting this exception while processing Japanese.
>>         I changed the encoding to "EUC_JP" at the line
>>
>>                     word = new String(wordBytes, "UTF8");
>>         in
>>         marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1366)
>>
>>         but it produced another exception at the next line:
>>                  wordList.put(word, new Integer(rs.getInt(2)));
>>         Exception in thread "main" java.lang.OutOfMemoryError: Java heap
>>         space
>>             at java.util.HashMap.resize(HashMap.java:462)
>>             at java.util.HashMap.addEntry(HashMap.java:755)
>>             at java.util.HashMap.put(HashMap.java:385)
>>             at
>>         marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1367)
>>             at
>>         marytts.tools.dbselection.WikipediaMarkupCleaner.updateWordList(WikipediaMarkupCleaner.java:953)
>>             at
>>         marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1133)
>>             at
>>         marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:368)
>>
>>
>>         I also used "-Xmx1000m", ... so what should I do?
>>
>>         --- On Wed, 10/28/09, Hind Abdul-Khaleq
>>         <habdolkhaleq at yahoo.com> wrote:
>>
>>
>>             From: Hind Abdul-Khaleq <habdolkhaleq at yahoo.com>
>>             Subject: [mary-dev] WikipediaProcessor: Japanese Exception
>>             To: mary-dev at dfki.de
>>             Date: Wednesday, October 28, 2009, 11:34 AM
>>
>>             Exception in thread "main" java.lang.OutOfMemoryError: GC
>>             overhead limit exceeded
>>                 at java.util.Arrays.copyOf(Arrays.java:2882)
>>                 at java.lang.StringCoding.safeTrim(StringCoding.java:75)
>>                 at java.lang.StringCoding.access$100(StringCoding.java:34)
>>                 at
>>             java.lang.StringCoding$StringDecoder.decode(StringCoding.java:151)
>>                 at java.lang.StringCoding.decode(StringCoding.java:173)
>>                 at java.lang.String.<init>(String.java:443)
>>                 at java.lang.String.<init>(String.java:515)
>>                 at
>>             marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1366)
>>                 at
>>             marytts.tools.dbselection.WikipediaMarkupCleaner.updateWordList(WikipediaMarkupCleaner.java:953)
>>                 at
>>             marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1133)
>>                 at
>>             marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:368)
>>
>>
>>
>>             _______________________________________________
>>             Mary-dev mailing list
>>             Mary-dev at dfki.de
>>             http://www.dfki.de/mailman/cgi-bin/listinfo/mary-dev
>>
>>
>>
>>
>>
>>
>>
>
> --
> Dr. Marc Schröder, Senior Researcher at DFKI GmbH
> Coordinator EU FP7 Project SEMAINE http://www.semaine-project.eu
> Portal Editor http://emotion-research.net
> Team Leader DFKI Speech Group http://mary.dfki.de
>
> Homepage: http://www.dfki.de/~schroed
> Email: schroed at dfki.de
> Phone: +49-681-302-5303
> Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123
> Saarbrücken, Germany
> --
> Official DFKI coordinates:
> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
> Geschaeftsfuehrung:
> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> Dr. Walter Olthoff
> Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
> Amtsgericht Kaiserslautern, HRB 2313
>
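As a footnote to the quoted encoding change: the charset name "EUC_JP" used in the changed line is resolved by the JDK as an alias of EUC-JP, and the round trip can be checked in isolation. A minimal sketch, with an illustrative test string (not data from the Wikipedia dump):

```java
import java.nio.charset.Charset;

public class EucJpDecodeDemo {
    public static void main(String[] args) {
        // Illustrative Japanese word.
        String original = "日本語";

        // "EUC_JP" is the name used in the quoted code change; the JDK
        // accepts it as an alias of the canonical "EUC-JP" charset.
        Charset eucJp = Charset.forName("EUC_JP");
        byte[] wordBytes = original.getBytes(eucJp);

        // This mirrors the changed line in DBHandler:
        //     word = new String(wordBytes, "EUC_JP");
        String word = new String(wordBytes, eucJp);

        // Decoding with the matching charset round-trips cleanly;
        // decoding the same bytes as UTF-8 would garble the word.
        System.out.println(word.equals(original)); // prints "true"
    }
}
```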

