[mary-dev] WikipediaProcessor: Japanese Processing Exception

Hind Abdul-Khaleq habdolkhaleq at yahoo.com
Tue Nov 10 04:58:07 CET 2009


Well, although -Xmx1750m has not produced the exception so far, the program is not continuing either; it just hangs!
Please check this issue:
http://mentormate.com/blog/javalangoutofmemoryerror-java-heap-space-unexpected-tomcat-crash-solved/

Maybe the key is in that code snippet: it may leave those Strings' byte arrays in memory, so we could try forcing a GC or something similar to clean them up?!
Lines 1365 and 1366 @
http://mary.opendfki.de/browser/trunk/java/marytts/tools/dbselection/DBHandler.java
    while (rs.next()) {
        wordBytes = rs.getBytes(1);
        word = new String(wordBytes, "UTF8");
        wordList.put(word, new Integer(rs.getInt(2)));
    }

And what about the "UTF8" encoding? When I tried changing it, the
program continued from the point of failure, but the generated wordList
file was not readable in that encoding, and it also produced another
exception at the next line, as in my earlier post. However, changing
"wordList.put(word, new Integer(rs.getInt(2)));" to
"wordList.put(word, rs.getInt(2));" avoided that exception, which
suggests that we avoided creating many new objects?
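
As a rough illustration of why the autoboxed put() might behave differently, here is a minimal sketch of the same loop. It is not the DBHandler code: the class WordListSketch and the method buildWordList are hypothetical names, and plain arrays stand in for the JDBC ResultSet so that it runs on its own. Autoboxing compiles to Integer.valueOf(int), which reuses cached objects for small values, so rows with the same small frequency share one Integer instead of allocating a new one each. The Strings themselves probably still dominate the heap, so this is likely only a modest saving.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the loop in DBHandler.getMostFrequentWords():
// the ResultSet is replaced by plain arrays (an assumption for this sketch).
class WordListSketch {

    static Map<String, Integer> buildWordList(byte[][] wordBytes, int[] frequencies) {
        // Pre-size the map so it does not need to resize while it is filled.
        int capacity = (int) (wordBytes.length / 0.75f) + 1;
        Map<String, Integer> wordList = new HashMap<>(capacity);
        for (int i = 0; i < wordBytes.length; i++) {
            // A Charset constant avoids the charset lookup by name.
            String word = new String(wordBytes[i], StandardCharsets.UTF_8);
            // Autoboxing calls Integer.valueOf(int), which returns cached
            // objects for small values (at least -128..127).
            wordList.put(word, frequencies[i]);
        }
        return wordList;
    }

    public static void main(String[] args) {
        // Two Japanese words written as Unicode escapes to keep the source ASCII.
        byte[][] words = {
            "\u65e5\u672c\u8a9e".getBytes(StandardCharsets.UTF_8),
            "\u97f3\u58f0".getBytes(StandardCharsets.UTF_8)
        };
        Map<String, Integer> wordList = buildWordList(words, new int[] {3, 3});
        // Both entries hold the same cached Integer object for the value 3.
        System.out.println(wordList.get("\u65e5\u672c\u8a9e") == wordList.get("\u97f3\u58f0"));
    }
}
```

In the real loop the row count is not known in advance, so the pre-sizing shown here would need an estimate or a prior COUNT query.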

Please check the attached VisualVM screenshots.
All of -Xmx1025m, -Xmx1250m and -Xmx1500m produced the same exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
    at java.lang.StringCoding.decode(StringCoding.java:173)
    at java.lang.String.<init>(String.java:443)
    at java.lang.String.<init>(String.java:515)
    at marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1365)
    at marytts.tools.dbselection.WikipediaMarkupCleaner.updateWordList(WikipediaMarkupCleaner.java:953)
    at marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1133)
    at marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:368)
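
Since several -Xmx values ended in the same exception, it may be worth confirming that the setting actually reaches the JVM that runs WikipediaProcessor. The following is only a sketch: HeapReport and describeHeap are hypothetical names, and the idea is to print the result once at startup or every few thousand rows inside the loop.

```java
// Hypothetical helper that reports the heap limits seen by the running JVM.
class HeapReport {

    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // Used heap is the committed heap minus the free part of it.
        long used = (rt.totalMemory() - rt.freeMemory()) / mb;
        long committed = rt.totalMemory() / mb;
        // maxMemory() reflects the -Xmx limit when one is set.
        long max = rt.maxMemory() / mb;
        return "used=" + used + "MB, committed=" + committed + "MB, max=" + max + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If "max" does not roughly match the -Xmx value given, the option is probably being passed to a different process or overridden by a launcher script.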





--- On Mon, 11/9/09, Paulo Levi <i30817 at gmail.com> wrote:

From: Paulo Levi <i30817 at gmail.com>
Subject: Re: [mary-dev] WikipediaProcessor: Japanese Processing Exception
To: "Marc Schroeder" <schroed at dfki.de>
Cc: "Hind Abdul-Khaleq" <habdolkhaleq at yahoo.com>, mary-dev at dfki.de
Date: Monday, November 9, 2009, 8:31 AM

The OOM condition doesn't necessarily occur because of the (unbounded, I
assume) cache, but it certainly seems a prime suspect.
On a more productive note, if you want, you could use the jvisualvm
application (start it from a console) to find out whether the memory
leak comes from there. The heap dump needs as much disk space as the
size of the heap, so remember to delete it after use.

Open jvisualvm and the application, right-click the application's tree
node in jvisualvm and enable heap dump on OOM. Then play with it a
little. Most of the memory should be byte[] arrays, but there should be
structures that hold them (probably Strings), held in turn by another
persistent or unbounded data structure.
I don't think that jvisualvm can tell where exactly in the code the data
comes from, but it should be possible to narrow things down.
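
The same heap-dump-on-OOM switch can also be set without the GUI. On HotSpot JVMs the command-line form is -XX:+HeapDumpOnOutOfMemoryError (optionally with -XX:HeapDumpPath=...). The sketch below turns it on from inside the running program instead. It assumes a HotSpot-based JDK 7 or later, and HeapDumpOnOom and enable are hypothetical names, not part of MARY.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; works only on HotSpot-based JVMs (an assumption).
class HeapDumpOnOom {

    static boolean enable() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // HeapDumpOnOutOfMemoryError is a manageable flag, so it can be
        // changed while the JVM is running.
        bean.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        // Read the flag back to confirm that the JVM accepted it.
        return Boolean.parseBoolean(
                bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    public static void main(String[] args) {
        System.out.println("heap dump on OOM enabled: " + enable());
    }
}
```

The resulting .hprof file can then be opened in jvisualvm to see which structure retains the byte[] and String instances.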


On Mon, Nov 9, 2009 at 7:52 AM, Marc Schroeder <schroed at dfki.de> wrote:
> Thanks for this update -- good to hear that more memory solves this
> problem. Of course it seems curious that 2GB of RAM should be required
> for running this code; if anyone would like to try and reduce the
> footprint, let me know.
>
> Best,
> Marc
>
> Hind Abdul-Khaleq wrote:
>> The problem was solved by passing "-Xmx2000m" to the VM, without any
>> other changes to the source.
>> Thanks a lot and all the best.
>>
>>
>>
>>     --- On *Wed, 10/28/09, Hind Abdul-Khaleq
>>     /<habdolkhaleq at yahoo.com>/* wrote:
>>
>>
>>         From: Hind Abdul-Khaleq <habdolkhaleq at yahoo.com>
>>         Subject: Re: [mary-dev] WikipediaProcessor: Japanese Processing
>>         Exception
>>         To: mary-dev at dfki.de
>>         Date: Wednesday, October 28, 2009, 11:45 AM
>>
>>         I'm getting this exception while processing Japanese.
>>         I changed the encoding to "EUC_JP" at the line
>>
>>                     word = new String(wordBytes, "UTF8");
>>         in
>>         marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1366)
>>
>>         but it produced another exception at the next line:
>>                  wordList.put(word, new Integer(rs.getInt(2)));
>>         Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>             at java.util.HashMap.resize(HashMap.java:462)
>>             at java.util.HashMap.addEntry(HashMap.java:755)
>>             at java.util.HashMap.put(HashMap.java:385)
>>             at marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1367)
>>             at marytts.tools.dbselection.WikipediaMarkupCleaner.updateWordList(WikipediaMarkupCleaner.java:953)
>>             at marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1133)
>>             at marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:368)
>>
>>
>>         I am also already running with "-Xmx1000m", so what should I do?
>>
>>         --- On *Wed, 10/28/09, Hind Abdul-Khaleq
>>         /<habdolkhaleq at yahoo.com>/* wrote:
>>
>>
>>             From: Hind Abdul-Khaleq <habdolkhaleq at yahoo.com>
>>             Subject: [mary-dev] WikipediaProcessor: Japanese Exception
>>             To: mary-dev at dfki.de
>>             Date: Wednesday, October 28, 2009, 11:34 AM
>>
>>             Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>                 at java.util.Arrays.copyOf(Arrays.java:2882)
>>                 at java.lang.StringCoding.safeTrim(StringCoding.java:75)
>>                 at java.lang.StringCoding.access$100(StringCoding.java:34)
>>                 at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:151)
>>                 at java.lang.StringCoding.decode(StringCoding.java:173)
>>                 at java.lang.String.<init>(String.java:443)
>>                 at java.lang.String.<init>(String.java:515)
>>                 at marytts.tools.dbselection.DBHandler.getMostFrequentWords(DBHandler.java:1366)
>>                 at marytts.tools.dbselection.WikipediaMarkupCleaner.updateWordList(WikipediaMarkupCleaner.java:953)
>>                 at marytts.tools.dbselection.WikipediaMarkupCleaner.processWikipediaPages(WikipediaMarkupCleaner.java:1133)
>>                 at marytts.tools.dbselection.WikipediaProcessor.main(WikipediaProcessor.java:368)
>>
>>
>>
>>             _______________________________________________
>>             Mary-dev mailing list
>>             Mary-dev at dfki.de
>>             http://www.dfki.de/mailman/cgi-bin/listinfo/mary-dev
>>
>
> --
> Dr. Marc Schröder, Senior Researcher at DFKI GmbH
> Coordinator EU FP7 Project SEMAINE http://www.semaine-project.eu
> Portal Editor http://emotion-research.net
> Team Leader DFKI Speech Group http://mary.dfki.de
>
> Homepage: http://www.dfki.de/~schroed
> Email: schroed at dfki.de
> Phone: +49-681-302-5303
> Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123
> Saarbrücken, Germany
> --
> Official DFKI coordinates:
> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
> Geschaeftsfuehrung:
> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> Dr. Walter Olthoff
> Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
> Amtsgericht Kaiserslautern, HRB 2313



      
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jvisualvm-screens.zip
Type: application/x-download
Size: 462013 bytes
Desc: not available
Url : http://www.dfki.de/pipermail/mary-dev/attachments/20091109/c235a41a/attachment-0001.bin 

