Hi. While replacing StringBuffers with StringBuilders in the code base, I noticed the class marytts.tools.dbselection.WikipediaMarkupCleaner.<br><br>This class makes very poor use of StringBuffer and String, with lines like this:<br>
line = new StringBuffer(line.toString().replaceAll("<p>", ""));<br>line = new StringBuffer(line.toString().replaceAll("</p>", ""));<br>...<br>(The StringBuffer does nothing here, and the toString() round trips could be avoided by making the line variable a String.)<br>
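As a rough sketch of what the String-based version could look like (the class and method names below are hypothetical and not taken from WikipediaMarkupCleaner), the tag stripping might be written like this, assuming line is already a String:<br>

```java
// Hypothetical helper, not part of WikipediaMarkupCleaner: the same tag
// stripping done on a plain String, with no StringBuffer round trip.
final class ParagraphTagStripper {
    private ParagraphTagStripper() {}

    static String stripParagraphTags(String line) {
        // String.replace treats both arguments literally and replaces every
        // occurrence, so no regex is needed for these fixed tags.
        return line.replace("<p>", "").replace("</p>", "");
    }
}
```

For example, ParagraphTagStripper.stripParagraphTags("<p>Hello</p> world") returns "Hello world". Since the tags are fixed strings, replace seems a closer fit than replaceAll, which interprets its first argument as a regular expression.<br>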
or<br><br> if( ( line.toString().startsWith("*") || <br> line.toString().startsWith("#") ||<br> line.toString().startsWith(";") ||<br> line.toString().startsWith(".") ||<br>
....<br>(The toString() calls could be avoided by making line a String.)<br><br>I can remove most of this easily enough. However, the whole class seems to do XML preprocessing. For this there are technologies that scan the input in one pass without loading it into memory, such as SAX (callback-based, if you don't need to control the parsing) or StAX (pull-based, if you do).<br>
For StAX I could use Woodstox (<a href="http://woodstox.codehaus.org/">http://woodstox.codehaus.org/</a>) if I wanted access to more than "normal" StAX; otherwise I think the JDK implementation is fine. But first I need to know exactly which XML entities are needed from Wikipedia.<br>
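To make the StAX option concrete, here is a minimal sketch using only the JDK's javax.xml.stream API. The class name, the method name, and the choice of a "text" element are assumptions for illustration. Which elements the cleaner actually needs is still the open question above.<br>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not existing MaryTTS code: one-pass pull parsing that
// collects the text content of every element with a given local name.
final class StaxTextSketch {
    private StaxTextSketch() {}

    static List<String> collectText(String xml, String elementName) {
        List<String> texts = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // A StringReader keeps the sketch self-contained; a real dump
            // would be read from a file stream instead.
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // getElementText reads up to the matching end tag and
                        // fails if the element contains child elements.
                        texts.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped as unchecked only to keep the sketch short.
            throw new IllegalStateException("Could not parse XML", e);
        }
        return texts;
    }
}
```

Calling StaxTextSketch.collectText(xml, "text") on a small page fragment returns the content of each text element in document order. The caller drives the loop, which is the "control" that SAX callbacks do not give.<br>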
<br>