/var/tmp | |||||
Subscribe
|
Thu, 27 Aug 2009
jedit
Sun, 26 Jul 2009
As I said yesterday, I've taken 36 hours of a Java programming 101 class
At first I just wanted to see what a real Java program looked like. So I downloaded the latest jEdit source from Sourceforge. jEdit is the sixth most all-time active project on Sourceforge, has had millions of downloads, and is written in Java. Using ant to compile it was easy enough, and I did a cursory look through the code. As they say, the best way to learn code is to try to change something. I looked through the bug list for open bugs that were not assigned. Bug 2808363 looked interesting so I took a look at that. As Sergey Zhumatiy states, the file he uploaded to Sourceforge does hang jEdit when one scrolls down to the line jEdit has trouble with (the line doing transliteration). I read through the rest of the thread and repeated some of what the other posters did - I did a thread dump and got the same result as Dale Anson . Denis Dzenskevich simplified the problem by yanking all of the relevant classes, methods, regular expressions etc. and putting them into a short Java file, duplicating the problem, and I ran the program and it hung for me as well. Matthieu Casanova noted the line jEdit was choking on from the uploaded file and mentioned that the regular expression used was in the perl.xml file. Denis Dzenskevich chimed in again, noting a geometric progression in processing with a scale of 2 for every new character processed. He notes he does not know Perl but posits that perhaps the regular expression could be simplified. The first thing I did was tried to simplify the "in the wild" code that jedit was stumbling on. I cut out extraneous lines, then I changed the file type from Perl module (pm) to Perl executable (pl), then I simplified the expression even more to where I was translating the a's in the word banana to b's (banana -> bbnbnb). A comment of a few words at the end of the transliteration line still had jEdit stumbling. With this simple line failing, I began to suspect that Denis Dzenskevich was right with regards to the regular expression. I read Sun's information about the Pattern class, and then about the Matcher class. I read Perl documentation about transliteration and the like. I also found a very helpful Javaworld article about out-of-control regular expressions using the java.util.regex package. I realized that the regular expression was using a greedy quantifier within the transliteration statement's second set of curly brackets. If the regular expression was going to match, this was completely pointless, so I added a question mark to the end of the quantifier, changing it to a reluctant quantifier. My Java test program (based on Denis Dzenskevich's test program) began working for my test perl files. I did an ant compile of jEdit with the new perl.xml file and suddenly jEdit was able to easily load all those test perl files it had been hanging on - it could even easily scroll through the original in the field file that had stumbled across the bug, the one Sergey Zhumatiy had uploaded to Sourceforge. I also tested Perl files which did use backslashes improperly in the second set of brackets on transliteration lines. Still the same problem. So the bug is still there, but it has been minimized somewhat, instead of stumbling over all kinds of Perl transliteration lines, even proper ones that work, it now only stumbles over lines of Perl where transliteration is done and backslashes are used improperly in the second set of curly braces (if they're used at all - you can do transliteration with forward slashes in Perl). So my patch partially fixes the problem anyhow. Fri, 24 Jul 2009
I have taken 36 hours of Java programming 101 in the past few weeks, and
I decided to look at how jEdit, Sourceforge's 6th most active project of all time was put together. To get the latest version from their subversion version control server I did a: svn co https://jedit.svn.sourceforge.net/svnroot/jedit/jEdit/trunk jEdit
I then ran "ant" in the jEdit directory to build a jedit.jar file.
"java -jar jedit.jar" started jEdit up. All of this is fairly simple to
heavy Java users I'm sure, and I've dealt with this stuff on some level
for years, but I am new to Java programming so I am learning by even
doing this simple stuff.
#!/usr/bin/perl The line it chokes on is the one with the comment, in fact, if this file consisted of just that one line, jEdit still freezes. It appears jEdit will eventually parse this line, but only after many minutes (possible even hours, days etc.) So far, this is my small contribution to the bug solve, that even a file with that one line mentioned above will freeze jEdit - they were using the real-world example before, which is too unwieldly. As someone in the bug thread points out, a thread dump (kill -3 to the process number running "java -jar jedit.jar") has AWT-EventQueue-0 showing that things are stuck in the system class Pattern. The hand-off from user-class land to system is the user-defined class TokenMarker, the handleRule method. I did this thread dump myself. As I said before, I am using the latest (as of July 23rd, 2009) code in their Subversion server.
Also as is pointed out in the bug thread, the code involved is not just
in the TokenMarker.java file, but in the perl.xml file. jEdit has
"modes" to edit different types of files. So that when you edit PERL,
in the default jEdit PERL mode, the words if, for and my are displayed
as dark blue. Scalars and arrays are green, and so on. The perl.xml
file contains the following XML relevant to this - So we have the above as the regular expression being matched against. The string being matched against is the line $a =~ tr{a}{b}; # I like the letter a but not b translate all We can actually remove everyting before the t in tr, but I am not sure that would still be a valid PERL statement (although it would run in PERL without a problem). So we leave it. Someone put a sample program invocating all of this on the jEdit page. I will put it here, except changing the string to my simpler one here.
import java.util.regex.Pattern; This program hangs for far too long. To go back to another point someone on the thread made, every character added to the end of the comment increases the execute time of the lookingAt method geometrically, with a scale factor of 2. If I run the test file:
import java.util.regex.Pattern; public class Testy { public static void main(String[] args) { String str = "tr{a}{b}; # I like "; final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*"; for (int i=0;i<1000;i++) { long start = System.currentTimeMillis(); // start timing System.out.println(Pattern.compile(regex).matcher(str).lookingAt()); long stop = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis: " + (stop - start)); // print execution System.out.println("String length is " + str.length()); System.out.println("String is " + str); str = str + "z"; } } }I get the output: true TimeMillis: 17 String length is 19 String is tr{a}{b}; # I like true TimeMillis: 3 String length is 20 String is tr{a}{b}; # I like z true TimeMillis: 9 String length is 21 String is tr{a}{b}; # I like zz true TimeMillis: 9 String length is 22 String is tr{a}{b}; # I like zzz true TimeMillis: 23 String length is 23 String is tr{a}{b}; # I like zzzz true TimeMillis: 40 String length is 24 String is tr{a}{b}; # I like zzzzz true TimeMillis: 71 String length is 25 String is tr{a}{b}; # I like zzzzzz true TimeMillis: 134 String length is 26 String is tr{a}{b}; # I like zzzzzzz true TimeMillis: 266 String length is 27 String is tr{a}{b}; # I like zzzzzzzz true TimeMillis: 843 String length is 28 String is tr{a}{b}; # I like zzzzzzzzz true TimeMillis: 1055 String length is 29 String is tr{a}{b}; # I like zzzzzzzzzz true TimeMillis: 2171 String length is 30 String is tr{a}{b}; # I like zzzzzzzzzzz true TimeMillis: 4541 String length is 31 String is tr{a}{b}; # I like zzzzzzzzzzzz true TimeMillis: 8796 String length is 32 String is tr{a}{b}; # I like zzzzzzzzzzzzz true TimeMillis: 17571 String length is 33 String is tr{a}{b}; # I like zzzzzzzzzzzzzz true TimeMillis: 35689 String length is 34 String is tr{a}{b}; # I like zzzzzzzzzzzzzzz true TimeMillis: 81209 String length is 35 String is tr{a}{b}; # I like zzzzzzzzzzzzzzzz true TimeMillis: 150282 String length is 36 String is tr{a}{b}; # I like zzzzzzzzzzzzzzzzzAs a line that is 42 characters long takes over 150 cpu seconds to parse, a line approaching 80 characters would take over 150,000 cpu years to parse. Which is far too long. So where is the problem? Actually, I wanted to nail it down even more, so I did the test: import java.util.regex.Pattern; import java.util.regex.Matcher; public class Testy { public static void main(String[] args) { String str = "tr{a}{b}; # I like "; final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*"; Pattern p = Pattern.compile(regex); for (int i=0;i<1000;i++) { long start = System.currentTimeMillis(); // start timing Matcher m = p.matcher(str); long mid = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis1: " + (mid - start)); // print execution System.out.println(m.lookingAt()); long stop = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis2: " + (stop - start)); // print execution System.out.println("String length is " + str.length()); System.out.println("String is " + str); str = str + "z"; } } }The tail of the output which looked like this: TimeMillis1: 0 true TimeMillis2: 2330 String length is 30 String is tr{a}{b}; # I like zzzzzzzzzzz TimeMillis1: 0 true TimeMillis2: 4454 String length is 31 String is tr{a}{b}; # I like zzzzzzzzzzzz TimeMillis1: 0 true TimeMillis2: 8807 String length is 32 String is tr{a}{b}; # I like zzzzzzzzzzzzz TimeMillis1: 0So clearly, it is Matcher's lookingAt method where it is spending all of its time. |
||||