/var/tmp | |||||
Subscribe
|
Fri, 24 Jul 2009
I have taken 36 hours of Java programming 101 in the past few weeks, and
I decided to look at how jEdit, Sourceforge's 6th most active project of all time was put together. To get the latest version from their subversion version control server I did a: svn co https://jedit.svn.sourceforge.net/svnroot/jedit/jEdit/trunk jEdit
I then ran "ant" in the jEdit directory to build a jedit.jar file.
"java -jar jedit.jar" started jEdit up. All of this is fairly simple to
heavy Java users I'm sure, and I've dealt with this stuff on some level
for years, but I am new to Java programming so I am learning by even
doing this simple stuff.
#!/usr/bin/perl The line it chokes on is the one with the comment, in fact, if this file consisted of just that one line, jEdit still freezes. It appears jEdit will eventually parse this line, but only after many minutes (possible even hours, days etc.) So far, this is my small contribution to the bug solve, that even a file with that one line mentioned above will freeze jEdit - they were using the real-world example before, which is too unwieldly. As someone in the bug thread points out, a thread dump (kill -3 to the process number running "java -jar jedit.jar") has AWT-EventQueue-0 showing that things are stuck in the system class Pattern. The hand-off from user-class land to system is the user-defined class TokenMarker, the handleRule method. I did this thread dump myself. As I said before, I am using the latest (as of July 23rd, 2009) code in their Subversion server.
Also as is pointed out in the bug thread, the code involved is not just
in the TokenMarker.java file, but in the perl.xml file. jEdit has
"modes" to edit different types of files. So that when you edit PERL,
in the default jEdit PERL mode, the words if, for and my are displayed
as dark blue. Scalars and arrays are green, and so on. The perl.xml
file contains the following XML relevant to this - So we have the above as the regular expression being matched against. The string being matched against is the line $a =~ tr{a}{b}; # I like the letter a but not b translate all We can actually remove everyting before the t in tr, but I am not sure that would still be a valid PERL statement (although it would run in PERL without a problem). So we leave it. Someone put a sample program invocating all of this on the jEdit page. I will put it here, except changing the string to my simpler one here.
import java.util.regex.Pattern; This program hangs for far too long. To go back to another point someone on the thread made, every character added to the end of the comment increases the execute time of the lookingAt method geometrically, with a scale factor of 2. If I run the test file:
import java.util.regex.Pattern; public class Testy { public static void main(String[] args) { String str = "tr{a}{b}; # I like "; final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*"; for (int i=0;i<1000;i++) { long start = System.currentTimeMillis(); // start timing System.out.println(Pattern.compile(regex).matcher(str).lookingAt()); long stop = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis: " + (stop - start)); // print execution System.out.println("String length is " + str.length()); System.out.println("String is " + str); str = str + "z"; } } }I get the output: true TimeMillis: 17 String length is 19 String is tr{a}{b}; # I like true TimeMillis: 3 String length is 20 String is tr{a}{b}; # I like z true TimeMillis: 9 String length is 21 String is tr{a}{b}; # I like zz true TimeMillis: 9 String length is 22 String is tr{a}{b}; # I like zzz true TimeMillis: 23 String length is 23 String is tr{a}{b}; # I like zzzz true TimeMillis: 40 String length is 24 String is tr{a}{b}; # I like zzzzz true TimeMillis: 71 String length is 25 String is tr{a}{b}; # I like zzzzzz true TimeMillis: 134 String length is 26 String is tr{a}{b}; # I like zzzzzzz true TimeMillis: 266 String length is 27 String is tr{a}{b}; # I like zzzzzzzz true TimeMillis: 843 String length is 28 String is tr{a}{b}; # I like zzzzzzzzz true TimeMillis: 1055 String length is 29 String is tr{a}{b}; # I like zzzzzzzzzz true TimeMillis: 2171 String length is 30 String is tr{a}{b}; # I like zzzzzzzzzzz true TimeMillis: 4541 String length is 31 String is tr{a}{b}; # I like zzzzzzzzzzzz true TimeMillis: 8796 String length is 32 String is tr{a}{b}; # I like zzzzzzzzzzzzz true TimeMillis: 17571 String length is 33 String is tr{a}{b}; # I like zzzzzzzzzzzzzz true TimeMillis: 35689 String length is 34 String is tr{a}{b}; # I like zzzzzzzzzzzzzzz true TimeMillis: 81209 String length is 35 String is tr{a}{b}; # I like zzzzzzzzzzzzzzzz true TimeMillis: 150282 String length is 36 String is tr{a}{b}; # I like zzzzzzzzzzzzzzzzzAs a line that is 42 characters long takes over 150 cpu seconds to parse, a line approaching 80 characters would take over 150,000 cpu years to parse. Which is far too long. So where is the problem? Actually, I wanted to nail it down even more, so I did the test: import java.util.regex.Pattern; import java.util.regex.Matcher; public class Testy { public static void main(String[] args) { String str = "tr{a}{b}; # I like "; final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*"; Pattern p = Pattern.compile(regex); for (int i=0;i<1000;i++) { long start = System.currentTimeMillis(); // start timing Matcher m = p.matcher(str); long mid = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis1: " + (mid - start)); // print execution System.out.println(m.lookingAt()); long stop = System.currentTimeMillis(); // stop timing System.out.println("TimeMillis2: " + (stop - start)); // print execution System.out.println("String length is " + str.length()); System.out.println("String is " + str); str = str + "z"; } } }The tail of the output which looked like this: TimeMillis1: 0 true TimeMillis2: 2330 String length is 30 String is tr{a}{b}; # I like zzzzzzzzzzz TimeMillis1: 0 true TimeMillis2: 4454 String length is 31 String is tr{a}{b}; # I like zzzzzzzzzzzz TimeMillis1: 0 true TimeMillis2: 8807 String length is 32 String is tr{a}{b}; # I like zzzzzzzzzzzzz TimeMillis1: 0So clearly, it is Matcher's lookingAt method where it is spending all of its time. |
||||