/var/tmp
   


About
Android, Linux, FLOSS etc.


Code
My code

Subscribe
Subscribe to a syndicated RSS feed of my blog.

       

Fri, 24 Jul 2009

I have taken 36 hours of Java programming 101 in the past few weeks, and
decided to see if I could put any of it to use.

I decided to look at how jEdit, Sourceforge's 6th most active project of all time was put together. To get the latest version from their subversion version control server I did a:

svn co https://jedit.svn.sourceforge.net/svnroot/jedit/jEdit/trunk jEdit

I then ran "ant" in the jEdit directory to build a jedit.jar file. "java -jar jedit.jar" started jEdit up. All of this is fairly simple to heavy Java users I'm sure, and I've dealt with this stuff on some level for years, but I am new to Java programming so I am learning by even doing this simple stuff.
So jEdit comes up. I browse the source tree to see how a Java project like this is put together. Of course, one of the best ways to do more hands-on learning is to dive into the code, and one of the best ways to do that is to look at recent open and unassigned bugs, so I did that.
Bug 2808363 has jEdit freezing when it tries to open a file named test.pl that consists of the following lines:

#!/usr/bin/perl
$a = "banana";
print "$a\n";
$a =~ tr{a}{b}; # I like the letter a but not b translate all
print "$a\n";

The line it chokes on is the one with the comment, in fact, if this file consisted of just that one line, jEdit still freezes. It appears jEdit will eventually parse this line, but only after many minutes (possible even hours, days etc.) So far, this is my small contribution to the bug solve, that even a file with that one line mentioned above will freeze jEdit - they were using the real-world example before, which is too unwieldly.

As someone in the bug thread points out, a thread dump (kill -3 to the process number running "java -jar jedit.jar") has AWT-EventQueue-0 showing that things are stuck in the system class Pattern. The hand-off from user-class land to system is the user-defined class TokenMarker, the handleRule method. I did this thread dump myself. As I said before, I am using the latest (as of July 23rd, 2009) code in their Subversion server.

Also as is pointed out in the bug thread, the code involved is not just in the TokenMarker.java file, but in the perl.xml file. jEdit has "modes" to edit different types of files. So that when you edit PERL, in the default jEdit PERL mode, the words if, for and my are displayed as dark blue. Scalars and arrays are green, and so on. The perl.xml file contains the following XML relevant to this -

< !-- tr/// transliteration --> <SEQ_REGEXP TYPE="MARKUP" HASH_CHAR="tr" AT_WORD_START="TRUE">tr\s*\{.*?[^\\]\}\s*\{(?:.*?[^\\])*\}[cds]*</SEQ_REGEXP>
So we have the above as the regular expression being matched against. The string being matched against is the line

$a =~ tr{a}{b}; # I like the letter a but not b translate all

We can actually remove everyting before the t in tr, but I am not sure that would still be a valid PERL statement (although it would run in PERL without a problem). So we leave it. Someone put a sample program invocating all of this on the jEdit page. I will put it here, except changing the string to my simpler one here.

import java.util.regex.Pattern;
public class FaultyRegex {
public static void main(String[] args) {
final String str = "tr{a}{b}; # I like the letter a but not b translate all";
final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
System.out.println(Pattern.compile(regex).matcher(str).lookingAt());
}
}

This program hangs for far too long. To go back to another point someone on the thread made, every character added to the end of the comment increases the execute time of the lookingAt method geometrically, with a scale factor of 2. If I run the test file:

import java.util.regex.Pattern;
public class Testy {
public static void main(String[] args) {
String str = "tr{a}{b}; # I like ";
final String regex = 
"tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
for (int i=0;i<1000;i++) {
long start = System.currentTimeMillis(); // start timing
System.out.println(Pattern.compile(regex).matcher(str).lookingAt());
long stop = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis: " + (stop - start)); // print execution 
System.out.println("String length is " + str.length());
System.out.println("String is " + str);
str = str + "z";
 }
}
}
I get the output:
true
TimeMillis: 17
String length is 19
String is tr{a}{b}; # I like 
true
TimeMillis: 3
String length is 20
String is tr{a}{b}; # I like z
true
TimeMillis: 9
String length is 21
String is tr{a}{b}; # I like zz
true
TimeMillis: 9
String length is 22
String is tr{a}{b}; # I like zzz
true
TimeMillis: 23
String length is 23
String is tr{a}{b}; # I like zzzz
true
TimeMillis: 40
String length is 24
String is tr{a}{b}; # I like zzzzz
true
TimeMillis: 71
String length is 25
String is tr{a}{b}; # I like zzzzzz
true
TimeMillis: 134
String length is 26
String is tr{a}{b}; # I like zzzzzzz
true
TimeMillis: 266
String length is 27
String is tr{a}{b}; # I like zzzzzzzz
true
TimeMillis: 843
String length is 28
String is tr{a}{b}; # I like zzzzzzzzz
true
TimeMillis: 1055
String length is 29
String is tr{a}{b}; # I like zzzzzzzzzz
true
TimeMillis: 2171
String length is 30
String is tr{a}{b}; # I like zzzzzzzzzzz
true
TimeMillis: 4541
String length is 31
String is tr{a}{b}; # I like zzzzzzzzzzzz
true
TimeMillis: 8796
String length is 32
String is tr{a}{b}; # I like zzzzzzzzzzzzz
true
TimeMillis: 17571
String length is 33
String is tr{a}{b}; # I like zzzzzzzzzzzzzz
true
TimeMillis: 35689
String length is 34
String is tr{a}{b}; # I like zzzzzzzzzzzzzzz
true
TimeMillis: 81209
String length is 35
String is tr{a}{b}; # I like zzzzzzzzzzzzzzzz
true
TimeMillis: 150282
String length is 36
String is tr{a}{b}; # I like zzzzzzzzzzzzzzzzz
As a line that is 42 characters long takes over 150 cpu seconds to parse, a line approaching 80 characters would take over 150,000 cpu years to parse. Which is far too long.

So where is the problem? Actually, I wanted to nail it down even more, so I did the test:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Testy {
public static void main(String[] args) {
String str = "tr{a}{b}; # I like ";
final String regex = 
"tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
Pattern p = Pattern.compile(regex);
for (int i=0;i<1000;i++) {
long start = System.currentTimeMillis(); // start timing
Matcher m = p.matcher(str);
long mid = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis1: " + (mid - start)); // print execution
System.out.println(m.lookingAt());
long stop = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis2: " + (stop - start)); // print execution
System.out.println("String length is " + str.length());
System.out.println("String is " + str);
str = str + "z";
 }
}
}
The tail of the output which looked like this:
TimeMillis1: 0
true
TimeMillis2: 2330
String length is 30
String is tr{a}{b}; # I like zzzzzzzzzzz
TimeMillis1: 0
true
TimeMillis2: 4454
String length is 31
String is tr{a}{b}; # I like zzzzzzzzzzzz
TimeMillis1: 0
true
TimeMillis2: 8807
String length is 32
String is tr{a}{b}; # I like zzzzzzzzzzzzz
TimeMillis1: 0
So clearly, it is Matcher's lookingAt method where it is spending all of its time.

[/java/jedit] permanent link