/var/tmp
   


About
Android, Linux, FLOSS etc.


Code
My code

Subscribe
Subscribe to a syndicated RSS feed of my blog.

       

Thu, 27 Aug 2009

jedit
My patch to jedit was committed, cool.

[/java/jedit] permanent link

Sat, 25 Jul 2009

As I said yesterday, I've taken 36 hours of a Java programming 101 class
and decided to see if I could put any of it to use. I believe I have.

At first I just wanted to see what a real Java program looked like. So I downloaded the latest jEdit source from Sourceforge. jEdit is the sixth most all-time active project on Sourceforge, has had millions of downloads, and is written in Java. Using ant to compile it was easy enough, and I did a cursory look through the code.

As they say, the best way to learn code is to try to change something. I looked through the bug list for open bugs that were not assigned. Bug 2808363 looked interesting so I took a look at that. As Sergey Zhumatiy states, the file he uploaded to Sourceforge does hang jEdit when one scrolls down to the line jEdit has trouble with (the line doing transliteration).

I read through the rest of the thread and repeated some of what the other posters did - I did a thread dump and got the same result as Dale Anson . Denis Dzenskevich simplified the problem by yanking all of the relevant classes, methods, regular expressions etc. and putting them into a short Java file, duplicating the problem, and I ran the program and it hung for me as well. Matthieu Casanova noted the line jEdit was choking on from the uploaded file and mentioned that the regular expression used was in the perl.xml file. Denis Dzenskevich chimed in again, noting a geometric progression in processing with a scale of 2 for every new character processed. He notes he does not know Perl but posits that perhaps the regular expression could be simplified.

The first thing I did was tried to simplify the "in the wild" code that jedit was stumbling on. I cut out extraneous lines, then I changed the file type from Perl module (pm) to Perl executable (pl), then I simplified the expression even more to where I was translating the a's in the word banana to b's (banana -> bbnbnb). A comment of a few words at the end of the transliteration line still had jEdit stumbling.

With this simple line failing, I began to suspect that Denis Dzenskevich was right with regards to the regular expression. I read Sun's information about the Pattern class, and then about the Matcher class. I read Perl documentation about transliteration and the like. I also found a very helpful Javaworld article about out-of-control regular expressions using the java.util.regex package.

I realized that the regular expression was using a greedy quantifier within the transliteration statement's second set of curly brackets. If the regular expression was going to match, this was completely pointless, so I added a question mark to the end of the quantifier, changing it to a reluctant quantifier. My Java test program (based on Denis Dzenskevich's test program) began working for my test perl files. I did an ant compile of jEdit with the new perl.xml file and suddenly jEdit was able to easily load all those test perl files it had been hanging on - it could even easily scroll through the original in the field file that had stumbled across the bug, the one Sergey Zhumatiy had uploaded to Sourceforge.

I also tested Perl files which did use backslashes improperly in the second set of brackets on transliteration lines. Still the same problem. So the bug is still there, but it has been minimized somewhat, instead of stumbling over all kinds of Perl transliteration lines, even proper ones that work, it now only stumbles over lines of Perl where transliteration is done and backslashes are used improperly in the second set of curly braces (if they're used at all - you can do transliteration with forward slashes in Perl). So my patch partially fixes the problem anyhow.

[/java/jedit] permanent link

Fri, 24 Jul 2009

I have taken 36 hours of Java programming 101 in the past few weeks, and
decided to see if I could put any of it to use.

I decided to look at how jEdit, Sourceforge's 6th most active project of all time was put together. To get the latest version from their subversion version control server I did a:

svn co https://jedit.svn.sourceforge.net/svnroot/jedit/jEdit/trunk jEdit

I then ran "ant" in the jEdit directory to build a jedit.jar file. "java -jar jedit.jar" started jEdit up. All of this is fairly simple to heavy Java users I'm sure, and I've dealt with this stuff on some level for years, but I am new to Java programming so I am learning by even doing this simple stuff.
So jEdit comes up. I browse the source tree to see how a Java project like this is put together. Of course, one of the best ways to do more hands-on learning is to dive into the code, and one of the best ways to do that is to look at recent open and unassigned bugs, so I did that.
Bug 2808363 has jEdit freezing when it tries to open a file named test.pl that consists of the following lines:

#!/usr/bin/perl
$a = "banana";
print "$a\n";
$a =~ tr{a}{b}; # I like the letter a but not b translate all
print "$a\n";

The line it chokes on is the one with the comment, in fact, if this file consisted of just that one line, jEdit still freezes. It appears jEdit will eventually parse this line, but only after many minutes (possible even hours, days etc.) So far, this is my small contribution to the bug solve, that even a file with that one line mentioned above will freeze jEdit - they were using the real-world example before, which is too unwieldly.

As someone in the bug thread points out, a thread dump (kill -3 to the process number running "java -jar jedit.jar") has AWT-EventQueue-0 showing that things are stuck in the system class Pattern. The hand-off from user-class land to system is the user-defined class TokenMarker, the handleRule method. I did this thread dump myself. As I said before, I am using the latest (as of July 23rd, 2009) code in their Subversion server.

Also as is pointed out in the bug thread, the code involved is not just in the TokenMarker.java file, but in the perl.xml file. jEdit has "modes" to edit different types of files. So that when you edit PERL, in the default jEdit PERL mode, the words if, for and my are displayed as dark blue. Scalars and arrays are green, and so on. The perl.xml file contains the following XML relevant to this -

< !-- tr/// transliteration --> <SEQ_REGEXP TYPE="MARKUP" HASH_CHAR="tr" AT_WORD_START="TRUE">tr\s*\{.*?[^\\]\}\s*\{(?:.*?[^\\])*\}[cds]*</SEQ_REGEXP>
So we have the above as the regular expression being matched against. The string being matched against is the line

$a =~ tr{a}{b}; # I like the letter a but not b translate all

We can actually remove everyting before the t in tr, but I am not sure that would still be a valid PERL statement (although it would run in PERL without a problem). So we leave it. Someone put a sample program invocating all of this on the jEdit page. I will put it here, except changing the string to my simpler one here.

import java.util.regex.Pattern;
public class FaultyRegex {
public static void main(String[] args) {
final String str = "tr{a}{b}; # I like the letter a but not b translate all";
final String regex = "tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
System.out.println(Pattern.compile(regex).matcher(str).lookingAt());
}
}

This program hangs for far too long. To go back to another point someone on the thread made, every character added to the end of the comment increases the execute time of the lookingAt method geometrically, with a scale factor of 2. If I run the test file:

import java.util.regex.Pattern;
public class Testy {
public static void main(String[] args) {
String str = "tr{a}{b}; # I like ";
final String regex = 
"tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
for (int i=0;i<1000;i++) {
long start = System.currentTimeMillis(); // start timing
System.out.println(Pattern.compile(regex).matcher(str).lookingAt());
long stop = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis: " + (stop - start)); // print execution 
System.out.println("String length is " + str.length());
System.out.println("String is " + str);
str = str + "z";
 }
}
}
I get the output:
true
TimeMillis: 17
String length is 19
String is tr{a}{b}; # I like 
true
TimeMillis: 3
String length is 20
String is tr{a}{b}; # I like z
true
TimeMillis: 9
String length is 21
String is tr{a}{b}; # I like zz
true
TimeMillis: 9
String length is 22
String is tr{a}{b}; # I like zzz
true
TimeMillis: 23
String length is 23
String is tr{a}{b}; # I like zzzz
true
TimeMillis: 40
String length is 24
String is tr{a}{b}; # I like zzzzz
true
TimeMillis: 71
String length is 25
String is tr{a}{b}; # I like zzzzzz
true
TimeMillis: 134
String length is 26
String is tr{a}{b}; # I like zzzzzzz
true
TimeMillis: 266
String length is 27
String is tr{a}{b}; # I like zzzzzzzz
true
TimeMillis: 843
String length is 28
String is tr{a}{b}; # I like zzzzzzzzz
true
TimeMillis: 1055
String length is 29
String is tr{a}{b}; # I like zzzzzzzzzz
true
TimeMillis: 2171
String length is 30
String is tr{a}{b}; # I like zzzzzzzzzzz
true
TimeMillis: 4541
String length is 31
String is tr{a}{b}; # I like zzzzzzzzzzzz
true
TimeMillis: 8796
String length is 32
String is tr{a}{b}; # I like zzzzzzzzzzzzz
true
TimeMillis: 17571
String length is 33
String is tr{a}{b}; # I like zzzzzzzzzzzzzz
true
TimeMillis: 35689
String length is 34
String is tr{a}{b}; # I like zzzzzzzzzzzzzzz
true
TimeMillis: 81209
String length is 35
String is tr{a}{b}; # I like zzzzzzzzzzzzzzzz
true
TimeMillis: 150282
String length is 36
String is tr{a}{b}; # I like zzzzzzzzzzzzzzzzz
As a line that is 42 characters long takes over 150 cpu seconds to parse, a line approaching 80 characters would take over 150,000 cpu years to parse. Which is far too long.

So where is the problem? Actually, I wanted to nail it down even more, so I did the test:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Testy {
public static void main(String[] args) {
String str = "tr{a}{b}; # I like ";
final String regex = 
"tr\\s*\\{.*?[^\\\\]\\}\\s*\\{(?:.*?[^\\\\])*\\}[cds]*";
Pattern p = Pattern.compile(regex);
for (int i=0;i<1000;i++) {
long start = System.currentTimeMillis(); // start timing
Matcher m = p.matcher(str);
long mid = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis1: " + (mid - start)); // print execution
System.out.println(m.lookingAt());
long stop = System.currentTimeMillis(); // stop timing
System.out.println("TimeMillis2: " + (stop - start)); // print execution
System.out.println("String length is " + str.length());
System.out.println("String is " + str);
str = str + "z";
 }
}
}
The tail of the output which looked like this:
TimeMillis1: 0
true
TimeMillis2: 2330
String length is 30
String is tr{a}{b}; # I like zzzzzzzzzzz
TimeMillis1: 0
true
TimeMillis2: 4454
String length is 31
String is tr{a}{b}; # I like zzzzzzzzzzzz
TimeMillis1: 0
true
TimeMillis2: 8807
String length is 32
String is tr{a}{b}; # I like zzzzzzzzzzzzz
TimeMillis1: 0
So clearly, it is Matcher's lookingAt method where it is spending all of its time.

[/java/jedit] permanent link