/var/tmp
   



       

Tue, 17 Jun 2008

/var/tmp
Well, I snapped this domain up again. I had it several years ago and lost it. Now I have it again.

One of my interests is OCR, particularly free software OCR. I spent some time on GOCR, even though none of my patches were used and the project has not been updated for over a year. GOCR seemed the best thing to contribute to when I was looking at this a year or two ago, but Google has put Tesseract and Ocropus out there, so I am going to take a look at those now. They are written in C++, a language I knew nothing of two years ago but have since taken a class in, so I am now a little more familiar with it. Apparently Tesseract only does OCR, not layout; Ocropus is a layout plugin.

I'm trying Tesseract now... it's pretty good. Probably better than GOCR.

I will attempt to improve it the same way: get a number of samples of different books from Distributed Proofreaders, match the Tesseract OCR output against the original text, see if there are any patterns of failure, then fix those in the Tesseract code if possible.
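
The "patterns of failure" step could be sketched roughly like this. It is only an illustration in Python using the standard `difflib` module, not anything from Tesseract itself; the function names `confusion_counts` and `merge_counts` and the sample strings are made up for the example, and it assumes the reference text and the OCR output are already loaded as plain strings.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(reference: str, ocr_output: str) -> Counter:
    """Count (expected, recognized) pairs where the OCR text differs from the reference."""
    counts = Counter()
    # autojunk=False keeps difflib from ignoring frequent characters in long texts.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A deletion has an empty "recognized" side, an insertion an empty "expected" side.
        counts[(reference[i1:i2], ocr_output[j1:j2])] += 1
    return counts


def merge_counts(pairs) -> Counter:
    """Sum the confusion counts over many (reference, ocr_output) sample pairs."""
    total = Counter()
    for reference, ocr_output in pairs:
        total.update(confusion_counts(reference, ocr_output))
    return total


if __name__ == "__main__":
    # Hypothetical samples; real ones would be page text from several books.
    samples = [
        ("the modern reader", "the rnodern reader"),
        ("a warm morning", "a warrn rnorning"),
    ]
    for (expected, got), n in merge_counts(samples).most_common(5):
        print(f"{expected!r} -> {got!r}: {n}")
```

Confusions that keep showing up across many books are the ones worth chasing in the recognizer. A character-level diff like this is crude, and line breaks or hyphenation differences between the two texts would need to be normalized first.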
