/var/tmp
Tue, 17 Jun 2008
One of my interests is OCR, particularly free software OCR. I spent some time on GOCR, even though none of my patches were used and the project has not been updated for over a year. GOCR seemed the best thing to contribute to when I was looking at this a year or two ago, but Google has put Tesseract and OCRopus out there, so I am going to take a look at those now. They are in C++, a language I knew nothing of two years ago, but I have since taken a class in it and am now a little more familiar with it. Apparently Tesseract only does OCR, not layout; OCRopus handles the layout analysis. I'm trying it now, and it's pretty good, probably better than GOCR. I will attempt to improve it the same way: get a number of samples of different books from Distributed Proofreaders, match the Tesseract OCR output against the original text, see if there are any patterns of failure, then fix those in the Tesseract code if possible.
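The comparison step could be sketched roughly like this. It is only an illustration, not code from Tesseract or GOCR: it assumes the proofread text and the OCR output are already available as plain strings, it uses Python's standard `difflib` rather than anything OCR-specific, and the function name `confusion_counts` and the sample strings are made up for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(truth: str, ocr: str) -> Counter:
    """Count (expected, got) pairs where the OCR text differs from the proofread text."""
    counts = Counter()
    # autojunk=False keeps difflib from treating frequent characters as junk on long texts.
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means a dropped or a spurious run of characters.
            counts[(truth[i1:i2], ocr[j1:j2])] += 1
    return counts


# Hypothetical sample pair, invented for illustration only.
truth = "The modern reader will find them"
ocr = "The rnodern reader will find thern"

for (expected, got), n in confusion_counts(truth, ocr).most_common():
    print(f"{expected!r} -> {got!r}: {n}")
```

Summing these counters over many pages should make the recurring confusions stand out, which is where I would start looking in the recognizer code.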