/var/tmp
   



Sat, 19 Dec 2009

poppler bug
I am looking at bug 436197 in the Ubuntu section of Launchpad. The bug is in the poppler library and is usually triggered by the evince application. I can reproduce it: evince dies with a segmentation fault when it tries to open certain PDF files, or certain pages in those files. The bug has several duplicates, since the problem has been hitting a number of people, and it has also been reported to poppler. Launchpad has several PDF files which reproduce the problem.

The segmentation fault happens when the TextWord constructor is called, because the curFont object has not been created at that point. So, without doing much investigation, I simply created the curFont object if it did not exist and appended it to the fonts list. This seemed to solve the problem: the program stopped crashing and the problem pages were displayed. A cursory look shows those pages rendering normally, but it is possible that some portion of a page is displayed improperly.

git diff TextOutputDev.cc
diff --git a/poppler/TextOutputDev.cc b/poppler/TextOutputDev.cc
index 442ace2..9686cc1 100644
--- a/poppler/TextOutputDev.cc
+++ b/poppler/TextOutputDev.cc
@@ -1988,6 +1988,11 @@ void TextPage::beginWord(GfxState *state, double x0, double y0) {
     rot = (m[2] > 0) ? 1 : 3;
   }
 
+  if (!curFont) {
+    curFont = new TextFontInfo(state);
+    fonts->append(curFont);
+  }
+
   curWord = new TextWord(state, rot, x0, y0, charPos, curFont, curFontSize);
 }

However, this is really just a hack. I don't have much of an understanding of how the poppler library or evince works. The poppler people point out that pdftotext, which also uses the poppler library, does not trip this segmentation fault. That is correct as far as I can tell. Then again, evince calls the poppler_page_render() function in the poppler library, and pdftotext does not seem to, so it is questionable how much the pdftotext result really tells us.
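
To make the shape of the hack clearer without the poppler source at hand, here is a minimal standalone sketch of the same pattern. The names FontInfo, Word and Page are hypothetical stand-ins I made up for illustration, not poppler's real classes, and I am assuming the real constructor dereferences the font pointer much like Word does here.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a font record (not poppler's TextFontInfo).
struct FontInfo {
    double ascent = 0.95;
};

// Hypothetical stand-in for a word; its constructor reads through the
// font pointer unconditionally, so a null font would crash here.
struct Word {
    double height;
    Word(const FontInfo *font, double fontSize)
        : height(font->ascent * fontSize) {}
};

// Hypothetical stand-in for the page that builds words.
struct Page {
    FontInfo *curFont = nullptr;                   // null until a font is set
    double curFontSize = 12.0;
    std::vector<std::unique_ptr<FontInfo>> fonts;  // owns every font created
    std::unique_ptr<Word> curWord;

    void beginWord() {
        // The guard: create a default font only when none exists yet.
        if (!curFont) {
            fonts.push_back(std::make_unique<FontInfo>());
            curFont = fonts.back().get();
        }
        curWord = std::make_unique<Word>(curFont, curFontSize);
    }
};

// Calls beginWord() twice on a fresh page and returns how many fonts
// the guard had to create along the way.
inline std::size_t fontsCreatedAfterTwoWords() {
    Page page;
    page.beginWord();
    page.beginWord();
    assert(page.curFont != nullptr);
    return page.fonts.size();
}
```

The guard only fires on the first call, so later words reuse the same font. It keeps the program alive, but it says nothing about why the font was missing in the first place, which is the part I still need to understand.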

Right now I am exploring the Gfx class, since the backtrace (and following the program logic) shows that the Gfx class is used between the call to poppler_page_render() and the failed construction of curWord, the TextWord object. Setting the printCommands boolean to true prints debugging information, so I am looking at that.

With the above patch applied, what usually happens is that the beginWord method is called many times, with one call where no curFont object exists (and where a segmentation fault would otherwise happen). I do not know much about the evince code or these libraries, so I am looking into all of this to see if I can come up with anything better than the above hack. It is pretty clear that this is a poppler problem, though. Even if these PDFs are malformed, they don't crash PDF viewers that don't use the poppler library. The same goes if evince is not doing something right with Cairo before handing off to poppler. If the crash happens 12 calls deep inside poppler, that points to poppler being the problem.
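
One way to find the odd call out is to count the calls and log the ones that arrive without a font. This is only a sketch of that kind of instrumentation: CallStats and recordBeginWord are hypothetical helpers of my own, not anything that exists in poppler, and a real version would have to be called from inside beginWord with the actual curFont pointer.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counters for calls to a beginWord-like function.
struct CallStats {
    unsigned long calls = 0;     // total calls seen so far
    unsigned long nullFont = 0;  // calls that arrived with no font set
};

// Record one call; log to stderr when the font pointer is null.
inline void recordBeginWord(CallStats &stats, const void *curFont) {
    ++stats.calls;
    if (!curFont) {
        ++stats.nullFont;
        std::fprintf(stderr, "beginWord call %lu: curFont is null\n",
                     stats.calls);
    }
    // Sanity check: null-font calls can never outnumber total calls.
    assert(stats.nullFont <= stats.calls);
}
```

Knowing which call number has the missing font makes it easier to set a conditional breakpoint and see which drawing operators ran just before it.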
