Bug 315031 - Indexer crash on faulty PDF
Summary: Indexer crash on faulty PDF
Status: RESOLVED UPSTREAM
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: fileindexer
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Nepomuk Bugs Coordination
URL:
Keywords:
Duplicates: 315732
Depends on:
Blocks:
 
Reported: 2013-02-13 01:58 UTC by lnxusr
Modified: 2013-02-25 01:07 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Backtrace (2.07 KB, text/plain)
2013-02-18 09:22 UTC, Mathias Dietrich

Description lnxusr 2013-02-13 01:58:19 UTC
Application: nepomukindexer (0.1.0)
KDE Platform Version: 4.10.00
Qt Version: 4.8.3
Operating System: Linux 3.5.0-23-generic x86_64
Distribution: Ubuntu 12.10

-- Information about the crash:
NepomukIndexer has been segfaulting periodically since I updated from 4.9.97 to 4.10 earlier this morning. I've made no changes to the configuration since the upgrade.

The crash can be reproduced some of the time.

-- Backtrace:
Application: NepomukIndexer (nepomukindexer), signal: Segmentation fault
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[KCrash Handler]
#5  beginWord (y0=<optimized out>, x0=<optimized out>, state=0x168cdf0, this=0x1665130) at TextOutputDev.cc:2273
#6  TextPage::beginWord (this=0x1665130, state=0x168cdf0, x0=<optimized out>, y0=<optimized out>) at TextOutputDev.cc:2237
#7  0x00007fba1560d675 in TextPage::addChar (this=0x1665130, state=0x168cdf0, x=<optimized out>, y=<optimized out>, dx=<optimized out>, dy=<optimized out>, c=0, nBytes=1, u=0x171d940, uLen=1) at TextOutputDev.cc:2373
#8  0x00007fba156144cc in ActualText::end (this=0x1697a50, state=0x168cdf0) at TextOutputDev.cc:5276
#9  0x00007fba1559d1d1 in Gfx::opEndMarkedContent (this=0x171c0d0, args=<optimized out>, numArgs=<optimized out>) at Gfx.cc:5088
#10 0x00007fba1559e9a4 in Gfx::go (this=this@entry=0x171c0d0, topLevel=topLevel@entry=true) at Gfx.cc:716
#11 0x00007fba1559ee10 in Gfx::display (this=0x171c0d0, obj=<optimized out>, topLevel=<optimized out>) at Gfx.cc:682
#12 0x00007fba155df1d4 in Page::displaySlice (this=0x16cda20, out=0x16169b0, hDPI=<optimized out>, vDPI=<optimized out>, rotate=0, useMediaBox=208, crop=<optimized out>, sliceX=-1, sliceY=-1, sliceW=-1, sliceH=-1, printing=false, abortCheckCbk=0x0, abortCheckCbkData=0x0, annotDisplayDecideCbk=0x0, annotDisplayDecideCbkData=0x0) at Page.cc:519
#13 0x00007fba1591fa6b in Poppler::Page::text (this=0x15b4700, r=..., textLayout=textLayout@entry=Poppler::Page::PhysicalLayout) at poppler-page.cc:354
#14 0x00007fba1591fb6b in Poppler::Page::text (this=<optimized out>, r=...) at poppler-page.cc:374
#15 0x00007fba15b8042e in Nepomuk2::PopplerExtractor::extract (this=<optimized out>, resUri=..., fileUrl=..., mimeType=...) at ../../../../services/fileindexer/indexer/popplerextractor.cpp:104
#16 0x000000000040a412 in Nepomuk2::Indexer::fileIndex (this=this@entry=0x7fff085ee890, uri=..., url=..., mimeType=...) at ../../../../services/fileindexer/indexer/indexer.cpp:146
#17 0x000000000040af40 in Nepomuk2::Indexer::indexFile (this=0x7fff085ee890, url=...) at ../../../../services/fileindexer/indexer/indexer.cpp:101
#18 0x000000000040840e in main (argc=2, argv=0x7fff085ee9f8) at ../../../../services/fileindexer/indexer/main.cpp:113

Reported using DrKonqi
Comment 1 lnxusr 2013-02-13 05:00:01 UTC
I discovered this was happening while attempting to index one specific .pdf on an NFS share. I moved the file to a local directory, the segfault stopped, and it indexed the file. I moved the file back to the original NFS directory and it indexed the file without segfaulting.

Possibly close this as a fluke?
Comment 2 Vishesh Handa 2013-02-13 09:05:24 UTC
Definitely not a fluke. Do you think you could possibly upload that pdf file? The poppler indexer seems to be crashing.
Comment 3 lnxusr 2013-02-13 15:12:01 UTC
Sorry Vishesh, it's a .pdf of a copyrighted book that was scanned. Not sure I can do that. Is it trying to go into the .pdf to index the contents? It being a scanned book, I'd assume the pages are jpeg or png files.
Comment 4 Vishesh Handa 2013-02-14 08:24:18 UTC
Would it be possible for you to maybe split the pdf into a number of different pages? Maybe it is just one of the pages. ( http://stackoverflow.com/questions/10228592/splitting-a-pdf-with-ghostscript )

You can run $ nepomukindexer <fileName> on each of those pages to see if it produces a crash.

Also, does this file open in okular? Cause Okular also uses QtPoppler to render the file.
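
The per-page test suggested above could be scripted roughly as follows. This is only a sketch, not a script anyone in this thread actually used: the function names, the page-%04d.pdf naming and the INDEXER variable are assumptions, and it assumes that a crash shows up as a nonzero exit status.

```shell
#!/bin/sh
# Hypothetical helper names; only gs and nepomukindexer come from the thread.
INDEXER="${INDEXER:-nepomukindexer}"

# Split a PDF into one file per page using Ghostscript's pdfwrite device.
# $1 = input PDF, $2 = output directory
split_pdf() {
    mkdir -p "$2" &&
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -sOutputFile="$2/page-%04d.pdf" "$1"
}

# Run the indexer on every page file and report nonzero exit statuses.
# $1 = directory containing the page-*.pdf files
check_pages() {
    for page in "$1"/page-*.pdf; do
        [ -e "$page" ] || continue
        "$INDEXER" "$page" >/dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "$page: exit status $status"
        fi
    done
}
```

Typical use would be: split_pdf book.pdf pages && check_pages pages. Ghostscript rewrites the file while splitting, so the extracted pages may not behave exactly like the original document.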
Comment 5 lnxusr 2013-02-14 19:15:31 UTC
(In reply to comment #4)
> Would it be possible for you to maybe split the pdf into a number of
> different pages? Maybe it is just one of the pages. ( http://stackoverflow.com/questions/10228592/splitting-a-pdf-with-ghostscript )
> 
> You can run $ nepomukindexer <fileName> on each of those pages to see if
> it produces a crash.

I split each page into an individual .pdf and wrote a script to index each with nepomukindexer.  One page gives an "Error (197): Command token too long" but Nepomuk doesn't crash.  It always crashes on the original file when I run nepomukindexer against it.

I rebuilt the file omitting the offending page, and nepomukindexer indexes it without crashing and without the error. I then put the page back in and, again, nepomukindexer indexes it without crashing and without the error. Just to verify that Ghostscript didn't modify anything, I split the page back out of the new file. Running nepomukindexer on the newly extracted page again produces the error mentioned above but no crash.

> Also, does this file open in okular? Cause Okular also uses QtPoppler to render the file.

Okular displays the file just fine with no errors or warnings; however, Ghostscript gives the following warnings on the original file only:

   **** Warning:  File has an invalid xref entry:  13533.  Rebuilding xref table.
   **** Warning:  There are objects with matching object and generation
   **** numbers.  The accuracy of the resulting image is unknown.
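
To check other files for the same kind of damage, one could render them to Ghostscript's nullpage device and look for warning lines. A minimal sketch, assuming the warnings keep the "**** Warning" prefix shown above; pdf_warnings and the GS variable are hypothetical names introduced here:

```shell
#!/bin/sh
# GS is an assumption: override it to point at a different Ghostscript binary.
GS="${GS:-gs}"

# Print the Ghostscript warning lines for one PDF.
# Exit status is that of grep: success only if at least one warning was found.
# $1 = PDF file to check
pdf_warnings() {
    "$GS" -o /dev/null -sDEVICE=nullpage "$1" 2>&1 | grep -F '**** Warning'
}
```

For example, pdf_warnings book.pdf || echo "no warnings reported". A clean result only means Ghostscript did not complain; it does not prove that poppler will handle the file.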
Comment 6 Mathias Dietrich 2013-02-17 15:33:00 UTC
For me, Nepomuk crashes with exactly the same backtrace, with the difference that the crash is not related to indexing. In my case the indexing is already finished, and every time my display goes dark because I am away from the keyboard, I get the Nepomuk crash.

If you need further info or if I should file a new bug report, please contact me.
Comment 7 Mathias Dietrich 2013-02-17 15:52:49 UTC
I looked into the issue again. It seems that Nepomuk tries to index a PDF when the PC is idle, and this PDF document (>500 pages) causes the crash.

Interestingly, this PDF is rendered fine by poppler and was indexed fine by Nepomuk before 4.10.
Comment 8 Vishesh Handa 2013-02-17 17:22:11 UTC
(In reply to comment #7)
> I looked into the issue again. It seems that Nepomuk tries to index a PDF when
> the PC is idle, and this PDF document (>500 pages) causes the crash.
> 
> Interestingly, this PDF is rendered fine by poppler and was indexed fine by
> Nepomuk before 4.10.

Please provide the backtrace of running the nepomukindexer on that file. Also, it would be awesome if you could provide me the file either privately, or upload it on bugzilla.

The entire Nepomuk indexing architecture has changed considerably over the course of 4.10.
Comment 9 Mathias Dietrich 2013-02-18 09:22:08 UTC
Created attachment 77400
Backtrace

Here is the backtrace of the crasher. I will send you the file via email.
Comment 10 Vishesh Handa 2013-02-18 12:31:41 UTC
Hmm. So both of you have the same backtrace, and the file indexes fine for me. Could you tell me which versions of poppler and poppler-qt you have installed?

Mine is -
extra/poppler 0.22.1-1 [installed]
    PDF rendering library based on xpdf 3.0
extra/poppler-qt 0.22.1-1 [installed]
    Poppler Qt bindings
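
To compare an installed poppler against a known-good version such as the 0.22.1 listed above, a small version check can help. This is a sketch under assumptions: version_lt and poppler_version are hypothetical helpers, sort -V requires GNU coreutils, and reading the version from pdftotext -v assumes poppler-utils is installed and reports a line of the form "pdftotext version X.Y.Z".

```shell
#!/bin/sh
# Succeeds if version $1 is strictly older than version $2.
# Relies on the version sort (-V) of GNU sort.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Assumption: pdftotext (from poppler-utils) prints "pdftotext version X.Y.Z".
poppler_version() {
    pdftotext -v 2>&1 | sed -n 's/^pdftotext version \([0-9.]*\).*/\1/p' | head -n 1
}

# Example use:
#   if version_lt "$(poppler_version)" 0.22.1; then
#       echo "poppler is older than 0.22.1"
#   fi
```

The distribution's package manager remains the more reliable source for the installed version; the pdftotext route is only a convenience.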
Comment 11 Mathias Dietrich 2013-02-18 13:52:46 UTC
Seems like it's a poppler issue. I get a crash in Okular on page 366 (document page 397) using poppler(-qt) 0.20.4.
Comment 12 Vishesh Handa 2013-02-18 15:57:44 UTC
Could you please try updating poppler?
Comment 13 lnxusr 2013-02-18 17:45:03 UTC
I have the same version of poppler and poppler-qt as TheGhost - 0.20.4.  Unfortunately, this seems to be the latest available for (K)Ubuntu 12.10.

I can't say, as TheGhost has, that this file indexed fine prior to my 4.10 update; I wasn't using Nepomuk indexing before. I turned it on after the update to see if the memory/system resource problems of the past have been ironed out. Everything seems fine other than this one file, so I may continue using it from this point on.
Comment 14 Vishesh Handa 2013-02-18 17:53:21 UTC
Cool. Marking this as RESOLVED -> UPSTREAM.

@Rohan: Please update the poppler packages for kubuntu.
Comment 15 Rohan Garg 2013-02-18 17:54:16 UTC
Acknowledged. I'll update this tomorrow.
Comment 16 lnxusr 2013-02-18 20:35:37 UTC
Updated to poppler 0.22 and nepomukindexer no longer crashes indexing that file.

For anyone on Ubuntu who wishes to upgrade: I used the build files that Matthieu Baerts made for Raring for poppler 0.22. I grabbed the poppler_0.22.0-0ubuntu0~matttbe2.debian.tar.gz, poppler_0.22.0-0ubuntu0~matttbe2.dsc and poppler_0.22.0.orig.tar.gz files from https://launchpad.net/~matttbe/+archive/ppa/+sourcepub/2903898/+listing-archive-extra.

For poppler-data, I used the Raring packages by Hideki Yamane. I grabbed all four files from https://launchpad.net/ubuntu/+source/poppler-data/0.4.6-2.

To build, uncompress the source files, then uncompress the .debian.tar.gz into the resulting source directory and place the .dsc file there as well. Run 'dpkg-buildpackage -rfakeroot -uc -b' from within the source directory; it will build the source and place a .deb file one directory up.

I'm not sure what the ai0 files are, but they're included in the original poppler 0.20 for Ubuntu, so I made sure to uncompress that into the poppler-data source directory as well.

To make sure you have the required dependencies to build poppler, run 'sudo apt-get build-dep poppler' to install them. poppler-data has no build dependencies.
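
The poppler build steps above can be sketched as a script. This is an illustration only: build_poppler, run and the RUN variable are hypothetical names, and the sketch assumes that the orig tarball unpacks into a poppler-<version> directory. By default it just prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Print a command (default) or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# $1 = upstream version (e.g. 0.22.0), $2 = packaging revision
build_poppler() {
    ver="$1"; rev="$2"
    src="poppler-${ver}"   # assumption: directory name inside the orig tarball

    run sudo apt-get build-dep poppler
    run tar -xzf "poppler_${ver}.orig.tar.gz"
    # The .debian.tar.gz carries a debian/ directory; unpack it into the source tree.
    run tar -xzf "poppler_${ver}-${rev}.debian.tar.gz" -C "$src"
    run cp "poppler_${ver}-${rev}.dsc" "$src/"

    # Build from within the source directory; the packages land one directory up.
    if [ "${RUN:-0}" = "1" ]; then
        (cd "$src" && dpkg-buildpackage -rfakeroot -uc -b)
    else
        echo "(cd $src && dpkg-buildpackage -rfakeroot -uc -b)"
    fi
}
```

With the files named in this comment, the call would be build_poppler 0.22.0 '0ubuntu0~matttbe2', run from the directory that holds the three downloaded files.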

Thank you Vishesh.
Comment 17 Jekyll Wu 2013-02-25 01:07:38 UTC
*** Bug 315732 has been marked as a duplicate of this bug. ***