Bug 312569

Summary:	Crash opening a PDF file
Product:	[Unmaintained] nepomuk	Reporter:	Alejandro Nova <alejandronova>
Component:	fileindexer	Assignee:	Jörg Ehrichs <Joerg.Ehrichs>
Status:	RESOLVED FIXED
Severity:	crash	CC:	alinm.elena, Hamburger1984, jlmonge, Joerg.Ehrichs, kamikazow, kde, me, mustafa1024m, nepomuk-bugs, paul, robby.engelmann
Priority:	NOR
Version First Reported In:	4.9.95 RC1
Target Milestone:	---
Platform:	unspecified
OS:	Linux

Description Alejandro Nova 2013-01-03 22:27:04 UTC

Application: nepomukindexer (0.1.0)
KDE Platform Version: 4.9.95 (Compiled from sources)
Qt Version: 4.8.4
Operating System: Linux 3.6.6-1-CHAKRA x86_64
Distribution: "Chakra Linux"

-- Information about the crash:
While trying to index a specific PDF file (compendioDeLaRiquezadeLasNaciones.pdf), I get this crash and the system tries over and over to index the file.

KDE 4.10 RC1-2 (Chakra RC1 with soprano, kdepim-runtime and nepomuk-core compiled from KDE/4.10 branches)

The crash can be reproduced every time.

-- Backtrace:
Application: NepomukIndexer (nepomukindexer), signal: Segmentation fault
[KCrash Handler]
#5  0x00007f6a1976a841 in Poppler::Page::text(QRectF const&, Poppler::Page::TextLayout) const () from /usr/lib/libpoppler-qt4.so.4
#6  0x00007f6a1976a9bb in Poppler::Page::text(QRectF const&) const () from /usr/lib/libpoppler-qt4.so.4
#7  0x00007f6a199a2fea in Nepomuk2::PopplerExtractor::extract (this=<optimized out>, resUri=<optimized out>, fileUrl=<optimized out>, mimeType=<optimized out>) at /root/nepomuk-core/services/fileindexer/indexer/popplerextractor.cpp:98
#8  0x000000000040a612 in Nepomuk2::Indexer::fileIndex (this=0x7fff1b18b310, uri=..., url=..., mimeType=...) at /root/nepomuk-core/services/fileindexer/indexer/indexer.cpp:146
#9  0x000000000040b170 in Nepomuk2::Indexer::indexFile (this=0x7fff1b18b310, url=...) at /root/nepomuk-core/services/fileindexer/indexer/indexer.cpp:101
#10 0x000000000040860e in main (argc=2, argv=0x7fff1b18b478) at /root/nepomuk-core/services/fileindexer/indexer/main.cpp:113

Reported using DrKonqi

Comment 1 Andreas Krohn 2013-01-05 19:18:19 UTC

*** Bug 312701 has been marked as a duplicate of this bug. ***

Comment 2 Andreas Krohn 2013-01-05 19:23:04 UTC

Further duplicates are probably:
https://bugs.kde.org/show_bug.cgi?id=312633
https://bugs.kde.org/show_bug.cgi?id=312673

Comment 3 Jörg Ehrichs 2013-01-06 21:46:38 UTC

*** Bug 312633 has been marked as a duplicate of this bug. ***

Comment 4 Jörg Ehrichs 2013-01-06 21:46:52 UTC

*** Bug 312673 has been marked as a duplicate of this bug. ***

Comment 5 Alejandro Nova 2013-01-07 01:58:19 UTC

Related to this: the old Nepomuk code skipped every file that led to crashes, but the new code doesn't. Keep that in mind.
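The report does not say how the old code decided which files to skip. The sketch below shows one plausible mechanism and should be read as an assumption, not as Nepomuk's actual implementation. Before extracting a file, the indexer marks it as "in progress", and it clears the mark only on success. If the indexer process crashes, the mark survives, so on the next run that file is skipped instead of being retried. `CrashGuard` and its methods are hypothetical names. A real indexer would persist the set to disk, but here it is simply passed from one run to the next.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical crash guard; not taken from the Nepomuk sources.
// In a real indexer the "in progress" set would be written to disk so that it
// survives a crash of the indexer process.
class CrashGuard {
public:
    explicit CrashGuard(std::set<std::string> persisted)
        : inProgress_(std::move(persisted)) {}

    // A file still marked from a previous run must have crashed that run.
    bool shouldSkip(const std::string& path) const {
        return inProgress_.count(path) > 0;
    }

    // Mark a file before extraction starts.
    void begin(const std::string& path) { inProgress_.insert(path); }

    // Clear the mark once extraction finished without crashing.
    void finish(const std::string& path) { inProgress_.erase(path); }

    // The state that would be persisted between runs.
    const std::set<std::string>& state() const { return inProgress_; }

private:
    std::set<std::string> inProgress_;
};
```

Suppose a file crashes the indexer: `begin()` is called and `finish()` never is. A guard constructed from the saved `state()` will then report `shouldSkip() == true` for that file, which avoids the "tries over and over to index the file" loop described in the original report.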

Thanks for the quick fix, Jörg.

Comment 6 Andreas Krohn 2013-01-07 08:09:18 UTC

Thanks for looking into it and fixing it.. but: where and how?
http://techbase.kde.org/Projects/Nepomuk/Repositories <-- dead links in here..

Comment 7 Vishesh Handa 2013-01-07 08:14:14 UTC

(In reply to comment #6)
> Thanks for looking into it and fixing it.. but: where and how?
> http://techbase.kde.org/Projects/Nepomuk/Repositories <-- dead links in
> here..

It has been fixed in the nepomuk-core repository.

Comment 8 Jörg Ehrichs 2013-01-07 15:33:25 UTC

*** Bug 312818 has been marked as a duplicate of this bug. ***

Comment 9 Vishesh Handa 2013-01-08 11:34:20 UTC

*** Bug 312864 has been marked as a duplicate of this bug. ***

Comment 10 Vishesh Handa 2013-01-09 08:59:05 UTC

*** Bug 312922 has been marked as a duplicate of this bug. ***

Comment 11 Stefan Radermacher 2013-01-09 09:15:08 UTC

(In reply to comment #5)
> Related to this: the old Nepomuk code skipped every file that led to
> crashes, but the new code doesn't. Keep that in mind.

That explains why I've never been able to find anything on my pdf files.

Comment 12 Jörg Ehrichs 2013-01-09 12:57:06 UTC

*** Bug 312937 has been marked as a duplicate of this bug. ***

Comment 13 Christoph Feck 2013-01-10 21:29:51 UTC

*** Bug 313042 has been marked as a duplicate of this bug. ***

Comment 14 Stefan Radermacher 2013-01-11 14:35:50 UTC

(In reply to comment #7)
> It has been fixed in the nepomuk-core repository.

Has it been fixed so that the files are ignored as previously, or so that they can now be indexed?

Comment 15 Jörg Ehrichs 2013-01-11 15:40:42 UTC

(In reply to comment #14)
> Has it been fixed so that the files are ignored as previously, or so that
> they can now be indexed?

Yes, the "broken" PDF files will be indexed normally, except that plainTextContent will not be available and the new title extraction method will not work for them.
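The behaviour described above can be sketched as follows: the file keeps its record, and only the text-derived property is missing. `PdfMetadata`, `IndexRecord` and `buildRecord()` are hypothetical stand-ins, not the Nepomuk2 resource API. `plainTextContent` is modelled as `std::optional` so that "not available" is distinct from "empty". The text-based title extraction is not modelled. The assumption is only that it has nothing to work with once the page text is gone.

```cpp
#include <optional>
#include <string>

// Hypothetical document-level metadata, assumed to stay readable even when
// the page text of a broken PDF cannot be extracted.
struct PdfMetadata {
    std::string title;
    std::string author;
    int pageCount = 0;
};

// Hypothetical index record; not the Nepomuk2 resource model.
struct IndexRecord {
    PdfMetadata metadata;
    // Left unset when page text could not be extracted.
    std::optional<std::string> plainTextContent;
};

// Builds a record from whatever could be read. A failed text extraction
// (std::nullopt) leaves plainTextContent unset instead of dropping the file.
IndexRecord buildRecord(const PdfMetadata& meta,
                        const std::optional<std::string>& pageText) {
    IndexRecord record;
    record.metadata = meta;
    if (pageText && !pageText->empty()) {
        record.plainTextContent = *pageText;
    }
    return record;
}
```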

Comment 16 Stefan Radermacher 2013-01-12 11:12:19 UTC

(In reply to comment #15)
I checked out the code changes, and it seems that the analysis of the page in question is basically just cancelled. I'm wondering why the crash happens at all: I have a lot of PDFs for the Pathfinder Roleplaying Game that I'd like to be able to run a full-text search on. Text extraction in Okular with Poppler backend works fine in these files, and Spotlight on Mac OS X indexes them without problems. Is it possible to determine from the crash data which file actually causes the problems?
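The "cancelled page analysis" can be sketched as a null check before asking a page for its text. This sketch assumes that, for a broken PDF, the document hands back no page object. That would be consistent with the backtrace ending inside `Poppler::Page::text()`, but the report does not confirm it. To keep the sketch self-contained, `Document` and `Page` are hypothetical stand-ins for `Poppler::Document` and `Poppler::Page`, `extractText()` is a hypothetical helper, and `std::cerr` stands in for `kWarning()`. The warning text is invented for illustration.

```cpp
#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for Poppler::Page.
struct Page {
    std::string content;
    std::string text() const { return content; }
};

// Hypothetical stand-in for Poppler::Document: page(i) returns nullptr when
// a page cannot be parsed (an assumption, see above).
struct Document {
    std::vector<std::shared_ptr<Page>> pages;
    int numPages() const { return static_cast<int>(pages.size()); }
    const Page* page(int i) const { return pages.at(i).get(); }
};

// Collects text page by page. On a missing page it logs a warning naming the
// file and abandons text extraction for the whole file (returns nullopt),
// instead of calling text() through a null pointer.
std::optional<std::string> extractText(const Document& doc,
                                       const std::string& fileUrl) {
    std::string text;
    for (int i = 0; i < doc.numPages(); ++i) {
        const Page* page = doc.page(i);
        if (!page) {
            std::cerr << "Could not read page " << i << " of " << fileUrl
                      << ", skipping text extraction\n";
            return std::nullopt;
        }
        text += page->text();
        text += '\n';
    }
    return text;
}
```

Because the warning includes the file name, running the indexer over a folder and reading its output is enough to find the offending file.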

Comment 17 Jörg Ehrichs 2013-01-12 12:09:53 UTC

(In reply to comment #16)
> [...] Text extraction in Okular with Poppler
> backend works fine in these files, and Spotlight on Mac OS X indexes them
> without problems.

The test file I got showed just a black page when opened with Okular, so I assume your Pathfinder PDFs will be fine.

(In reply to comment #16)
> Is it possible to determine from the crash data which file
> actually causes the problems?

Now there is: I have added kWarning output at the places where extraction is now skipped.
So you can simply run nepomukfileindexer <folder/file> and check the output.