Bug 406116 - Baloo_file_extractor Crashes in kfilemetadata_epubextractor.so
Status: RESOLVED DOWNSTREAM
Alias: None
Product: frameworks-kfilemetadata
Classification: Frameworks and Libraries
Component: general
Version: 5.56.0
Platform: Neon Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords: drkonqi
Duplicates: 411627 416199 417492 417656 417676 417677
Depends on:
Blocks:
 
Reported: 2019-04-01 14:32 UTC by luca
Modified: 2020-07-13 09:03 UTC
CC List: 9 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Troublesome ePub (716.55 KB, application/epub+zip)
2019-04-18 08:15 UTC, luca
minimal broken sample (2.86 KB, application/epub+zip)
2019-04-30 16:29 UTC, Stefan Brüns
Patch for libepub/ebook-tools (1.96 KB, patch)
2019-05-01 22:30 UTC, Stefan Brüns
New crash information added by DrKonqi (4.31 KB, text/plain)
2020-07-13 09:03 UTC, Balam

Description luca 2019-04-01 14:32:08 UTC
Application: baloo_file_extractor (5.56.0)

Qt Version: 5.12.0
Frameworks Version: 5.56.0
Operating System: Linux 4.15.0-45-generic x86_64
Distribution: KDE neon User Edition 5.15

-- Information about the crash:
Cold boot and login to desktop. This problem has been occurring since I installed the system.

The crash can be reproduced every time.

-- Backtrace:
Application: Baloo File Extractor (baloo_file_extractor), signal: Segmentation fault
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Current thread is 1 (Thread 0x7fd523d4ac80 (LWP 6744))]

Thread 3 (Thread 0x7fd4fd9c4700 (LWP 6752)):
#0  0x00007fd52119305c in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#1  0x00007fd51cf1ca98 in g_main_context_prepare () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fd51cf1d46b in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007fd51cf1d64c in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007fd52119315b in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5  0x00007fd52113464a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#6  0x00007fd520f5c41a in QThread::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7  0x00007fd522b28015 in  () at /usr/lib/x86_64-linux-gnu/libQt5DBus.so.5
#8  0x00007fd520f5dbc2 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x00007fd51f0b96db in start_thread (arg=0x7fd4fd9c4700) at pthread_create.c:463
#10 0x00007fd52085d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fd517103700 (LWP 6746)):
#0  0x00007fd520850bf9 in __GI___poll (fds=0x7fd517102cb8, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fd51caac747 in  () at /usr/lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007fd51caae36a in xcb_wait_for_event () at /usr/lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fd51867e32a in  () at /usr/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#4  0x00007fd520f5dbc2 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5  0x00007fd51f0b96db in start_thread (arg=0x7fd517103700) at pthread_create.c:463
#6  0x00007fd52085d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fd523d4ac80 (LWP 6744)):
[KCrash Handler]
#6  0x00007fd515edbcc2 in _opf_label_get_by_lang () at /usr/lib/libepub.so.0
#7  0x00007fd515ed893c in epub_tit_next () at /usr/lib/libepub.so.0
#8  0x00007fd5160e3fa8 in  () at /usr/lib/x86_64-linux-gnu/qt5/plugins/kf5/kfilemetadata/kfilemetadata_epubextractor.so
#9  0x0000560c1439525b in Baloo::App::index(Baloo::Transaction*, QString const&, unsigned long long) (this=this@entry=0x7fff5aa08990, tr=0x560c16362d70, url=..., id=id@entry=59177906240227330) at ./src/file/extractor/app.cpp:191
#10 0x0000560c14395b9e in Baloo::App::processNextFile() (this=0x7fff5aa08990) at ./src/file/extractor/app.cpp:111
#11 0x00007fd521172d04 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#12 0x00007fd52116694b in QObject::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#13 0x00007fd521f2c83c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#14 0x00007fd521f33dd0 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#15 0x00007fd521136328 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007fd5211925a9 in QTimerInfoList::activateTimers() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007fd521192da9 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007fd51cf1d387 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007fd51cf1d5c0 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x00007fd51cf1d64c in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007fd52119313f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007fd52113464a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007fd52113d800 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#24 0x0000560c1439422d in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at ./src/file/extractor/main.cpp:60

Reported using DrKonqi
Comment 1 Nate Graham 2019-04-01 18:11:41 UTC
Crashing in kfilemetadata_epubextractor.so, and possibly even deeper (maybe in _opf_label_get_by_lang() at /usr/lib/libepub.so.0). It looks like it's having trouble parsing one of your epubs. Can you find which file it is by using `balooctl monitor` and then attach that epub?

Also, installing debug symbols for the kfilemetadata framework and then reproducing the crash and attaching a new backtrace would be very helpful.
Comment 2 luca 2019-04-12 16:29:29 UTC
Hi Nate.

Thanks for your reply.

Unfortunately, I can't find kfilemetadata-dbg for bionic (I'm on KDE Neon).
Nor have I understood how to find the "troublesome" epub with
`balooctl monitor`.

Could you please give me some further advice?

Cheers,
Luca
Comment 3 Nate Graham 2019-04-12 17:13:07 UTC
Run `balooctl monitor` in a terminal window and leave it running.

Then in another terminal window or tab, turn off Baloo and turn it back on with `balooctl disable && balooctl enable`

The terminal window/tab with the monitor running will show you in real-time which file it's indexing.

Eventually Baloo will crash, and the last file listed in the monitor window/tab will be the file it crashed on. Then you can attach that file to the bug.
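
The last step above can be sketched in code. This is only an illustration under stated assumptions: the monitor output has been saved to a text file (for example with `balooctl monitor | tee monitor.log`, where `monitor.log` is a hypothetical name), and each indexed file appears on its own line. The exact output format of `balooctl monitor` is not shown in this report, so the hypothetical helper `last_listed_file` simply returns the last non-empty line as-is.

```python
def last_listed_file(monitor_lines):
    """Return the last non-empty line of captured monitor output, or None.

    Assumption: the monitor prints one line per file it starts indexing,
    so the last non-empty line names the file Baloo was working on when
    the extractor crashed.
    """
    for line in reversed(list(monitor_lines)):
        stripped = line.strip()
        if stripped:
            return stripped
    return None
```

With a saved log, this could be called as `last_listed_file(open("monitor.log", encoding="utf-8"))`.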
Comment 4 luca 2019-04-18 08:15:36 UTC
Created attachment 119480
Troublesome ePub

This file, apparently, makes Baloo crash.
Comment 5 Christoph Feck 2019-04-25 09:44:45 UTC
New information was added with comment #4; changing status for inspection.
Comment 6 Alexander Stippich 2019-04-28 09:25:57 UTC
Can reproduce with the test file. I found it to be crashing on
https://phabricator.kde.org/source/kfilemetadata/browse/master/src/extractors/epubextractor.cpp$171
Comment 7 Stefan Brüns 2019-04-29 11:54:23 UTC
(In reply to Alexander Stippich from comment #6)
> Can reproduce with the test file. I found it to be crashing on
> https://phabricator.kde.org/source/kfilemetadata/browse/master/src/extractors/epubextractor.cpp$171

The epub file is invalid: the last entry in its toc.ncx is a navPoint element without the mandatory navLabel
(the bad entry was likely caused by some licensing framework that mangled the original epub):

---
	<navPoint id="license" playOrder="77"><content src="Testo/license_IvREjeLk.htm"/></navPoint></navMap>
---

https://groups.niso.org/apps/group_public/download.php/14650/Z39_86_2005r2012.pdf#page=59

Unfortunately, libepub does not check for this, neither after parsing:
https://sourceforge.net/p/ebook-tools/code/HEAD/tree/trunk/ebook-tools/src/libepub/opf.c#l361

nor when accessing it during iterator next:
https://sourceforge.net/p/ebook-tools/code/HEAD/tree/trunk/ebook-tools/src/libepub/epub.c#l473

(though here it does:
https://sourceforge.net/p/ebook-tools/code/HEAD/tree/trunk/ebook-tools/src/libepub/epub.c#l530 )


Unfortunately, libepub upstream has been dormant for 7 years ...
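
Since libepub itself does not catch this, affected files can be found by inspecting the toc.ncx directly. The following is a minimal sketch, not part of libepub or kfilemetadata: it assumes the EPUB is an ordinary zip archive, that its NCX uses the standard DAISY namespace `http://www.daisy.org/z3986/2005/ncx/`, and that a valid navPoint has navLabel as a direct child. The function names `navpoints_missing_label` and `check_epub` are made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by NCX documents (assumed to be the standard DAISY one).
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def navpoints_missing_label(ncx_bytes):
    """Return the ids of navPoint elements without a direct navLabel child."""
    root = ET.fromstring(ncx_bytes)
    missing = []
    # iter() walks the whole tree, so nested navPoints are covered too.
    for nav_point in root.iter(NCX_NS + "navPoint"):
        # find() with a plain tag name only looks at direct children.
        if nav_point.find(NCX_NS + "navLabel") is None:
            missing.append(nav_point.get("id", "<no id>"))
    return missing


def check_epub(path_or_file):
    """Map each .ncx member of the EPUB to its navPoint ids lacking a label."""
    problems = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".ncx"):
                ids = navpoints_missing_label(archive.read(name))
                if ids:
                    problems[name] = ids
    return problems


if __name__ == "__main__":
    # Usage: python check_ncx.py book1.epub book2.epub ...
    for path in sys.argv[1:]:
        print(path, check_epub(path))
```

An empty result only means that this particular defect is absent; it says nothing about other ways an EPUB might be malformed.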
Comment 8 Stefan Brüns 2019-04-30 16:29:31 UTC
Created attachment 119751
minimal broken sample
Comment 9 Stefan Brüns 2019-04-30 16:31:11 UTC
Okular also crashes, as it also uses libepub from ebook-tools.
Comment 10 Stefan Brüns 2019-04-30 17:22:58 UTC
Git commit 74c8fcf5bb0df38270fddd625941f4f24ce76d45 by Stefan Brüns.
Committed on 30/04/2019 at 17:22.
Pushed by bruns into branch 'master'.

[balooctl] Add command to show files failed to index

Summary:
Baloo was missing any means to retrieve the list of files which failed
to index.

Test Plan: $> balooctl failed

Reviewers: #baloo, #frameworks, ngraham, astippich

Reviewed By: #baloo, ngraham

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D20918

M  +6    -0    src/engine/transaction.cpp
M  +1    -0    src/engine/transaction.h
M  +27   -0    src/tools/balooctl/main.cpp

https://commits.kde.org/baloo/74c8fcf5bb0df38270fddd625941f4f24ce76d45
Comment 11 Stefan Brüns 2019-05-01 22:30:16 UTC
Created attachment 119785
Patch for libepub/ebook-tools

Patch pushed to openSUSE Tumbleweed, provided for other distributions to pick up.
Comment 12 Nate Graham 2019-05-02 18:33:09 UTC
I have posted information about it to the distributions mailing list, thanks!
Comment 13 Albert Astals Cid 2019-05-22 22:25:19 UTC
Git commit 9f98c010691ed73d11c83a1694823aba60b12e32 by Albert Astals Cid, on behalf of Stefan Brüns.
Committed on 22/05/2019 at 22:24.
Pushed by aacid into branch 'Applications/19.04'.

[EPubGenerator] Avoid crashes due to bogus wrapping of content in table

Summary:
QTextDocument chokes badly when some documents are wrapped inside
a table, returning e.g. a pagecount of -41292 afterwards.

On the downside, this removes any padding from the page. On the upside,
it removes any padding from the page.
Related: bug 406738, bug 407140

Reviewers: #okular

Subscribers: okular-devel

Tags: #okular

Differential Revision: https://phabricator.kde.org/D20949

M  +1    -7    generators/epub/converter.cpp

https://commits.kde.org/okular/9f98c010691ed73d11c83a1694823aba60b12e32
Comment 14 Stefan Brüns 2019-09-05 20:54:51 UTC
*** Bug 411627 has been marked as a duplicate of this bug. ***
Comment 15 Stefan Brüns 2020-01-13 20:11:21 UTC
*** Bug 416199 has been marked as a duplicate of this bug. ***
Comment 16 Stefan Brüns 2020-02-12 16:04:22 UTC
Has to be fixed in distributions by picking up the patch from comment #11
Comment 17 Stefan Brüns 2020-02-12 16:04:40 UTC
*** Bug 417492 has been marked as a duplicate of this bug. ***
Comment 18 Stefan Brüns 2020-02-14 18:15:01 UTC
*** Bug 417656 has been marked as a duplicate of this bug. ***
Comment 19 Stefan Brüns 2020-02-15 01:55:26 UTC
*** Bug 417677 has been marked as a duplicate of this bug. ***
Comment 20 Stefan Brüns 2020-02-15 01:55:47 UTC
*** Bug 417676 has been marked as a duplicate of this bug. ***
Comment 21 Balam 2020-07-13 09:03:33 UTC
Created attachment 130078
New crash information added by DrKonqi

baloo_file_extractor (5.70.0) using Qt 5.14.2

- What I was doing when the application crashed:

Baloo crashes every time it starts and then launches without any problem by clicking on 'restart program'.

-- Backtrace (Reduced):
#5  0x00007f568dbc1b88 in FindNode () from /lib64/libepub.so.0
#6  0x00007f568dbc0859 in _opf_manifest_get_by_id () from /lib64/libepub.so.0
#7  0x00007f568dbbda0d in _get_spine_it_url () from /lib64/libepub.so.0
#8  0x00007f568dbbdab2 in epub_it_get_curr () from /lib64/libepub.so.0
#9  0x00007f568dbcb0cd in KFileMetaData::EPubExtractor::extract(KFileMetaData::ExtractionResult*) () from /usr/lib64/qt5/plugins/kf5/kfilemetadata/kfilemetadata_epubextractor.so