Bug 486853 - Baloo file extractor crashes a dozen times for Mobipocket files
Summary: Baloo file extractor crashes a dozen times for Mobipocket files
Status: RESOLVED DUPLICATE of bug 475975
Alias: None
Product: kdegraphics-mobipocket
Classification: Frameworks and Libraries
Component: general (other bugs)
Version First Reported In: 2.1.0
Platform: Fedora RPMs Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Unassigned bugs
URL:
Keywords: drkonqi
Depends on:
Blocks:
 
Reported: 2024-05-10 15:02 UTC by Christian (Fuchs)
Modified: 2025-03-23 21:29 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Christian (Fuchs) 2024-05-10 15:02:45 UTC
Application: baloo_file_extractor (6.1.0)

Qt Version: 6.7.0
Frameworks Version: 6.1.0
Operating System: Linux 6.8.8-300.fc40.x86_64 x86_64
Windowing System: X11
Distribution: "Fedora Linux 40 (KDE Plasma)"
DrKonqi: 6.0.4 [CoredumpBackend]

-- Information about the crash:
After re-installing my laptop with Fedora 40 and restoring my /home backup (excluding the baloo folder under .local/share and the baloo config under .config, as I expected those to be problematic), baloo started re-indexing and crashed a couple of dozen times in quick succession.
I hope the backtrace is somewhat helpful in finding the root cause; otherwise, feel free to get back to me if you need further information.

The crash can be reproduced every time.

-- Backtrace:
Application: Baloo-Dateiinfosammler (baloo_file_extractor), signal: Segmentation fault


This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[New LWP 9643]
[New LWP 9688]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/kf6/baloo_file_extractor'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc2120ab144 in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fc20e151f40 (LWP 9643))]
Cannot QML trace cores :(
[Current thread is 1 (Thread 0x7fc20e151f40 (LWP 9643))]

Thread 2 (Thread 0x7f82000006c0 (LWP 9688)):
#0  0x00007fc21211d72d in poll () from /lib64/libc.so.6
#1  0x00007fc210d7c724 in g_main_context_iterate_unlocked.isra () from /lib64/libglib-2.0.so.0
#2  0x00007fc210d1cb03 in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#3  0x00007fc212a7bf83 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#4  0x00007fc2127a26b3 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#5  0x00007fc2128b402f in QThread::exec() () from /lib64/libQt6Core.so.6
#6  0x00007fc21256faf1 in QDBusConnectionManager::run() () from /lib64/libQt6DBus.so.6
#7  0x00007fc21294f35c in QThreadPrivate::start(void*) () from /lib64/libQt6Core.so.6
#8  0x00007fc2120a91b7 in start_thread () from /lib64/libc.so.6
#9  0x00007fc21212b39c in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7fc20e151f40 (LWP 9643)):
[KCrash Handler]
#4  0x00007fc21280fd83 in QVariant::QVariant(QString const&) () from /lib64/libQt6Core.so.6
#5  0x00007fc211b44aac in standardDeclarationForNode(QTextHtmlParserNode const&) () from /lib64/libQt6Gui.so.6
#6  0x00007fc211b45bc2 in QTextHtmlParser::declarationsForNode(int) const () from /lib64/libQt6Gui.so.6
#7  0x00007fc211b46830 in QTextHtmlParser::parseTag() () from /lib64/libQt6Gui.so.6
#8  0x00007fc211b46df0 in QTextHtmlParser::parse() () from /lib64/libQt6Gui.so.6
#9  0x00007fc211aeb2ea in QTextHtmlImporter::QTextHtmlImporter(QTextDocument*, QString const&, QTextHtmlImporter::ImportMode, QTextDocument const*) () from /lib64/libQt6Gui.so.6
#10 0x00007fc211ad3f16 in QTextDocument::setHtml(QString const&) () from /lib64/libQt6Gui.so.6
#11 0x00007fc20e11fb7f in KFileMetaData::MobiExtractor::extract(KFileMetaData::ExtractionResult*) () from /usr/lib64/qt6/plugins/kf6/kfilemetadata/kfilemetadata_mobiextractor.so
#12 0x00005606c85350cf in Baloo::App::index(Baloo::Transaction*, QString const&, unsigned long long) ()
#13 0x00005606c8536745 in Baloo::App::processNextFile() ()
#14 0x00007fc2127fa3f4 in void doActivate<false>(QObject*, int, void**) () from /lib64/libQt6Core.so.6
#15 0x00007fc212707496 in QSingleShotTimer::timerEvent(QTimerEvent*) () from /lib64/libQt6Core.so.6
#16 0x00007fc2127ebccf in QObject::event(QEvent*) () from /lib64/libQt6Core.so.6
#17 0x00007fc212795a99 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /lib64/libQt6Core.so.6
#18 0x00007fc21294d797 in QTimerInfoList::activateTimers() () from /lib64/libQt6Core.so.6
#19 0x00007fc212a7bdb9 in timerSourceDispatch(_GSource*, int (*)(void*), void*) () from /lib64/libQt6Core.so.6
#20 0x00007fc210d1b68c in g_main_context_dispatch_unlocked.lto_priv () from /lib64/libglib-2.0.so.0
#21 0x00007fc210d7c788 in g_main_context_iterate_unlocked.isra () from /lib64/libglib-2.0.so.0
#22 0x00007fc210d1cb03 in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#23 0x00007fc212a7bf83 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#24 0x00007fc2127a26b3 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#25 0x00007fc21279e63c in QCoreApplication::exec() () from /lib64/libQt6Core.so.6
#26 0x00005606c852c4a8 in main ()

Reported using DrKonqi
Comment 1 tagwerk19 2024-05-10 16:12:53 UTC
(In reply to Christian (Fuchs) from comment #0)
> #5  0x00007fc211b44aac in standardDeclarationForNode(QTextHtmlParserNode
> const&) () from /lib64/libQt6Gui.so.6
> #6  0x00007fc211b45bc2 in QTextHtmlParser::declarationsForNode(int) const ()
> from /lib64/libQt6Gui.so.6
> #7  0x00007fc211b46830 in QTextHtmlParser::parseTag() () from
> /lib64/libQt6Gui.so.6
> #8  0x00007fc211b46df0 in QTextHtmlParser::parse() () from
> /lib64/libQt6Gui.so.6
> #9  0x00007fc211aeb2ea in
> QTextHtmlImporter::QTextHtmlImporter(QTextDocument*, QString const&,
> QTextHtmlImporter::ImportMode, QTextDocument const*) () from
> /lib64/libQt6Gui.so.6
> #10 0x00007fc211ad3f16 in QTextDocument::setHtml(QString const&) () from
> /lib64/libQt6Gui.so.6
> #11 0x00007fc20e11fb7f in
> KFileMetaData::MobiExtractor::extract(KFileMetaData::ExtractionResult*) ()
> from /usr/lib64/qt6/plugins/kf6/kfilemetadata/kfilemetadata_mobiextractor.so
It's looking like a "messed up" ebook (a .mobi with embedded HTML?). There's also Bug 475730 and, perhaps more usefully, Bug 475975...
Comment 2 Christian (Fuchs) 2024-05-10 16:56:05 UTC
(In reply to tagwerk19 from comment #1)
> (In reply to Christian (Fuchs) from comment #0)
> > #5  0x00007fc211b44aac in standardDeclarationForNode(QTextHtmlParserNode
> > const&) () from /lib64/libQt6Gui.so.6
> > #6  0x00007fc211b45bc2 in QTextHtmlParser::declarationsForNode(int) const ()
> > from /lib64/libQt6Gui.so.6
> > #7  0x00007fc211b46830 in QTextHtmlParser::parseTag() () from
> > /lib64/libQt6Gui.so.6
> > #8  0x00007fc211b46df0 in QTextHtmlParser::parse() () from
> > /lib64/libQt6Gui.so.6
> > #9  0x00007fc211aeb2ea in
> > QTextHtmlImporter::QTextHtmlImporter(QTextDocument*, QString const&,
> > QTextHtmlImporter::ImportMode, QTextDocument const*) () from
> > /lib64/libQt6Gui.so.6
> > #10 0x00007fc211ad3f16 in QTextDocument::setHtml(QString const&) () from
> > /lib64/libQt6Gui.so.6
> > #11 0x00007fc20e11fb7f in
> > KFileMetaData::MobiExtractor::extract(KFileMetaData::ExtractionResult*) ()
> > from /usr/lib64/qt6/plugins/kf6/kfilemetadata/kfilemetadata_mobiextractor.so
> It's looking like a "messed up" ebook (a .mobi with embedded HTML?). There's
> also Bug 475730 and, perhaps more usefully, Bug 475975...

Thanks for the links, I'll subscribe to the latter. In an ideal world, though, a malformed file should not lead to baloo crashing, and certainly not this visibly to the end user: it spams the system tray with a dozen DrKonqi instances. If a third-party library produces malformed content, there should be the equivalent of a catch around that, so it could fail on that file gracefully (and potentially mark it to not re-index).
Comment 3 tagwerk19 2024-05-10 22:41:06 UTC
(In reply to Christian (Fuchs) from comment #2)
> ... fail on that file gracefully (and potentially mark it to not re-index) ...
It could be that this is caught now, although the fix is quite recent:
    https://invent.kde.org/frameworks/baloo/-/merge_requests/174
Hats off to Stefan...
Comment 4 tagwerk19 2024-05-11 10:23:12 UTC
Also some history in Bug 421317 and Bug 477115
Comment 5 tagwerk19 2024-06-21 16:17:53 UTC
Will set as a duplicate of Bug 475975

*** This bug has been marked as a duplicate of bug 475975 ***
Comment 6 Bug Janitor Service 2025-03-15 12:43:43 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/kdegraphics-mobipocket/-/merge_requests/20
Comment 7 Stefan Brüns 2025-03-22 02:01:57 UTC
Git commit a5b423d58133c46791cc53e6d67425366f94b266 by Stefan Brüns.
Committed on 28/02/2025 at 23:07.
Pushed by bruns into branch 'master'.

Fix broken padding in BitReader

The overload taking a char* appends the \0 terminated string, i.e.
QByteArray::append("\x0...") is essentially a noop. This causes
out-of-bounds accesses, either causing asserts or reading invalid data.

See https://doc.qt.io/qt-6/qbytearray.html#append-3

SENTRY: OKULAR-AD
SENTRY: BALOO-33
SENTRY: BALOO-43Y

M  +1    -1    lib/decompressor.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/a5b423d58133c46791cc53e6d67425366f94b266
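
To illustrate the pitfall described in the commit message above, here is a minimal sketch. It uses std::string as a stand-in for QByteArray (an assumption made so the example needs no Qt); std::string::append(const char*) likewise determines the length from the \0 terminator. The helper names are hypothetical and this is not the code from lib/decompressor.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: tries to pad with four zero bytes via the
// const char* overload. The length is taken from the \0 terminator,
// so the literal "\0\0\0\0" counts as an empty string and nothing is added.
std::size_t sizeAfterCStringAppend(std::string data)
{
    data.append("\0\0\0\0");
    return data.size();
}

// Hypothetical helper: pads with an explicit count, which really
// appends four zero bytes regardless of their value.
std::size_t sizeAfterCountedAppend(std::string data)
{
    data.append(4, '\0');
    return data.size();
}
```

With a three-byte input, the first helper still reports a size of 3, while the second reports 7. A reader that relies on the padding being present would therefore run past the end of the buffer in the first case.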
Comment 8 Stefan Brüns 2025-03-22 02:02:05 UTC
Git commit 866a069538a268d264cf002aa9570f97a84045da by Stefan Brüns.
Committed on 28/02/2025 at 23:07.
Pushed by bruns into branch 'master'.

Fix possible out-of-bounds access in BitReader

The read function access data up to data[(len + 31)/8], thus len should
reflect the size (count of bits) of the original data, without the
extra padding null characters.

SENTRY: OKULAR-AD
SENTRY: BALOO-33
SENTRY: BALOO-43Y

M  +1    -3    lib/decompressor.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/866a069538a268d264cf002aa9570f97a84045da
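
The bound from the commit message above can be checked with a small sketch. It assumes that four zero bytes of padding follow the original data and that data[(len + 31)/8] is the highest index a read may touch; the struct and function names are hypothetical, and std::string again stands in for QByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the reader's state.
struct PaddedBits {
    std::string data;  // original bytes followed by four zero padding bytes
    std::size_t len;   // number of valid bits, counted before padding
};

PaddedBits makePaddedBits(const std::string &original)
{
    PaddedBits bits;
    bits.len = original.size() * 8;  // bit count of the original data only
    bits.data = original;
    bits.data.append(4, '\0');       // padding is added after len is fixed
    return bits;
}

// Highest byte index a read may touch, per the commit message.
std::size_t highestIndexTouched(std::size_t len)
{
    return (len + 31) / 8;
}

bool readsStayInBounds(const PaddedBits &bits)
{
    return highestIndexTouched(bits.len) < bits.data.size();
}
```

For n original bytes, len is 8n, so the highest index is n + 3, which is inside the padded buffer of n + 4 bytes. If len were instead derived from the padded size, the highest index would be n + 7, which is out of bounds.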
Comment 9 Carl Schwan 2025-03-23 21:29:16 UTC
Git commit ce72f8b3edf53e3df3e5c8f6d59fd3cad9d50d11 by Carl Schwan, on behalf of Stefan Brüns.
Committed on 23/03/2025 at 21:29.
Pushed by carlschwan into branch 'release/25.04'.

Fix broken padding in BitReader

The overload taking a char* appends the \0 terminated string, i.e.
QByteArray::append("\x0...") is essentially a noop. This causes
out-of-bounds accesses, either causing asserts or reading invalid data.

See https://doc.qt.io/qt-6/qbytearray.html#append-3

SENTRY: OKULAR-AD
SENTRY: BALOO-33
SENTRY: BALOO-43Y
(cherry picked from commit a5b423d58133c46791cc53e6d67425366f94b266)

M  +1    -1    lib/decompressor.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/ce72f8b3edf53e3df3e5c8f6d59fd3cad9d50d11
Comment 10 Carl Schwan 2025-03-23 21:29:24 UTC
Git commit 1eebd7a60571791f5d3447f18749660656c93798 by Carl Schwan, on behalf of Stefan Brüns.
Committed on 23/03/2025 at 21:29.
Pushed by carlschwan into branch 'release/25.04'.

Fix possible out-of-bounds access in BitReader

The read function access data up to data[(len + 31)/8], thus len should
reflect the size (count of bits) of the original data, without the
extra padding null characters.

SENTRY: OKULAR-AD
SENTRY: BALOO-33
SENTRY: BALOO-43Y
(cherry picked from commit 866a069538a268d264cf002aa9570f97a84045da)

M  +1    -3    lib/decompressor.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/1eebd7a60571791f5d3447f18749660656c93798