Bug 482420 - Baloo file indexer crashes when accessing amazons azw or azw3 ebooks
Summary: Baloo file indexer crashes when accessing amazons azw or azw3 ebooks
Status: RESOLVED WAITINGFORINFO
Alias: None
Product: kdegraphics-mobipocket
Classification: Frameworks and Libraries
Component: general (other bugs)
Version First Reported In: 2.1.0
Platform: Neon Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Unassigned bugs
URL:
Keywords: qt6
Depends on:
Blocks:
 
Reported: 2024-03-04 21:55 UTC by Ray
Modified: 2025-08-14 12:50 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Ray 2024-03-04 21:55:04 UTC
SUMMARY
If I keep some Amazon azw or azw3 files in an indexed folder, baloo crashes. Removing them from the indexed folders lets baloo finish indexing.


STEPS TO REPRODUCE
1. Put Amazon AZW ebooks in an indexed folder
2. Run the baloo indexer
3. The baloo indexer crashes

OBSERVED RESULT
kf.baloo: Extractor probably crashed
ASSERT: "i >= 0 && i < size()" in file /usr/include/x86_64-linux-gnu/qt6/QtCore/qbytearray.h, line 576
KCrash: Application 'baloo_file_extractor' crashing... crashRecursionCounter = 2
KCrash: Application Name = baloo_file_extractor path = /usr/lib/x86_64-linux-gnu/libexec/kf6 pid = 76506
KCrash: Arguments: /usr/lib/x86_64-linux-gnu/libexec/kf6/baloo_file_extractor 
KCrash: Attempting to start /usr/lib/x86_64-linux-gnu/libexec/drkonqi
void ReportInterface::maybePickUpPostbox()
kf5idletime_wayland: This plugin does not support polling idle time
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
QSocketNotifier: Invalid socket 6 and type 'Read', disabling...
QSocketNotifier: Invalid socket 25 and type 'Read', disabling...
void ReportInterface::maybePickUpPostbox()


EXPECTED RESULT
baloo skips azw files

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: KDE Neon (updates from today)
(available in About System)
KDE Plasma Version: 6.0.0
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2

ADDITIONAL INFORMATION
In Plasma 5 baloo was complaining about these files too, but didn't crash
Comment 1 tagwerk19 2024-03-05 13:17:34 UTC
See what journalctl says...

    ... Perhaps with debugging on

Create/Edit your

    ~/.config/QtProject/qtlogging.ini

file and make sure it has

    [rules]
    kf.baloo=true

I'm wondering if Baloo is finding the extractors.
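
The logging rule suggested in comment 1 can also be added by script. The following is only a sketch: the function name `enable_logging_rule` is hypothetical, it assumes the file is a plain INI file with a rules section, and rewriting the file this way drops any comments it already contained.

```python
import configparser
from pathlib import Path


def enable_logging_rule(ini_path: Path, category: str = "kf.baloo") -> None:
    """Add `category=true` to the rules section of a qtlogging.ini file.

    Hypothetical helper for illustration; editing the file by hand,
    as described in the comment, works just as well.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep category names exactly as written
    if ini_path.exists():
        parser.read(ini_path, encoding="utf-8")
    # Reuse an existing rules section regardless of its capitalisation.
    section = next((s for s in parser.sections() if s.lower() == "rules"), "rules")
    if not parser.has_section(section):
        parser.add_section(section)
    parser[section][category] = "true"
    ini_path.parent.mkdir(parents=True, exist_ok=True)
    with ini_path.open("w", encoding="utf-8") as handle:
        # Write "key=value" without spaces, matching the form shown above.
        parser.write(handle, space_around_delimiters=False)


if __name__ == "__main__":
    enable_logging_rule(Path.home() / ".config" / "QtProject" / "qtlogging.ini")
```

Calling it a second time with `"kf.filemetadata"` would keep the existing `kf.baloo=true` line and add the second category next to it.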
Comment 2 tagwerk19 2024-03-08 08:05:58 UTC
If I download the .azw3 testfiles from https://filesamples.com/formats/azw3 and move to an indexed folder, I see:

    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5653976234959581 "/home/test/Testdir/Alices Adventures in Wonderland.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5673947832885981 "/home/test/Testdir/Around the World in 28 Languages.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675322222420701 "/home/test/Testdir/famouspaintings.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675326517387997 "/home/test/Testdir/sample1.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5677272137573085 "/home/test/Testdir/Sway.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"

This is on Neon Unstable, albeit not a clean install.

I'm not so sure on Neon Testing, I have to use "balooctl6" and "baloosearch6" on the command line so maybe there's an issue with a "half this" and "half that"  install. Nevertheless I can download and index the .azw3 testfiles, "baloosearch6 alice" works...

(filesamples.com seems to have .azw3 test files, no .azw)
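
The log lines in comment 2 show a fallback: no extractor is registered for "application/vnd.amazon.mobi8-ebook", so the extractor for the inherited type "application/x-mobipocket-ebook" is used instead. A toy sketch of that kind of lookup, assuming nothing about the real KFileMetaData code: the two tables and the function name `find_extractors` are hypothetical and only mirror what the log shows.

```python
# Hypothetical tables for illustration only. In a real system this data
# comes from the shared MIME database and the installed extractor plugins.
PARENTS = {
    "application/vnd.amazon.mobi8-ebook": ["application/x-mobipocket-ebook"],
}
EXTRACTORS = {
    "application/x-mobipocket-ebook": ["mobiextractor"],
}


def find_extractors(mimetype, parents=PARENTS, extractors=EXTRACTORS):
    """Return (mimetype_used, extractor_names).

    Tries the exact type first, then walks the parent types breadth-first.
    Returns (None, []) when nothing in the chain has an extractor.
    """
    queue = [mimetype]
    seen = set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against cycles in the parent table
        seen.add(current)
        found = extractors.get(current)
        if found:
            return current, list(found)
        queue.extend(parents.get(current, []))
    return None, []
```

With these tables, looking up the mobi8 type resolves to the mobipocket extractor via the inherited type, which matches the "Using inherited mimetype" line in the log.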
Comment 3 Ray 2024-03-08 16:12:20 UTC
Thank you very much for looking into it. No matter what I do, it indexes these test files. It doesn't crash with my files, but it doesn't index them either, which is totally fine. Probably due to DRM rules.
It could be a naming issue - I'll look into that now
Comment 4 Ray 2024-03-08 16:24:23 UTC
balooctl6 still says that these files aren't indexed after renaming them. It looks like these failed files stay until the database is purged.
I'm quite sure it was an issue with UTF-8 vs. ISO-8859-1 or something like that, or there were some updates which sorted things out.
I could purge the database if you want me to, but I'd say case closed.

Thank you
Comment 5 tagwerk19 2024-03-08 21:51:53 UTC
(In reply to Ray from comment #3)
> thank you very much for looking into it.
All part of the service :-)

> .... No matter what I do, it indexes these test files. It doesn't crash
> with my files - but doesn't index them -which is totaly fine. Probably
> due to drm rules ...
I can quite imagine DRM causing trouble, I don't know how Baloo behaves when it meets it, I think it should at least flag or log that it's met something it cannot extract...
Comment 6 Stefan Brüns 2025-02-23 03:20:39 UTC
When baloo crashes trying to index a mobipocket file, this is actually the fault of the mobipocket library.
Comment 7 Stefan Brüns 2025-02-23 03:38:41 UTC
(In reply to tagwerk19 from comment #5)
> I can quite imagine DRM causing trouble, I don't know how Baloo behaves when
> it meets it, I think it should at least flag or log that it's met something
> it cannot extract...

DRM is checked for, so these files should at least appear with the metadata (which is unprotected).
Comment 8 Stefan Brüns 2025-02-23 03:55:31 UTC
Git commit a188b893654fe5f88b1ebab7e8341ceb181f6dc9 by Stefan Brüns.
Committed on 23/02/2025 at 03:54.
Pushed by bruns into branch 'disable_mobipocket_text'.

[MobiExtractor] Disable buggy text extraction by default

The text extraction in mobiextractor is extremely buggy, and causes
a lot of bug reports for baloo (which then gets blamed for its
"buggyness" when calling third-party code).

QMobipocket lacks support for any halfway current mobipocket version
(last supported: 4, current: 8), and has no testsuite.

Make this opt-in ("ENABLE_MOBIPOCKET_TEXT_EXTRACTION") until the bugs
in QMobiPocket gets fixed.

SENTRY: BALOO-2N5
SENTRY: BALOO-426
SENTRY: BALOO-33
// use `stack.filename is mobipocket.cpp` for more
Related: bug 475975, bug 489275

M  +1    -0    CMakeLists.txt
M  +3    -0    src/extractors/CMakeLists.txt
M  +2    -1    src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/a188b893654fe5f88b1ebab7e8341ceb181f6dc9
Comment 9 Stefan Brüns 2025-03-15 12:19:57 UTC
Git commit 8bd1e61cca1e07a0ffce7ff79b861e2872662e6d by Stefan Brüns.
Committed on 15/03/2025 at 12:16.
Pushed by bruns into branch 'master'.

[MobiExtractor] Disable buggy text extraction by default

The text extraction in mobiextractor is extremely buggy, and causes
a lot of bug reports for baloo (which then gets blamed for its
"buggyness" when calling third-party code).

QMobipocket lacks support for any halfway current mobipocket version
(last supported: 4, current: 8), and has no testsuite.

Make this opt-in ("ENABLE_MOBIPOCKET_TEXT_EXTRACTION") until the bugs
in QMobiPocket gets fixed.

SENTRY: BALOO-2N5
SENTRY: BALOO-426
SENTRY: BALOO-33
// use `stack.filename is mobipocket.cpp` for more
Related: bug 475975, bug 489275

M  +1    -0    CMakeLists.txt
M  +3    -0    src/extractors/CMakeLists.txt
M  +2    -1    src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/8bd1e61cca1e07a0ffce7ff79b861e2872662e6d
Comment 10 Bug Janitor Service 2025-03-16 15:07:25 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kfilemetadata/-/merge_requests/180
Comment 11 Stefan Brüns 2025-03-21 12:40:17 UTC
Git commit eef273863a4a7e9f4a32817514b877e64010927f by Stefan Brüns.
Committed on 16/03/2025 at 15:46.
Pushed by bruns into branch 'master'.

[MobiExtractor] Add debug message for invalid or DRMed files

Allow users to get some feedback if a file DRM protected.

M  +1    -1    src/extractors/CMakeLists.txt
M  +0    -1    src/extractors/exiv2extractor.cpp
M  +10   -2    src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/eef273863a4a7e9f4a32817514b877e64010927f
Comment 12 Bug Janitor Service 2025-06-01 23:32:02 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/kdegraphics-mobipocket/-/merge_requests/35
Comment 13 Stefan Brüns 2025-06-09 08:47:42 UTC
Git commit 439a01662e72102e114a46d168fbabbb4de04184 by Stefan Brüns.
Committed on 07/06/2025 at 15:19.
Pushed by bruns into branch 'master'.

Handle trailing data entries correctly

Text records may contain extra auxiliary data which should not be fed
to the decompressor.

The existence of such data is signalled by the `extraflags` header field,
and each set bit signals the corresponding extra data which will be
present in all text records.

The entries can be decoded (or removed) by reading the record from the
back. When an entry is present, its size will be at the very end of the
record, preceded by the actual data.
Related: bug 475975, bug 489275

M  +0    -1    autotests/mobipockettest.cpp
M  +62   -3    lib/mobipocket.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/439a01662e72102e114a46d168fbabbb4de04184
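
The back-to-front decoding described in the commit message of comment 13 can be sketched in Python. This is not the code from the commit. It assumes the commonly documented MOBI layout: bit 0 of `extraflags` marks multibyte overlap bytes whose count is the low two bits of the last byte plus one, every higher bit marks one entry whose total size (including the size bytes themselves) is stored as a variable-width integer at the end of the record, and that integer uses 7 bits per byte with the high bit set on its first byte. The function names are hypothetical.

```python
def _backward_varint(record: bytes, end: int) -> int:
    """Decode a variable-width integer whose last byte is record[end - 1].

    Assumed encoding: 7 payload bits per byte, read from the back; a byte
    with the high bit set is the first byte of the integer, so reading stops.
    At most 4 bytes (28 bits) are consumed.
    """
    value = 0
    shift = 0
    pos = end
    while pos > 0 and shift < 28:
        byte = record[pos - 1]
        value |= (byte & 0x7F) << shift
        shift += 7
        pos -= 1
        if byte & 0x80:
            break
    return value


def trailing_size(record: bytes, extra_flags: int) -> int:
    """Number of bytes at the end of a text record that are not text."""
    total = 0
    flags = extra_flags >> 1  # bits 1 and up: one sized entry each
    while flags:
        if flags & 1:
            total += _backward_varint(record, len(record) - total)
            if total >= len(record):
                return len(record)  # malformed sizes: never index out of range
        flags >>= 1
    # Bit 0 (assumed): multibyte overlap bytes sit just before the other entries.
    if extra_flags & 1 and total < len(record):
        total += (record[len(record) - total - 1] & 0x03) + 1
    return min(total, len(record))


def strip_trailing_entries(record: bytes, extra_flags: int) -> bytes:
    """Return only the part of the record that should reach the decompressor."""
    return record[: len(record) - trailing_size(record, extra_flags)]
```

The clamping to `len(record)` is a deliberate choice in this sketch: a record with nonsensical size fields yields an empty payload instead of an out-of-range access, which is the class of failure reported in this bug.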
Comment 14 Stefan Brüns 2025-08-14 12:50:45 UTC
Likely fixed in 25.08.