Bug 482420 - Baloo file indexer crashes when accessing Amazon's AZW or AZW3 ebooks
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Engine
Version: 6.0.0
Platform: Neon Linux
Importance: NOR crash
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords: qt6
Depends on:
Blocks:
 
Reported: 2024-03-04 21:55 UTC by Ray
Modified: 2024-03-08 21:51 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:


Description Ray 2024-03-04 21:55:04 UTC
SUMMARY
If I keep some Amazon AZW or AZW3 files in an indexed folder, Baloo crashes. Removing them from the indexed folders lets Baloo finish indexing.


STEPS TO REPRODUCE
1. Put Amazon AZW ebooks in an indexed folder
2. Run the Baloo indexer
3. The Baloo indexer crashes

OBSERVED RESULT
kf.baloo: Extractor probably crashed
ASSERT: "i >= 0 && i < size()" in file /usr/include/x86_64-linux-gnu/qt6/QtCore/qbytearray.h, line 576
KCrash: Application 'baloo_file_extractor' crashing... crashRecursionCounter = 2
KCrash: Application Name = baloo_file_extractor path = /usr/lib/x86_64-linux-gnu/libexec/kf6 pid = 76506
KCrash: Arguments: /usr/lib/x86_64-linux-gnu/libexec/kf6/baloo_file_extractor 
KCrash: Attempting to start /usr/lib/x86_64-linux-gnu/libexec/drkonqi
void ReportInterface::maybePickUpPostbox()
kf5idletime_wayland: This plugin does not support polling idle time
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
QSocketNotifier: Invalid socket 6 and type 'Read', disabling...
QSocketNotifier: Invalid socket 25 and type 'Read', disabling...
void ReportInterface::maybePickUpPostbox()


EXPECTED RESULT
baloo skips azw files

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: KDE Neon (updates from today)
KDE Plasma Version: 6.0.0
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2

ADDITIONAL INFORMATION
In Plasma 5, Baloo complained about these files too, but didn't crash.
Comment 1 tagwerk19 2024-03-05 13:17:34 UTC
See what journalctl says...

    ... Perhaps with debugging on

Create/Edit your

    ~/.config/QtProject/qtlogging.ini

file and make sure it has

    [rules]
    kf.baloo=true

I'm wondering if Baloo is finding the extractors.
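
For anyone who prefers to script the qtlogging.ini change described above, here is a minimal Python sketch. The function name `enable_baloo_logging` is made up for illustration; the only facts taken from this comment are the file location, the `[rules]` section, and the `kf.baloo=true` entry. It reuses an existing rules section whatever its capitalisation, on the assumption that such a section is the one Qt reads.

```python
import configparser
from pathlib import Path


def enable_baloo_logging(ini_path: Path) -> None:
    """Make sure the rules section of a qtlogging.ini contains kf.baloo=true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep category names exactly as written
    if ini_path.exists():
        parser.read(ini_path, encoding="utf-8")

    # Reuse an existing rules section (any capitalisation), else create "[rules]".
    section = next((s for s in parser.sections() if s.lower() == "rules"), None)
    if section is None:
        section = "rules"
        parser.add_section(section)
    parser.set(section, "kf.baloo", "true")

    ini_path.parent.mkdir(parents=True, exist_ok=True)
    with ini_path.open("w", encoding="utf-8") as handle:
        # Write "key=value" without spaces, matching the form shown above.
        parser.write(handle, space_around_delimiters=False)
```

It would be called as `enable_baloo_logging(Path.home() / ".config" / "QtProject" / "qtlogging.ini")`. Note that rewriting the file this way drops any comments it contained, so editing by hand is safer if the file already has content you care about.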
Comment 2 tagwerk19 2024-03-08 08:05:58 UTC
If I download the .azw3 test files from https://filesamples.com/formats/azw3 and move them to an indexed folder, I see:

    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5653976234959581 "/home/test/Testdir/Alices Adventures in Wonderland.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5673947832885981 "/home/test/Testdir/Around the World in 28 Languages.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675322222420701 "/home/test/Testdir/famouspaintings.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675326517387997 "/home/test/Testdir/sample1.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5677272137573085 "/home/test/Testdir/Sway.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"

This is on Neon Unstable, albeit not a clean install.

I'm not so sure about Neon Testing. There I have to use "balooctl6" and "baloosearch6" on the command line, so maybe there's an issue with a "half this" and "half that" install. Nevertheless, I can download and index the .azw3 test files, and "baloosearch6 alice" works...

(filesamples.com seems to have .azw3 test files, no .azw)
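
With longer journals it can help to condense output like the above into one entry per file. The following Python sketch is only an illustration: the regular expressions are written against the exact message wording shown in this report, and the function names are hypothetical. `last_file_before_crash` additionally assumes that the "Indexing" line is logged before extraction starts, so that the last such line before "Extractor probably crashed" names the file being processed.

```python
import re

# Patterns follow the message wording seen in the journal excerpts in this report.
INDEXING = re.compile(r'kf\.baloo: Indexing (\d+) "(.+)" "([^"]+)"$')
INHERITED = re.compile(
    r'kf\.filemetadata: Using inherited mimetype "([^"]+)" for "([^"]+)"$'
)


def summarize(lines):
    """Return one dict per indexed file: path, mimetype, inherited mimetype."""
    files = []
    for line in lines:
        line = line.rstrip()
        if m := INDEXING.search(line):
            files.append(
                {"path": m.group(2), "mimetype": m.group(3), "inherited": None}
            )
        elif (m := INHERITED.search(line)) and files:
            # Attach the fallback to the most recent file with that mimetype.
            if files[-1]["mimetype"] == m.group(2):
                files[-1]["inherited"] = m.group(1)
    return files


def last_file_before_crash(lines):
    """Return the path from the last 'Indexing' line before a crash message."""
    last = None
    for line in lines:
        if m := INDEXING.search(line.rstrip()):
            last = m.group(2)
        elif "Extractor probably crashed" in line:
            return last
    return None
```

Feeding it the saved text of a journal dump (for example via `open(path).read().splitlines()`) gives a quick list of which files fell back to the "application/x-mobipocket-ebook" extractor and, if a crash was logged, which file is the likely suspect.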
Comment 3 Ray 2024-03-08 16:12:20 UTC
Thank you very much for looking into it. No matter what I do, it indexes these test files. It doesn't crash with my files, but it doesn't index them, which is totally fine. That is probably due to DRM rules.
It could be a naming issue. I'll look into that now.
Comment 4 Ray 2024-03-08 16:24:23 UTC
balooctl6 still says that these files aren't indexed after renaming them. It looks like these failed files stay that way until the database is purged.
I'm quite sure it was an issue with UTF-8 vs. ISO-8859-1 or something like that, or there were some updates which sorted things out.
I could purge the database if you want me to, but I'd say case closed.

Thank you
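
If a filename encoding mix-up (UTF-8 vs. ISO-8859-1) is suspected, one way to check is to look for names whose raw bytes are not valid UTF-8. The Python sketch below does only that; whether such names were actually involved in this crash is not established in this report, and the function names are hypothetical.

```python
import os


def is_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_names(directory):
    """Return raw paths (bytes) of files below directory with non-UTF-8 names."""
    bad = []
    # Walking a bytes path makes os.walk yield the names as raw bytes.
    for root, _dirs, files in os.walk(os.fsencode(directory)):
        for name in files:
            if not is_utf8(name):
                bad.append(os.path.join(root, name))
    return bad
```

Running `non_utf8_names` on an indexed folder returns an empty list when every filename is valid UTF-8. Any entries it does return are candidates for renaming before re-indexing.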
Comment 5 tagwerk19 2024-03-08 21:51:53 UTC
(In reply to Ray from comment #3)
> thank you very much for looking into it.
All part of the service :-)

> .... No matter what I do, it indexes these test files. It doesn't crash
> with my files - but doesn't index them -which is totally fine. Probably
> due to drm rules ...
I can quite imagine DRM causing trouble. I don't know how Baloo behaves when it meets it, but I think it should at least flag or log that it has met something it cannot extract...