Summary: | Baloo file indexer crashes when accessing amazons azw or azw3 ebooks | ||
---|---|---|---|
Product: | [Frameworks and Libraries] kdegraphics-mobipocket | Reporter: | Ray <tabibi> |
Component: | general | Assignee: | Unassigned bugs <unassigned-bugs-null> |
Status: | RESOLVED WAITINGFORINFO | ||
Severity: | crash | CC: | stefan.bruens, tagwerk19 |
Priority: | NOR | Keywords: | qt6 |
Version First Reported In: | 2.1.0 | ||
Target Milestone: | --- | ||
Platform: | Neon | ||
OS: | Linux | ||
Latest Commit: | | Version Fixed In: | |
Sentry Crash Report: | | ||
Description
Ray
2024-03-04 21:55:04 UTC
See what journalctl says... ... Perhaps with debugging on

Create/Edit your ~/.config/QtProject/qtlogging.ini file and make sure it has

    [rules]
    kf.baloo=true

I'm wondering if Baloo is finding the extractors.

If I download the .azw3 test files from https://filesamples.com/formats/azw3 and move them to an indexed folder, I see:

    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5653976234959581 "/home/test/Testdir/Alices Adventures in Wonderland.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5673947832885981 "/home/test/Testdir/Around the World in 28 Languages.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675322222420701 "/home/test/Testdir/famouspaintings.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:19 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5675326517387997 "/home/test/Testdir/sample1.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.baloo: Indexing 5677272137573085 "/home/test/Testdir/Sway.azw3" "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: No extractor for "application/vnd.amazon.mobi8-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Fetching extractors for "application/x-mobipocket-ebook"
    Mar 08 08:24:20 ... baloo_file_extractor[2412]: kf.filemetadata: Using inherited mimetype "application/x-mobipocket-ebook" for "application/vnd.amazon.mobi8-ebook"

This is on Neon Unstable, albeit not a clean install. I'm not so sure about Neon Testing: I have to use "balooctl6" and "baloosearch6" on the command line, so maybe there's an issue with a "half this" and "half that" install. Nevertheless I can download and index the .azw3 test files, and "baloosearch6 alice" works...

(filesamples.com seems to have .azw3 test files, no .azw)

Thank you very much for looking into it. No matter what I do, it indexes these test files. It doesn't crash with my files, but it doesn't index them either, which is totally fine. Probably due to DRM rules. It could be a naming issue - I'll look into that now.

balooctl6 still says that these files aren't indexed after renaming them.
It looks like these failed files stay until the database is purged. I'm quite sure it was an issue with UTF-8 vs. ISO-8859-1 or something like that, or there were some updates which sorted things out. I could purge the database if you want me to, but I'd say case closed. Thank you

(In reply to Ray from comment #3)
> thank you very much for looking into it.

All part of the service :-)

> .... No matter what I do, it indexes these test files. It doesn't crash
> with my files - but doesn't index them - which is totally fine. Probably
> due to drm rules ...

I can quite imagine DRM causing trouble. I don't know how Baloo behaves when it meets it; I think it should at least flag or log that it's met something it cannot extract...

When baloo crashes trying to index a mobipocket file, this is actually the fault of the mobipocket library.

(In reply to tagwerk19 from comment #5)
> I can quite imagine DRM causing trouble, I don't know how Baloo behaves when
> it meets it, I think it should at least flag or log that it's met something
> it cannot extract...

DRM is checked for, so these files should at least appear with the metadata (which is unprotected).

Git commit a188b893654fe5f88b1ebab7e8341ceb181f6dc9 by Stefan Brüns.
Committed on 23/02/2025 at 03:54.
Pushed by bruns into branch 'disable_mobipocket_text'.

[MobiExtractor] Disable buggy text extraction by default

The text extraction in mobiextractor is extremely buggy, and causes a lot of bug reports for baloo (which then gets blamed for its "buggyness" when calling third-party code).

QMobipocket lacks support for any halfway current mobipocket version (last supported: 4, current: 8), and has no testsuite.

Make this opt-in ("ENABLE_MOBIPOCKET_TEXT_EXTRACTION") until the bugs in QMobiPocket gets fixed.

SENTRY: BALOO-2N5
SENTRY: BALOO-426
SENTRY: BALOO-33
// use `stack.filename is mobipocket.cpp` for more
Related: bug 475975, bug 489275

M +1 -0 CMakeLists.txt
M +3 -0 src/extractors/CMakeLists.txt
M +2 -1 src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/a188b893654fe5f88b1ebab7e8341ceb181f6dc9

Git commit 8bd1e61cca1e07a0ffce7ff79b861e2872662e6d by Stefan Brüns.
Committed on 15/03/2025 at 12:16.
Pushed by bruns into branch 'master'.

[MobiExtractor] Disable buggy text extraction by default

The text extraction in mobiextractor is extremely buggy, and causes a lot of bug reports for baloo (which then gets blamed for its "buggyness" when calling third-party code).

QMobipocket lacks support for any halfway current mobipocket version (last supported: 4, current: 8), and has no testsuite.

Make this opt-in ("ENABLE_MOBIPOCKET_TEXT_EXTRACTION") until the bugs in QMobiPocket gets fixed.

SENTRY: BALOO-2N5
SENTRY: BALOO-426
SENTRY: BALOO-33
// use `stack.filename is mobipocket.cpp` for more
Related: bug 475975, bug 489275

M +1 -0 CMakeLists.txt
M +3 -0 src/extractors/CMakeLists.txt
M +2 -1 src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/8bd1e61cca1e07a0ffce7ff79b861e2872662e6d

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kfilemetadata/-/merge_requests/180

Git commit eef273863a4a7e9f4a32817514b877e64010927f by Stefan Brüns.
Committed on 16/03/2025 at 15:46.
Pushed by bruns into branch 'master'.

[MobiExtractor] Add debug message for invalid or DRMed files

Allow users to get some feedback if a file DRM protected.

M +1 -1 src/extractors/CMakeLists.txt
M +0 -1 src/extractors/exiv2extractor.cpp
M +10 -2 src/extractors/mobiextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/eef273863a4a7e9f4a32817514b877e64010927f

A possibly relevant merge request was started @ https://invent.kde.org/graphics/kdegraphics-mobipocket/-/merge_requests/35

Git commit 439a01662e72102e114a46d168fbabbb4de04184 by Stefan Brüns.
Committed on 07/06/2025 at 15:19.
Pushed by bruns into branch 'master'.

Handle trailing data entries correctly

Text records may contain extra auxiliary data which should not be fed to the decompressor. The existence of such data is signalled by the `extraflags` header field, and each set bit signals the corresponding extra data which will be present in all text records.

The entries can be decoded (or removed) by reading the record from the back. When an entry is present, its size will be at the very end of the record, preceded by the actual data.

Related: bug 475975, bug 489275

M +0 -1 autotests/mobipockettest.cpp
M +62 -3 lib/mobipocket.cpp

https://invent.kde.org/graphics/kdegraphics-mobipocket/-/commit/439a01662e72102e114a46d168fbabbb4de04184

Likely fixed in 25.08.
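
For readers wondering what "reading the record from the back" in commit 439a0166 amounts to, here is a rough, self-contained sketch. It is not the code from that commit. The helper names `readBackwardSize` and `payloadLength` are hypothetical, and the encoding details (a backward variable-width size that counts the whole entry including its own size bytes, and bit 0 of the flags being a separate "multibyte overlap" entry sized by the low two bits of its last byte) follow commonly circulated descriptions of the MOBI format; they are assumptions that this thread does not confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: decode a size stored backwards at the end of
// data[0..end). Assumed encoding: 7 value bits per byte, last byte is the
// least significant group, and the byte with its high bit set starts the number.
static std::size_t readBackwardSize(const std::vector<std::uint8_t>& data, std::size_t end)
{
    std::size_t value = 0;
    unsigned shift = 0;
    while (end > 0 && shift < 28) {
        const std::uint8_t byte = data[--end];
        value |= static_cast<std::size_t>(byte & 0x7F) << shift;
        shift += 7;
        if (byte & 0x80) {
            break; // reached the first byte of the encoded size
        }
    }
    return value;
}

// Hypothetical helper: return how many leading bytes of a text record are
// real payload for the decompressor, or nullopt if the trailing entries do
// not fit inside the record.
std::optional<std::size_t> payloadLength(const std::vector<std::uint8_t>& record,
                                         std::uint16_t extraFlags)
{
    std::size_t end = record.size();

    // Bits 1..15: one trailing entry per set bit, peeled off from the back.
    for (std::uint16_t flags = extraFlags >> 1; flags != 0; flags >>= 1) {
        if (!(flags & 1)) {
            continue;
        }
        const std::size_t entrySize = readBackwardSize(record, end);
        if (entrySize == 0 || entrySize > end) {
            return std::nullopt; // malformed record: never read out of bounds
        }
        end -= entrySize;
    }

    // Bit 0 (assumed): multibyte overlap bytes, length in the low two bits.
    if (extraFlags & 1) {
        if (end == 0) {
            return std::nullopt;
        }
        const std::size_t overlap = (record[end - 1] & 0x03) + 1u;
        if (overlap > end) {
            return std::nullopt;
        }
        end -= overlap;
    }
    return end;
}
```

A caller would pass only the first `payloadLength(record, extraFlags)` bytes to the decompressor and treat a missing value as a damaged record. The bounds checks matter here because a bogus size in an untrusted file is exactly the kind of input that could otherwise crash the indexer.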