Bug 515918 - File extractor keeps asserting txn != nullptr
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR crash
Target Milestone: ---
Assignee: baloo-bugs-null
Duplicates: 515651 515911 515944
Reported: 2026-02-12 20:24 UTC by Kai Uwe Broulik
Modified: 2026-02-13 12:40 UTC

Attachments
baloo_file_extractor backtrace (2026/02/13) (6.34 KB, text/plain)
2026-02-13 07:40 UTC, tagwerk19

Description Kai Uwe Broulik 2026-02-12 20:24:54 UTC
SUMMARY
Baloo file extractor keeps asserting txn != nullptr frequently on large(r) files, e.g. PDFs and large text files

STEPS TO REPRODUCE
1. Have baloo enabled, log into your system, perhaps clear the index so it runs again

OBSERVED RESULT
Baloo keeps crashing all the freaking time, spawning millions of drkonqis
Without asserts enabled it prints "m_writeTrans is null" in the log

EXPECTED RESULT
Baloo works as it used to

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: git master as of 2026-02-12

Git bisect suggests
e75cdd6016ba5433c05fbb04f4424b630c40dfbf is the first bad commit
commit e75cdd6016ba5433c05fbb04f4424b630c40dfbf
Author: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Date:   Thu Jan 8 20:18:28 2026 +0100

    [Extractor] Release DB write lock while content is extracted
    
    The extractor process held the DB write lock during the complete index
    batch, which may last for several seconds, or on rare occasions even
    minutes or hours.
    
    This had several negative side effects:
    - Any filesystem changes had to be queued in the scheduler, as these
      can not be commited to the DB.
    - Even deleted files may be commited to the DB, to be immediately deleted
      when the pending event queue is processed.
    - Any search may return fairly obsolete results, including deleted files.
      (Searching may still return incorrect results for files still
      pending, but this is out of scope.)
    - When an extractor crashes, the write transaction was still open. Although
      this is detected and handled, but may still cause further problems.
    
    Create a preliminary workload which is processed without holding any
    transactions, and only create the write transaction when the content
    extraction has completed. The completed workload is then checked if it
    matches the original state (url/id), and commited. For the unlikely case
    the state has changed the mismatching document(s) is discarded.

 src/file/extractor/app.cpp | 145 ++++++++++++++++++++++++++++-----------------
 src/file/extractor/app.h   |  27 ++++++---
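
The two-phase flow described in that commit message can be sketched roughly as follows. This is only a toy model under stated assumptions, not the code from app.cpp: `PendingDoc`, `UrlIndex`, `extractBatch` and `commitBatch` are hypothetical names, a `std::map` stands in for the database, and "extraction" is a placeholder string. The point is only the ordering: extract without any transaction, then re-check id/url in a short write phase and discard whatever no longer matches.

```cpp
#include <cassert>   // assert() for the invariant check below
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from Baloo.
struct PendingDoc {
    std::uint64_t id;     // document id at the time the batch was prepared
    std::string url;      // url the id mapped to at that time
    std::string content;  // result of the (slow) content extraction
};

// Toy "database": document id -> url currently known for that id.
using UrlIndex = std::map<std::uint64_t, std::string>;

// Phase 1: no transaction is held. Snapshot id/url and do the slow work.
std::vector<PendingDoc> extractBatch(const UrlIndex &index,
                                     const std::vector<std::uint64_t> &ids)
{
    std::vector<PendingDoc> batch;
    for (std::uint64_t id : ids) {
        auto it = index.find(id);
        if (it == index.end()) {
            continue; // unknown id, nothing to extract
        }
        batch.push_back({id, it->second, "text of " + it->second});
    }
    return batch;
}

// Phase 2: short write phase. Commit only documents whose id/url still
// match the current state; anything that changed meanwhile is discarded.
std::size_t commitBatch(const UrlIndex &index,
                        const std::vector<PendingDoc> &batch,
                        std::map<std::uint64_t, std::string> &store)
{
    std::size_t committed = 0;
    for (const PendingDoc &doc : batch) {
        auto it = index.find(doc.id);
        if (it == index.end() || it->second != doc.url) {
            continue; // deleted or renamed while extraction was running
        }
        store[doc.id] = doc.content;
        ++committed;
    }
    assert(committed <= batch.size());
    return committed;
}
```

In this sketch, a file deleted or renamed between `extractBatch` and `commitBatch` is simply skipped at commit time, which mirrors the "mismatching document(s) is discarded" wording above.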

Qt Version: 6.10.2

ADDITIONAL INFORMATION
I suspect it’s got something to do with the changes in early January re splitting stuff into multiple transactions.
Comment 1 tagwerk19 2026-02-13 00:37:26 UTC
I see the same, with a 
    kf.filemetadata: Extracting UTF-8 "\n" plain text from "....
    ASSERT: "txn != nullptr" in file /workspace/build/src/engine/documentiddb.cpp, line 17

It does not seem to be tied to a particular file: after the crash and restart, the file is indexed (according to the debug output, anyway). It's also possible to index the files that crash with a "balooctl6 index ..."

This is on Neon Unstable.
Comment 2 Stefan Brüns 2026-02-13 01:05:40 UTC
How about providing a backtrace?
Comment 3 Stefan Brüns 2026-02-13 04:06:03 UTC
Git commit 1a80af307dfc8ea07a2a4623a2e6078c90ecdd2b by Stefan Brüns.
Committed on 13/02/2026 at 03:48.
Pushed by bruns into branch 'master'.

[Extractor] Open the DB in ReadWrite mode from the beginning

The open mode can not be changed later, open it read-write.

M  +1    -1    src/engine/transaction.cpp
M  +2    -5    src/file/extractor/app.cpp

https://invent.kde.org/frameworks/baloo/-/commit/1a80af307dfc8ea07a2a4623a2e6078c90ecdd2b
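
For readers following along, here is a toy model of why a fixed open mode could surface as a null write transaction. It is a sketch under assumptions, not Baloo's or LMDB's API: `ToyDb`, `OpenMode`, `WriteTxn` and `canCommitLater` are made-up names, and the assumed scenario (a handle opened read-only that is later asked for a write transaction) is inferred from the commit message above rather than confirmed by the diff.

```cpp
#include <cassert>  // assert() for callers that want to check the result
#include <memory>

// Hypothetical names for illustration only.
enum class OpenMode { ReadOnly, ReadWrite };

struct WriteTxn {};

class ToyDb {
public:
    explicit ToyDb(OpenMode mode) : m_mode(mode) {}

    // Hands out a write transaction only if the handle was opened
    // read-write; otherwise returns nullptr.
    std::unique_ptr<WriteTxn> beginWrite() const
    {
        if (m_mode != OpenMode::ReadWrite) {
            return nullptr;
        }
        return std::make_unique<WriteTxn>();
    }

private:
    const OpenMode m_mode; // fixed at open time, cannot be changed later
};

// True if a commit that is deferred until after extraction could still
// obtain a write transaction from this handle.
bool canCommitLater(const ToyDb &db)
{
    return db.beginWrite() != nullptr;
}
```

In this model, `canCommitLater(ToyDb{OpenMode::ReadOnly})` is false and `canCommitLater(ToyDb{OpenMode::ReadWrite})` is true, so a deferred commit only works if the handle is opened read-write from the beginning.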
Comment 4 tagwerk19 2026-02-13 07:40:16 UTC
Created attachment 189510
baloo_file_extractor backtrace (2026/02/13)

> How about providing a backtrace?
This is what I saw...
Comment 5 tagwerk19 2026-02-13 08:01:00 UTC
*** Bug 515651 has been marked as a duplicate of this bug. ***
Comment 6 tagwerk19 2026-02-13 08:04:57 UTC
*** Bug 515911 has been marked as a duplicate of this bug. ***
Comment 7 Kai Uwe Broulik 2026-02-13 08:06:20 UTC
I salvaged the assert from the coredump but forgot to save the trace, sorry...

With recent git master it works fine again \o/ Thanks so much for the prompt fix!
Comment 8 Nicolas Fella 2026-02-13 11:59:56 UTC
Git commit abdf26f61fd4de8637d77fd0d51b5ab0fd8b23c5 by Nicolas Fella, on behalf of Stefan Brüns.
Committed on 13/02/2026 at 11:58.
Pushed by nicolasfella into branch 'Frameworks/6.23'.

[Extractor] Open the DB in ReadWrite mode from the beginning

The open mode can not be changed later, open it read-write.
(cherry picked from commit 1a80af307dfc8ea07a2a4623a2e6078c90ecdd2b)

M  +1    -1    src/engine/transaction.cpp
M  +2    -5    src/file/extractor/app.cpp

https://invent.kde.org/frameworks/baloo/-/commit/abdf26f61fd4de8637d77fd0d51b5ab0fd8b23c5
Comment 9 Nicolas Fella 2026-02-13 12:40:29 UTC
*** Bug 515944 has been marked as a duplicate of this bug. ***