SUMMARY

STEPS TO REPRODUCE
1. git clone https://github.com/NixOS/nixpkgs
2. balooctl check
3. balooctl monitor

OBSERVED RESULT
balooctl monitor shows the same files being indexed again and again, and balooctl status shows no reduction in the number of files to be indexed.

EXPECTED RESULT
balooctl status shows progressively fewer files to be indexed.

SOFTWARE/OS VERSIONS
Operating System: NixOS 20.03pre198214.4cd2cb43fb3
KDE Plasma Version: 5.16.5
KDE Frameworks Version: 5.62.0
Qt Version: 5.12.5
Kernel Version: 5.2.21
OS Type: 64-bit
Processors: 4 × Intel® Core™ i7-5500U CPU @ 2.40GHz
Memory: 15,6 GiB

ADDITIONAL INFORMATION
I'm having the exact same issue on nixos-19.03, but for me baloo gets stuck while indexing a custom download of the Tor Browser in the background. Baloo also leaks memory while doing this, so it consumes several gigabytes of memory after a while. That is the only reason I noticed the problem. I never touched baloo manually; I didn't even know what it was until 15 minutes ago, when I finally decided to start debugging this crazy thing called 'baloo_file_ext' that I have had to kill over and over for days to keep my system from freezing. Please tell me how I can help debug this.

My balooctl monitor output looks like this:

$ balooctl monitor
Press ctrl+c to stop monitoring
File indexer is running
Indexing file content
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/obfs4proxy: Ok
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/start-tor-browser.desktop: Ok
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/obfs4proxy: Ok
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/start-tor-browser.desktop: Ok
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/obfs4proxy: Ok
Indexing: /home/grmpf/synced/programs/tor-browser_en-US/start-tor-browser.desktop: Ok
...

It keeps indexing the same two files over and over again.
Are there any messages in the journal?
Dear Bug Submitter, This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information. For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging If you have already provided the requested information, please mark the bug as REPORTED so that the KDE team knows that the bug is ready to be confirmed. Thank you for helping us make KDE software even better for everyone!
I've switched away from KDE to i3 in the meantime, so I cannot tell. I'll just link the issue I originally came from: https://github.com/NixOS/nixpkgs/issues/63489 Maybe the problem is related to btrfs (and its autodefrag option)? But even if it is unknown why baloo rescans the same files over and over again, it still should not leak memory. So there are two bugs in one. The memory leak should be fixable without knowing why it loops.
New information was added with comment 4; changing status for inspection.
Bug reporter has *not* provided the information asked for, and obviously has no interest to help.
I observe the same / similar issue with btrfs for /home. I have about 20k files, mostly PDFs, to be indexed, and when all files are indexed baloo starts over again in an endless loop. In /var/log/messages, I see lots of baloo_file_extractor / kf.baloo messages saying "id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created", as reported in https://github.com/NixOS/nixpkgs/issues/63489#issuecomment-563007599

@Stefan: Is /var/log/messages the "journal" you are referring to?

I am using the default KDE of openSUSE Leap 15.3. According to Kontact > Help > About Kontact > Libraries, it uses "KDE Frameworks 5.76.0" and "Qt 5.12.7 (built against 5.12.7)". I did not have this problem using XFS for /home on the same OS, so I concur with comment 4 that this may be specific to using btrfs.

I am not too worried about the CPU usage, but the index growing by a few GB in every round is a problem for me. If the issue cannot easily be fixed, I'd therefore welcome a partial solution that at least avoids large index updates. This could for example be implemented by recording the sha256 fingerprint of every indexed file, indexing the contents only of files with a new sha256 fingerprint, and linking files with the same content in the baloo database.
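The fingerprint idea above could be sketched roughly as follows. This is only an illustration in Python, not how baloo works: `FingerprintIndex`, its dictionaries and the `extract` callback are all hypothetical names, and the in-memory dictionaries stand in for whatever the real database would store.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex sha256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class FingerprintIndex:
    """Hypothetical index that stores extracted content once per fingerprint."""

    def __init__(self, extract):
        self.extract = extract        # callable: path -> extracted content
        self.content_by_hash = {}     # fingerprint -> extracted content
        self.hash_by_path = {}        # path -> fingerprint (the "link")
        self.extractions = 0          # how often content was actually extracted

    def index(self, path):
        fingerprint = sha256_of_file(path)
        if fingerprint not in self.content_by_hash:
            # Only content that has never been seen is extracted and stored.
            self.content_by_hash[fingerprint] = self.extract(path)
            self.extractions += 1
        # Known content is merely linked to the path, so a repeated pass
        # over unchanged files adds no new index data.
        self.hash_by_path[str(path)] = fingerprint
        return fingerprint
```

With this layout, a second pass over unchanged files still reads every file to hash it, but writes nothing new, which is the trade-off proposed above.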
Re the idea of recording the sha256 of each file: this may be problematic for large files in which only a small area holds indexable content, such as the metadata and subtitles of a video. Still, reading excessive amounts of data can be preferable to writing excessive amounts of index data. A solution may be to require the content indexer modules to support returning a content fingerprint, with the default implementation running the normal content extraction and calculating a fingerprint over the extracted content. File-format-specific implementations can skip some processing steps, such as decompression of a data stream and character set conversion.
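A minimal sketch of that interface, again in Python and with entirely hypothetical class names (`Extractor`, `PlainTextExtractor`, `TaggedMediaExtractor`); the "tagged media" format is a toy invented for the example, not a real file format or a baloo extractor:

```python
import hashlib


class Extractor:
    """Hypothetical base class for a content indexer module."""

    def extract(self, data: bytes) -> str:
        raise NotImplementedError

    def content_fingerprint(self, data: bytes) -> str:
        # Default: run the normal extraction and hash the extracted text.
        text = self.extract(data)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PlainTextExtractor(Extractor):
    """Relies on the default fingerprint implementation."""

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


class TaggedMediaExtractor(Extractor):
    """Toy format: a metadata header, one NUL byte, then a bulk payload."""

    def extract(self, data: bytes) -> str:
        header, _, _payload = data.partition(b"\0")
        return header.decode("utf-8", errors="replace")

    def content_fingerprint(self, data: bytes) -> str:
        # Format-specific shortcut: hash the raw header bytes and skip
        # the character set conversion. The payload is never looked at.
        header, _, _payload = data.partition(b"\0")
        return hashlib.sha256(header).hexdigest()
```

Two media files with the same header but different payloads get the same fingerprint here, so a changed payload alone would not trigger new index writes.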
(In reply to Joachim Wagner from comment #7)
> I observe the same / similar issue with btrfs for /home ...
> ... openSUSE Leap 15.3 ...

Yes, there's an issue with openSUSE, BTRFS and multiple subvols, as per:
https://bugs.kde.org/show_bug.cgi?id=402154#c12

I'm not sure this would explain the original issue though, of same files being indexed in a loop. Might be worthwhile checking whether the baloo_file_extractor process is crashing.
Yes, thanks. My logs confirm the re-indexing co-occurs with the use of a new virtual device number for the btrfs filesystem.
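For anyone who wants to check this on their own system, a small Python sketch like the one below prints the device number and inode of a file so the values can be compared before and after a reboot or remount. It assumes, as the "id seems to have changed" message suggests but this thread does not confirm, that the indexer's file IDs depend on the device number as well as the inode. The helper names are made up for the example.

```python
import os


def file_identity(path):
    """Return (device number, inode number) as reported by stat()."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


def identity_changed(previous, current):
    """True if either the device number or the inode differs."""
    return previous != current


def describe(path):
    """Format the identity with the device split into major:minor."""
    dev, ino = file_identity(path)
    return f"{path}: dev={os.major(dev)}:{os.minor(dev)} inode={ino}"


if __name__ == "__main__":
    import sys

    # Usage: python identity.py FILE [FILE ...]
    for name in sys.argv[1:]:
        print(describe(name))
```

If the inode stays the same but the device part changes between runs, that would match the re-indexing pattern described in the logs above.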