Bug 488645 - baloo_file_extractor high CPU usage, baloo stops indexing
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 6.3.0
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2024-06-17 23:34 UTC by skierpage
Modified: 2025-03-24 06:42 UTC

Description skierpage 2024-06-17 23:34:38 UTC
SUMMARY
Baloo works OK for a few days then consumes a lot of CPU and stops indexing new and changed files.

STEPS TO REPRODUCE
1. Wait for system fans to turn on.
2. Run top and ps, notice baloo_file_extr high CPU use
3. Look in journalctl
4. Run `balooctl6 monitor` in a terminal.
5. Create or modify a text file in an indexed location.

OBSERVED RESULT

baloo_file_extractor is consuming high CPU
[spage@fedlaptop]/tmp% ps alx -w -w | rg 'PID|baloo'             
F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000    1831    1533  39  19 269117376 2660 futex_ SNsl ?        1:28 /usr/libexec/kf6/baloo_file
0  1000  266144    1831  39   - 269599760 70864 folio_ DNl ?       10:34 /usr/libexec/kf6/baloo_file_extractor

journalctl has no log output from baloo (I turned on kfilemetadata and baloo logging in ~/.config/QtProject/qtlogging.ini). The last warnings are from around the time the fans turned on:
...
Jun 17 15:49:10 fedlaptop baloo_file_extractor[266144]: kf.filemetadata: Searching for external extractors: "/usr//usr/libexec/kf6/kfilemetadata/externalextractors"
Jun 17 15:49:10 fedlaptop baloo_file_extractor[266144]: kf.idletime: Could not find any system poller plugin
Jun 17 15:49:10 fedlaptop baloo_file_extractor[266144]: qt.core.qobject.connect: QObject::connect(KAbstractIdleTimePoller, KIdleTime): invalid nullptr parameter
Jun 17 15:49:10 fedlaptop baloo_file_extractor[266144]: qt.core.qobject.connect: QObject::connect(KAbstractIdleTimePoller, KIdleTime): invalid nullptr parameter

Eventually my fan shuts off and my system is responsive (baloo runs at low priority), but baloo CPU usage remains 40-60% if I'm not active in other programs.

balooctl6 monitor doesn't report indexing of new and modified files.

EXPECTED RESULT
CPU usage stays low.
baloo continues to index new and changed files.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
KDE Plasma Version:  6.0.5
KDE Frameworks Version: 6.3.0
Qt Version: 6.7.1 (Wayland)

ADDITIONAL INFORMATION
The KDE warning in the journal could be irrelevant.

Unfortunately I can't tell what file baloo_file_extractor is indexing, if any. It doesn't show up in `lsof -p <PID_OF_BALOO_FILE_EXTRACTOR>`, I think because baloo_file and baloo_file_extractor communicate over a pipe. I'm not sure what log output will get baloo_file_extractor to report what it's doing. `strace` outputs nothing for a while, then a lot of lseek/writev/write64 activity on fd 6, which is my ~/.local/share/baloo/index, a 5 GB file on a btrfs volume.

I don't know what else to do to debug this.
Comment 1 tagwerk19 2024-06-18 21:15:21 UTC
Not many leads here, are there...

My guess is that Baloo is working within its memory "cap" (it has a 512MB limit defined in its systemd unit file) but the index has grown far too large. It will struggle when reading: it reads scattered pages from the index and has to repeatedly drop "clean" pages to read other ones, over and over. That means loads of I/O.

When indexing you have this behaviour plus, possibly, a gradually increasing number of dirty pages that cannot be dropped. That may push Baloo to start swapping. That is *bad*.

This has to be a guess from your description. A 6GB index seems large. You could see what
     systemctl --user status kde-baloo
says and you could watch what's happening with I/O with iotop. Maybe try increasing the 512MB limit (MemoryHigh) to something like 25% (it's a bit of a 'pick a number'), which should allow Baloo to make better use of the RAM. As a separate step, perhaps afterwards, you could set MemorySwapMax to zero (this means that if you reach the limit, Baloo will be OOM-killed rather than running your system into the mud). You can edit these settings with:
     systemctl --user edit kde-baloo
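For reference, the two settings mentioned above could be written as a drop-in file, roughly as sketched below. This is an illustration under assumptions, not a tested recommendation: `write_baloo_override` is a hypothetical helper name, the default path is assumed to be where `systemctl --user edit kde-baloo` stores its override, and 25% is only the "pick a number" value from this comment.

```shell
# Sketch: write a systemd drop-in that raises Baloo's MemoryHigh and disables swap.
# write_baloo_override is a hypothetical helper, not part of Baloo or systemd.
write_baloo_override() {
    # $1: drop-in directory (assumed default: the per-user override location)
    dropin_dir="${1:-$HOME/.config/systemd/user/kde-baloo.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # MemoryHigh=25% is an illustrative value; MemorySwapMax=0 forbids swapping
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
MemoryHigh=25%
MemorySwapMax=0
EOF
}
```

After writing the file, `systemctl --user daemon-reload` followed by `systemctl --user restart kde-baloo` should make the new limits take effect; `systemctl --user status kde-baloo` can then be used to check them.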
Comment 2 Fieldservice4 2024-06-25 17:51:20 UTC
I am linking https://bugs.kde.org/show_bug.cgi?id=446071

For both my laptops, "baloo_file_extractor" went crazy after copying over 200gb of files from my Windows session. The files are all kinds of .img (gps maps), zip files, Ms Office files etc.

Laptop 1 had 32gb in memory and 2tb SSD, and Laptop 2 16gb in memory and 512gb SSD. Both systems rendered inferior to Windows 11 (massive loss in battery time) due to "baloo_file_extractor" which I had to turn off after 10 hours of constant processing.
Comment 3 tagwerk19 2024-06-25 20:04:10 UTC
(In reply to Fieldservice4 from comment #2)
> ... For both my laptops, "baloo_file_extractor" went crazy after copying over
> 200gb of files from my Windows session ...
See: https://bugs.kde.org/show_bug.cgi?id=446071#c20

But also it would be interesting to see if Baloo was picking up files it should not index. I remember there have been bugs about it stumbling on Wine folders.

I think first make sure it is not running the content indexer when on battery then watch what it is indexing with "balooctl monitor" (maybe "balooctl6 monitor") when you are on mains power.


Comment 4 ghost.carpentry217 2024-11-17 02:14:13 UTC
Has there been any progress on this issue? I had a similar issue, where the baloo_file process was taking around 96% of my cpu even though the file search screen in the settings said that the indexer was paused. I could get around this by simply killing the process, and now that it says the index is 100% complete, I haven't encountered the issue again, but at the time this problem did cause my laptop to noticeably heat up. Let me know if there is any solution for this!

System Details

Operating System: Fedora Linux 40
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.8.0
Qt Version: 6.7.2
Kernel Version: 6.8.5-301.fc40.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 7840U w/ Radeon 780M Graphics
Graphics Processor: AMD Radeon 780M
Manufacturer: Framework
Product Name: Laptop 13 (AMD Ryzen 7040Series)
System Version: A7
Comment 5 skierpage 2025-03-24 06:42:57 UTC
This still happens. I still can't see any file being indexed in /proc/fd or `lsof -p NNNN`.

I ran strace: every 15 seconds or so, baloo_file_extractor does lseek(), writev(), and pwrite64() calls on FD 16, which is its $HOME/.local/share/baloo/index file. My baloo index file is 5.5GB.
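
For anyone repeating this, one way to map a descriptor number reported by strace back to a file is to read the symlink under /proc. The sketch below assumes Linux with /proc mounted; `fd_target` is a hypothetical helper name, not part of Baloo or strace.

```shell
# Sketch (Linux only): resolve a file descriptor of a running process to its path.
# fd_target is a hypothetical helper, not part of Baloo or strace.
fd_target() {
    # $1: process ID, $2: file descriptor number as shown by strace
    readlink "/proc/$1/fd/$2"
}
```

For example, `fd_target "$(pidof baloo_file_extractor)" 16` should print the index path if FD 16 is the index file, as described above. It only works for processes you own (or as root), and pipes and sockets show up as `pipe:[...]` or `socket:[...]` rather than as a path.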