Bug 498876 - baloo_file_extractor uses a significant amount of system resources; balooctl6 monitor simply shows ": Ok"
Status: RESOLVED WORKSFORME
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon (other bugs)
Version First Reported In: 6.10.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-01-19 06:10 UTC by A. D. Cramer
Modified: 2025-02-20 03:46 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description A. D. Cramer 2025-01-19 06:10:10 UTC
SUMMARY
The baloo_file_extractor process uses progressively more memory (currently ~700 MiB as I finish writing this report, up from ~500 MiB when I started), and it writes a significant amount of data to an HDD, making the entire system slow. balooctl6 monitor simply prints ": Ok" forever.
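The memory growth described above can be recorded instead of estimated by sampling the process's resident set size (VmRSS) from /proc. This is a minimal sketch that assumes a Linux /proc filesystem. The function names are hypothetical and are not part of Baloo or balooctl6. The kernel truncates /proc/<pid>/comm to 15 characters, so the extractor appears there as `baloo_file_extr`.

```python
import os
import time

COMM_MAX = 15  # the kernel truncates /proc/<pid>/comm to 15 characters


def parse_vmrss_kib(status_text):
    """Return VmRSS from /proc/<pid>/status text in KiB, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t  716800 kB"; the kernel's "kB" means KiB.
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def find_pids(name):
    """Return PIDs whose /proc/<pid>/comm equals name truncated to COMM_MAX chars."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name[:COMM_MAX]:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited or its entry is unreadable
    return pids


def sample_rss_mib(name="baloo_file_extractor", interval_s=60, samples=5):
    """Print the resident memory (MiB) of each matching process, `samples` times."""
    for i in range(samples):
        if i:
            time.sleep(interval_s)  # wait only between samples
        for pid in find_pids(name):
            try:
                with open(f"/proc/{pid}/status") as f:
                    kib = parse_vmrss_kib(f.read())
            except OSError:
                continue
            if kib is not None:
                print(f"{time.strftime('%H:%M:%S')} pid={pid} rss={kib / 1024:.0f} MiB")
```

For example, `sample_rss_mib(interval_s=300, samples=12)` prints one line per matching process every five minutes for an hour. Because it takes timestamps, the output shows whether memory keeps growing or levels off.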

STEPS TO REPRODUCE
I am not certain these steps cause the issue, but I suspect they do, since it appeared shortly after I performed them.
1. Copy the entire /home directory to a folder on a different drive.
2. Make the main partition on that drive mount as /home, moving some directories around to make this feasible.
3. Switch the distribution, overwriting everything in the system partition.
4. Use the computer as normal. As time goes on, baloo_file_extractor will progressively take more resources.

OBSERVED RESULT
baloo_file_extractor slowly takes more and more resources, and ": Ok" is the only thing shown in the monitor.

EXPECTED RESULT
baloo_file_extractor would use a roughly constant amount of resources as it indexes.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux
KDE Plasma Version:  6.2.5
KDE Frameworks Version: 6.10.0
Qt Version: 6.8.1

ADDITIONAL INFORMATION

Here is the output of balooctl6 status:
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 5,551,335
Files waiting for content indexing: 1,568,892
Files failed to index: 0
Current size of index is 7.79 GiB

...and the output of balooctl6 monitor over one minute:
Press ctrl+c to stop monitoring
File indexer is running
Indexing file content
: Ok
- 358 repeating lines removed for brevity -
: Ok
^C
Notably, fewer lines are printed when the output is redirected to a file instead of going directly to the console.
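When attaching monitor output to a report, consecutive duplicate lines can be condensed automatically instead of by hand, much like `uniq -c`. This is a small sketch: `collapse_repeats` is a hypothetical helper, and the input is assumed to be monitor output that has already been saved as a list of lines.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into 'line  (xN)' entries."""
    out = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)  # length of this run of identical lines
        out.append(line if count == 1 else f"{line}  (x{count})")
    return out
```

For example, `collapse_repeats(open("monitor.txt").read().splitlines())` reduces a long run of ": Ok" lines to a single ": Ok  (xN)" entry. The file name is only a placeholder.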

Bug 400704 seems related but not the same: that bug involved high CPU usage as well as high disk usage, while this one involves high RAM usage instead. Bug 354636 also seems similar; however, its trigger was updating to Frameworks 5.80.0, while I have been using 6.10.0 for several days with no issues. The only change to Baloo in 6.10.0 was the removal of an unused member, so I doubt that update is the cause.
Comment 1 tagwerk19 2025-01-19 06:31:52 UTC
(In reply to A. D. Cramer from comment #0)
> 1. Copy the entire /home directory to a folder on a different drive.
> 2. Make the main partition on that drive mount as /home, moving some
> directories around to make this feasible
You are reindexing everything... If you've copied the files to a different drive, they will have a different FilesystemID/inode and Baloo will consider them "new".
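The point about FilesystemID/inode can be checked directly. A copied file is a new file with its own inode, and a copy on another drive also has a different device ID. This is a minimal sketch that assumes, as this comment describes, that files are identified by the (device ID, inode) pair. `file_identity` is a hypothetical helper and not Baloo's own ID function. For simplicity the sketch copies within one temporary directory, so the inode changes and the device ID stays the same.

```python
import os
import shutil
import tempfile


def file_identity(path):
    """Return the (device ID, inode number) pair for path.

    Hypothetical stand-in for the "FilesystemID/inode" key from comment 1;
    this is not Baloo's actual document ID computation.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "notes.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    # A copy is a new file: both exist at once on one filesystem, so they
    # cannot share an inode.
    copy = os.path.join(tmp, "notes-copy.txt")
    shutil.copy2(original, copy)
    changed = file_identity(original) != file_identity(copy)

    # A rename within one filesystem keeps the same inode and device.
    before = file_identity(original)
    moved = os.path.join(tmp, "notes-moved.txt")
    os.rename(original, moved)
    kept = file_identity(moved) == before

print(changed, kept)
```

The contrast is the useful part: a move within one filesystem keeps the pair, but a copy to a new drive does not. Under the keying assumption, that explains why every copied file looks new.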

> Here is the output of balooctl6 status:
> Baloo File Indexer is running
> Indexer state: Indexing file content
> Total files indexed: 5,551,335
> Files waiting for content indexing: 1,568,892
You've got a lot of files, Baloo is working through them.
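To follow progress over time, the counters in the status output can be reduced to a single figure. This is a small sketch with hypothetical helper names. It parses the `balooctl6 status` text exactly as it appears in this report, which means English output with comma thousands separators. It also assumes that the files waiting for content indexing are counted within the indexed total. With the numbers above, a little over a quarter of the indexed files were still waiting.

```python
def parse_status_counts(text):
    """Extract the numeric 'Label: 1,234' fields from balooctl6 status output.

    Assumes English output with comma thousands separators, as in this report.
    """
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip().replace(",", "")
        if sep and value.isdigit():  # skips e.g. "Indexer state: Indexing ..."
            counts[label.strip()] = int(value)
    return counts


def content_backlog_fraction(counts):
    """Fraction of indexed files still waiting for content indexing.

    Assumes the waiting files are a subset of the indexed total.
    """
    total = counts["Total files indexed"]
    waiting = counts["Files waiting for content indexing"]
    return waiting / total if total else 0.0


STATUS = """Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 5,551,335
Files waiting for content indexing: 1,568,892
Files failed to index: 0
Current size of index is 7.79 GiB"""

print(f"{content_backlog_fraction(parse_status_counts(STATUS)):.1%} still waiting")
```

Running this on status snapshots taken some hours apart shows whether the backlog is actually shrinking.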

All the same, it would be best to delete the index and start again (pkill baloo_file; balooctl6 purge). Baloo is remembering everything it indexed on its "old" disc. You don't need that, and it will make the indexing slower and more disk- and RAM-intensive. The index file itself is pretty large...
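The two-step reset suggested here can be scripted, for example to repeat it after another migration. This is only a sketch. It runs exactly the two commands quoted above, and `reset_baloo_index` is a hypothetical wrapper. `pkill` exits with status 1 when no process matched, so the sketch returns the exit codes rather than raising an error.

```python
import subprocess

# The reset from comment 1, one command per step, in order.
RESET_STEPS = [
    ["pkill", "baloo_file"],  # stop baloo_file and baloo_file_extractor (name pattern match)
    ["balooctl6", "purge"],   # delete the existing index
]


def reset_baloo_index(run=subprocess.run):
    """Run each reset step in order and return their exit codes.

    pkill exits with status 1 when no process matched, which is harmless
    here, so check=False is used and the codes are returned instead.
    `run` is injectable so the sequence can be inspected without side effects.
    """
    codes = []
    for step in RESET_STEPS:
        result = run(step, check=False)
        codes.append(result.returncode)
    return codes
```

After `reset_baloo_index()` returns, `balooctl6 status` shows whether indexing has started again from an empty index.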

> ...and balooctl6 running for one minute
> Press ctrl+c to stop monitoring
> File indexer is running
> Indexing file content
> : Ok
> - 358 repeating lines removed for brevity -
> : Ok
> ^C
Could be you've got loads of code in your $HOME. The monitor is not very good at telling you that Baloo has looked at a file and decided its mimetype is on the exclusion list.
Comment 2 tagwerk19 2025-01-20 09:58:58 UTC
(In reply to tagwerk19 from comment #1)
> ... it would be best to delete the index and start again ...
Did you have any luck here?

> ... Could be you've got loads of code in your $HOME ...
It would be interesting to know... C, C++ code or whatever.

You can get a list of the mimetype exclusions with:
    balooctl6 config list excludeMimetypes | sort
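To see how much of a code-heavy tree those exclusions cover, the listed mimetypes can be matched against the files under a directory. This is a rough sketch under stated assumptions. The extension-to-mimetype table is a hypothetical simplification covering the languages mentioned in this thread, and Baloo's own detection (via Qt's mimetype database) may classify files differently. The exclusion list is assumed to have been saved one mimetype per line, for example with `balooctl6 config list excludeMimetypes | sort > excluded.txt`.

```python
import os
from collections import Counter

# Hypothetical, simplified extension -> freedesktop mimetype table for the
# languages mentioned in this report. Baloo's own detection may differ.
EXTENSION_MIMETYPES = {
    ".c": "text/x-csrc",
    ".h": "text/x-chdr",
    ".java": "text/x-java",
    ".py": "text/x-python",
    ".css": "text/css",
    ".php": "application/x-php",
}


def guess_mimetype(filename):
    """Guess a mimetype from the file extension alone, or None if unknown."""
    return EXTENSION_MIMETYPES.get(os.path.splitext(filename)[1])


def count_excluded(root, excluded):
    """Count files under root whose guessed mimetype is in the excluded set."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mimetype = guess_mimetype(name)
            if mimetype in excluded:
                counts[mimetype] += 1
    return counts
```

With `excluded` read from the saved file as a set of stripped lines, `count_excluded(os.path.expanduser("~/git"), excluded)` returns a `Counter` of the files that would be skipped, grouped by mimetype.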
Comment 3 A. D. Cramer 2025-01-21 14:50:03 UTC
(In reply to tagwerk19 from comment #2)
> (In reply to tagwerk19 from comment #1)
> > ... it would be best to delete the index and start again ...
> Did you have any luck here?
> 
> > ... Could be you've got loads of code in your $HOME ...
> It would be interesting to know... C, C++ code or whatever.
> 
> You can get a list of the mimetype exclusions with:
>     balooctl6 config list excludeMimetypes | sort

Apologies for the late reply; I have been quite busy for the last few days. Deleting the index helped quite a bit: there are now 1 million fewer files in the index and only 300 thousand waiting for indexing. I have quite a few Git repos in ~/git, along with quite a few on a drive dedicated to them, so that is likely the code in $HOME. Most of it is Java, but there is also a substantial amount of C, along with some Python, CSS, and PHP. Memory usage is still quite high (now over 1 GiB), but drive usage has dropped considerably, and the system no longer slows down because of it. I think the main bug here may just be the ambiguous way the monitor reports files skipped because of excluded mimetypes.
Comment 4 Bug Janitor Service 2025-02-05 03:47:05 UTC
🐛🧹 ⚠️ This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information, then set the bug status to REPORTED. If there is no change for at least 30 days, it will be automatically closed as RESOLVED WORKSFORME.

For more information about our bug triaging procedures, please read https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging.

Thank you for helping us make KDE software even better for everyone!
Comment 5 Bug Janitor Service 2025-02-20 03:46:50 UTC
🐛🧹 This bug has been in NEEDSINFO status with no change for at least 30 days. Closing as RESOLVED WORKSFORME.