Bug 443547 - Baloo_file_extractor uses all available memory and never finishes running
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.86.0
Platform: Manjaro Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Duplicates: 447681
Depends on:
Blocks:
 
Reported: 2021-10-10 12:55 UTC by lamblord282
Modified: 2022-01-14 08:42 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
screenshot of system monitor showing high memory usage (89.20 KB, image/png)
2021-10-10 12:55 UTC, lamblord282

Description lamblord282 2021-10-10 12:55:41 UTC
Created attachment 142299
screenshot of system monitor showing high memory usage

SUMMARY
I recently downloaded a backup of my Gmail account, a large .mbox file of around 2.8GB. Since then I have noticed performance stuttering, intermittently severe enough to lock up the screen and cursor for a second or two at a time, and high memory usage that consumes all 8GB of my laptop's memory even with no applications running.

System Monitor shows baloo_file_extractor taking up all available memory at any given time. Its CPU usage typically ranges between 4% and 12.5% (Intel i5-8250U (8) @ 3.400GHz), and its memory usage easily exceeds 5GB, depending on the number of other applications running. Running "balooctl status" gives the following output:

Baloo File Indexer is running
Indexer state: Indexing file content
Indexing: /home/.../All mail Including Spam and Trash-003.mbox
Total files indexed: 28,114
Files waiting for content indexing: 4
Files failed to index: 0
Current size of index is 526.02 MiB

Believing that the large file was simply taking a while to index and would eventually finish on its own, I let it run in the background. A day and a half after I first noticed the issue, it is still running and still appears to be stuck on this file. (I would attach the file, but sharing all of my emails is a major concern.) Restarting the PC appears to have no effect.
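
One way to confirm that the indexer stays on the same file is to compare the "Indexing:" line across repeated "balooctl status" runs. The following is only a minimal sketch, assuming the output keeps the "key: value" layout shown above; parse_status is a hypothetical helper and not part of Baloo.

```python
def parse_status(text: str) -> dict:
    """Collect the 'key: value' lines of balooctl status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # lines without ': ' (e.g. the first banner line) are skipped
            fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the status output quoted in this report.
sample = """Baloo File Indexer is running
Indexer state: Indexing file content
Indexing: /home/.../All mail Including Spam and Trash-003.mbox
Total files indexed: 28,114
Files waiting for content indexing: 4
Files failed to index: 0
Current size of index is 526.02 MiB
"""

status = parse_status(sample)
print(status["Indexing"])
```

If the "Indexing" value is identical in two snapshots taken hours apart, the extractor has most likely not moved past that file.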

STEPS TO REPRODUCE
1. Appears to start from the moment the OS loads
2. Possibly the result of a large .mbox file on my system
3. Suspending and disabling Baloo with balooctl, then restarting the machine, caused memory usage to return to normal
4. Enabling Baloo again with balooctl caused baloo_file_extractor to start and take up all available memory

OBSERVED RESULT
High memory usage and noticeable OS performance issues. Indexer does not finish after many hours of uptime.

EXPECTED RESULT
No OS stuttering while running in background.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.10.70-1-MANJARO (64-bit)
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.86.0
Qt Version: 5.15.2
Comment 1 tagwerk19 2021-10-10 21:43:46 UTC
I'm guessing you want to exclude mbox files from baloo's indexing?

You'd need to edit the ~/.config/baloofilerc file and append a pattern such as *.mbox to the list of "exclude filters"
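
For illustration, that edit could be scripted. This is only a rough sketch, assuming baloofilerc already contains an "exclude filters=" line holding a comma-separated list of patterns; the sample contents and the add_exclude_filter helper are hypothetical and not taken from a real config.

```python
def add_exclude_filter(config_text: str, pattern: str = "*.mbox") -> str:
    """Append pattern to the 'exclude filters' entry if it is not already listed."""
    key = "exclude filters="
    lines = []
    for line in config_text.splitlines():
        # 'exclude filters version=...' does not match, since the key ends with '='.
        if line.startswith(key):
            filters = [f for f in line[len(key):].split(",") if f]
            if pattern not in filters:
                filters.append(pattern)
            line = key + ",".join(filters)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical file contents, for illustration only.
sample = "[General]\nexclude filters=*~,*.part\nexclude filters version=8\n"
updated = add_exclude_filter(sample)
print(updated)
```

The function only transforms text; reading and writing the real file is left out on purpose, and a backup of the file first would be sensible.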

Not sure what you'd get if you did content index an mbox file; you wouldn't be able to search for individual messages. You could quite easily end up indexing loads of encoded text ...

    ... 4gk6hwd4pj1m3s1uwxfxrlum 4gk75bsr 4gk7e3uirqbj7erzingvztwru
    4gk7kt73brlfhzyk0ybst9qb6 4gk7pyzee5h34885h94f564r0 4gk7wg
    4gk7zukfhmtso51zbvb 4gk88ku5 4gk8oagjt81vjgsqbnj1hvxat ...

which would do mad things to your index and memory usage. Flagging as 'Confirmed'.
Comment 2 tagwerk19 2021-10-10 21:49:13 UTC
Interesting aside ...

... initially tried the test on a Fedora 35 box; the OOM (Out of Memory) protection killed both baloo_file_extractor and baloo_file itself. No stuttering but no indexing either.
Comment 3 lamblord282 2021-10-11 00:56:07 UTC
I added *.mbox to the exclude list, and that appears to have worked. Indexing is now enabled and system performance/memory usage appears normal. The status now shows 0 files waiting to index.

Excluding .mbox files from indexing is fine by me, and I have no need to make them content searchable. I had not even been thinking about indexing until I noticed my system behaving strangely. Thanks!
Comment 4 tagwerk19 2021-10-11 07:05:27 UTC
There may be a case to add .mbox to the default exclude list...

... Will flag as "Resolved"
Comment 5 tagwerk19 2022-01-14 08:42:39 UTC
*** Bug 447681 has been marked as a duplicate of this bug. ***