Bug 442453

Summary: Significant amount of disk write in short time after deleting many files in monitored folder.
Product: [Frameworks and Libraries] frameworks-baloo
Reporter: cantfind
Component: Baloo File Daemon
Assignee: baloo-bugs-null
Status: CONFIRMED
Severity: major
CC: heri+kde, tagwerk19
Priority: NOR
Version: 5.85.0
Target Milestone: ---
Platform: Manjaro
OS: Linux
See Also: https://bugs.kde.org/show_bug.cgi?id=437754
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description cantfind 2021-09-14 21:17:21 UTC
SUMMARY

I was deleting quite a lot of files from my home folder (not adding or moving anything; deleting with Ctrl+Shift+Del, so not even moving them to the recycle bin).

The baloo_file process consumed about 50% of a single core and wrote 20-100 MB per second for at least several minutes, with the index reaching a size of more than 6 GB.

Not sure I can reproduce it again, though.


STEPS TO REPRODUCE
1. Delete files from a monitored folder, or move files to a different (unmonitored) disk?

OBSERVED RESULT

The baloo_file process wrote a lot of data and used a lot of CPU for a while, even though balooctl status reported it was idle.

The balooctl status command took about half a minute to print the data, and balooctl monitor didn't show any activity while I ran the status command from a different tab.


EXPECTED RESULT

The index size should not increase while the number of indexed files decreases.


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Manjaro
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION


Here are the results of two consecutive runs of balooctl status:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 199,551
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 5.95 GiB


Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 199,109
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 6.42 GiB

Even though only about a minute passed between these two commands, the index grew by about 500 MB while the "Total files indexed" number went down.
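
The change between the two status outputs above can be checked with a small script. This is only a sketch: the sample text is copied from the two runs quoted above, the helper name parse_status is made up for illustration, and it assumes the size is always printed in GiB, which may not hold for smaller indexes.

```python
import re

# Relevant lines copied from the two `balooctl status` runs quoted above.
FIRST_RUN = """Total files indexed: 199,551
Current size of index is 5.95 GiB"""
SECOND_RUN = """Total files indexed: 199,109
Current size of index is 6.42 GiB"""


def parse_status(text):
    """Return (file_count, index_size_gib) parsed from status text.

    Assumes the size is reported in GiB (an assumption, see above).
    """
    files = re.search(r"Total files indexed: ([\d,]+)", text).group(1)
    size = re.search(r"Current size of index is ([\d.]+) GiB", text).group(1)
    return int(files.replace(",", "")), float(size)


files_a, gib_a = parse_status(FIRST_RUN)
files_b, gib_b = parse_status(SECOND_RUN)

file_delta = files_b - files_a          # negative: fewer files indexed
growth_mib = (gib_b - gib_a) * 1024     # positive: index got bigger

print(f"files indexed changed by {file_delta}")
print(f"index grew by about {growth_mib:.0f} MiB")
```

So the index grew by roughly 480 MiB while 442 entries were removed.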

Here's the output of balooctl monitor:

Press ctrl+c to stop monitoring
File indexer is running
Idle
Comment 1 cantfind 2021-09-14 21:20:50 UTC
Killing the baloo_file process and then starting indexing again "fixed" it: baloo_file no longer used up CPU and no longer increased the index size.

But now my index size is quite gigantic.
Comment 2 cantfind 2021-09-14 21:37:49 UTC
I was able to reproduce it again. All it takes is deleting a lot of files in Dolphin from a monitored folder. (The ones I deleted now were archived books).
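
For anyone who wants to try this without deleting real data, a rough helper could look like the one below. It is a sketch under assumptions: the function names and file name pattern are made up, the target directory has to be inside a folder Baloo monitors, and it is not confirmed that deleting from a script triggers the same behaviour as deleting in Dolphin. The files should be left in place until balooctl status shows they have been indexed, and only then deleted.

```python
import os
import tempfile


def create_files(directory, count=1000):
    """Create `count` small text files in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"baloo-test-{i:05d}.txt")
        with open(path, "w") as handle:
            handle.write(f"test file number {i}\n")
        paths.append(path)
    return paths


def delete_files(paths):
    """Permanently delete the given files (no trash) and return how many."""
    for path in paths:
        os.remove(path)
    return len(paths)


if __name__ == "__main__":
    # Hypothetical location: a fresh directory under the home folder,
    # assuming the home folder is monitored by Baloo.
    target = tempfile.mkdtemp(prefix="baloo-repro-", dir=os.path.expanduser("~"))
    created = create_files(target)
    input(f"Created {len(created)} files in {target}. "
          "Wait until they are indexed, then press Enter to delete them.")
    print(f"Deleted {delete_files(created)} files.")
```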
Comment 3 cantfind 2021-09-14 21:39:05 UTC
Now my balooctl status reports:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 198,471
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 7.20 GiB
Comment 4 tagwerk19 2021-09-15 08:15:10 UTC
Removing entries seems to be hard...

I think that baloo_file does not "batch up" deletes in the same way as it batches up its content indexing. I've tried watching with iotop. 
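
To illustrate what I mean by batching (this is a toy model, not Baloo's code): ToyIndex below is a hypothetical in-memory stand-in that only counts simulated commits, and the batch size of 100 is an arbitrary value picked for the example. It just shows how the number of commits differs when removals are grouped.

```python
class ToyIndex:
    """Hypothetical stand-in for an index; counts commits, does no real I/O."""

    def __init__(self, doc_ids):
        self.entries = set(doc_ids)
        self.commits = 0

    def remove(self, doc_ids):
        """Remove a group of entries in one simulated transaction."""
        self.entries.difference_update(doc_ids)
        self.commits += 1


def delete_unbatched(index, doc_ids):
    """One transaction per deleted entry."""
    for doc_id in doc_ids:
        index.remove([doc_id])


def delete_batched(index, doc_ids, batch_size=100):
    """One transaction per group of up to `batch_size` deleted entries."""
    doc_ids = list(doc_ids)
    for start in range(0, len(doc_ids), batch_size):
        index.remove(doc_ids[start:start + batch_size])


unbatched = ToyIndex(range(1000))
delete_unbatched(unbatched, range(1000))

batched = ToyIndex(range(1000))
delete_batched(batched, range(1000))

print(unbatched.commits, batched.commits)
```

If each commit carries a fixed write cost, the unbatched path pays that cost once per deleted file.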

There seems to be a "gotcha" if you try to check progress with "balooctl status" - ref Bug 437754. Could be where you get your 6GB index...
Comment 5 cantfind 2021-09-15 13:41:15 UTC
(In reply to tagwerk19 from comment #4)
> Removing entries seems to be hard...
> 
> I think that baloo_file does not "batch up" deletes in the same way as it
> batches up its content indexing. I've tried watching with iotop. 
> 
> There seems to be a "gotcha" if you try to check progress with "balooctl
> status" - ref Bug 437754. Could be where you get your 6GB index...

It's not just wrong reporting by balooctl: ksysguard shows a lot of writing going on in the baloo_file process, and quite a bit of CPU usage too.
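
The write rate can also be cross-checked independently of ksysguard by sampling the write_bytes counter in /proc/<pid>/io on Linux. The script below is a sketch: the function names are made up, the PID has to be looked up separately (for example with pidof baloo_file), and reading the file requires permission to inspect that process.

```python
import time


def parse_write_bytes(io_text):
    """Return the write_bytes counter from the text of /proc/<pid>/io."""
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "write_bytes":
            return int(value)
    raise ValueError("write_bytes field not found")


def write_rate_mb_per_s(before, after, seconds):
    """Convert two write_bytes samples into megabytes per second."""
    return (after - before) / seconds / 1_000_000


def sample_write_rate(pid, interval=5.0):
    """Sample /proc/<pid>/io twice, `interval` seconds apart (Linux only)."""
    path = f"/proc/{pid}/io"
    with open(path) as handle:
        before = parse_write_bytes(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_write_bytes(handle.read())
    return write_rate_mb_per_s(before, after, interval)
```

Usage would be something like print(sample_write_rate(12345)), where 12345 is a placeholder for the actual baloo_file PID.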
Comment 6 cantfind 2021-09-15 13:42:18 UTC
RAM usage was also at ~5GB...
Comment 7 tagwerk19 2021-09-20 18:13:37 UTC
(In reply to cantfind from comment #0)
> OBSERVED RESULT
> 
> baloo_file process wrote a lot of data, and used up a lot of CPU for a while.
> Even though balooctl status reported it was idle.
> 
> The balooctl status command took about half a minute to print the data, and
> balooctl monitor didn't show any activity while running from a different tab
> the status command.
Flagging "Confirmed"...