Bug 461256 - Baloo slows the system when many files are created
Status: RESOLVED DUPLICATE of bug 400704
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.98.0
Platform: Manjaro Linux
Importance: NOR major
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2022-10-31 17:37 UTC by Alberto Salvia Novella
Modified: 2025-02-20 15:48 UTC

Attachments
monitor-processes.png (72.50 KB, image/png)
2022-11-02 16:14 UTC, Alberto Salvia Novella
monitor-resources.png (203.71 KB, image/png)
2022-11-02 16:15 UTC, Alberto Salvia Novella

Description Alberto Salvia Novella 2022-10-31 17:37:07 UTC
If you create many files at the same time, as happens when compiling software, Baloo noticeably slows the system down.

This happens despite using a brand new SanDisk high-speed SSD with DRAM cache. This laptop has an AMD A10-8700P APU and 16 GiB of DDR3 RAM. No swap, no encryption.

Choosing a different kernel edition (generic/zen), with their different I/O schedulers (deadline/bfq), doesn't make the slowness go away. Neither does setting Baloo not to index file contents.

I think this defeats the purpose of Baloo altogether, as it's simply better to keep it switched off in all circumstances. It really needs to go easy on resources.
Comment 1 tagwerk19 2022-10-31 22:02:25 UTC
How big has your index grown? You can check with

    $ balooctl indexSize

Do you see the same happening with a simple:

    $ mkdir ~/Testdir
    $ cd ~/Testdir
    $ for i in {1..50000}; do echo "This is file $i" > file$i.txt; done

as used in Bug 437754?

Are you also deleting very many files as well as creating them?
Comment 2 Alberto Salvia Novella 2022-11-01 15:53:16 UTC
When I disabled Baloo I deleted the index.

But if I enable Baloo again, the resulting index is:

$ balooctl indexSize
File Size: 141,81 MiB
Used:      137,87 MiB

           PostingDB:      23,63 MiB    17.142 %
          PositionDB:      25,91 MiB    18.791 %
            DocTerms:      21,74 MiB    15.770 %
    DocFilenameTerms:      21,56 MiB    15.637 %
       DocXattrTerms:            0 B     0.000 %
              IdTree:       5,78 MiB     4.193 %
          IdFileName:      23,84 MiB    17.289 %
             DocTime:      14,34 MiB    10.401 %
             DocData:            0 B     0.000 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       1,07 MiB     0.776 %
Comment 3 Alberto Salvia Novella 2022-11-01 16:03:18 UTC
It doesn't happen with the above commands while Baloo is turned off.
Comment 4 Alberto Salvia Novella 2022-11-01 16:05:43 UTC
The initial indexing by Baloo slows things down more than recursively creating 500000 files does.
Comment 5 Alberto Salvia Novella 2022-11-01 16:18:41 UTC
The same happens when Baloo indexes the files just created with the above method.
Comment 6 Alberto Salvia Novella 2022-11-01 16:21:19 UTC
Compilation usually involves creating and deleting files.
Comment 7 tagwerk19 2022-11-02 07:52:13 UTC
(In reply to Alberto Salvia Novella from comment #0)
> If you create many files at the same time, as happens when compiling
> software, Baloo noticeably slows the system down.
From what I've seen, you are most likely to notice the impact of baloo when it is greedy with RAM.

Looking at the options...

You can check how much CPU the "baloo_file" process is using if you look with "top" or "htop". You should see that it has priority 39 (maybe listed as "nice" 19); baloo should then yield to anything else wanting to do work. You may see that it still takes 100% of a CPU, but it's using the CPU only when other processes don't need it.

If baloo_file is not running with this "idle" priority, then something is wrong.
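A minimal sketch of that check from a terminal, assuming the procps version of "ps" (the helper name show_priority is made up for illustration). The NI column should read 19 if baloo_file is running at the low priority described above; CLS shows the scheduling class:

```shell
# Print PID, nice value, scheduling class and command name for a process.
# "show_priority" is a hypothetical helper name, not part of baloo.
show_priority() {
    name=${1:-baloo_file}
    # ps -C selects by command name and exits non-zero when nothing matches.
    ps -o pid,ni,cls,comm -C "$name" || echo "$name is not running"
}

show_priority baloo_file
```

Running it a few times in a row also shows whether the PID keeps changing.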

If you see "baloo_file" repeatedly reappear with different process numbers, then it could be crashing, and handling the crashes is eating the resources.

If you are watching with "htop", you'll also see how much memory baloo_file is using. It shouldn't really be more than 60% (if it is more, the system might be thinking of swapping or of OOM-killing the process). Neither is good...

You can watch the I/O load with "iotop" (need to run this as root). You can get iotop to display the total (accumulated) writes as baloo_file writes (and commits) data to the index.

If you see the "total writes" adding up to significantly more than the size of the index, then something else is wrong.
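If iotop is not to hand, a rough version of the same comparison can be made from /proc. This is only a sketch: it assumes the index lives at the usual ~/.local/share/baloo/index location (adjust if yours differs), and the write_bytes helper is made up for illustration. The counter covers the whole lifetime of the process, and reading another user's /proc/<pid>/io needs root:

```shell
# Extract the cumulative "write_bytes" counter from a /proc/<pid>/io style file.
# "write_bytes" (the function) is a hypothetical helper name.
write_bytes() {
    awk '/^write_bytes:/ {print $2}' "$1"
}

pid=$(pgrep -x baloo_file | head -n 1)
# Assumed default index location; not confirmed anywhere in this report.
index="${XDG_DATA_HOME:-$HOME/.local/share}/baloo/index"

if [ -n "$pid" ] && [ -f "$index" ]; then
    echo "written by baloo_file: $(write_bytes "/proc/$pid/io") bytes"
    echo "index file size:       $(wc -c < "$index") bytes"
else
    echo "baloo_file is not running or no index file was found"
fi
```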

(In reply to Alberto Salvia Novella from comment #2)
> ... if I enable Baloo again, the resulting index is:
> 
> $ balooctl indexSize
> File Size: 141,81 MiB
> Used:      137,87 MiB
So it doesn't look as if the index is large enough that even if all pages were in memory, it would cause a problem...

(In reply to Alberto Salvia Novella from comment #4)
> The initial indexing by Baloo slows things down more than recursively creating 500000 files does.
500000 files is "substantial", 50000 should be noticeable but not painful. You should be able to repeat tests with creating the files and watching what's happening with htop/iotop.

(In reply to Alberto Salvia Novella from comment #6)
> Compilation usually involves creating and deleting files.
I asked about file deletions as deletions seem to load baloo far more than file creations. Sometimes better to "balooctl purge" and let the index rebuild than wait for baloo to delete many thousand files. Bug 437754 can be used in evidence of this...

You might find baloo trying to keep up with all the temp files created and deleted. If you are regularly building/rebuilding software, you might think of excluding your working directory from indexing. That said, if baloo is not content indexing, you should scarcely notice it running.
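A sketch of how such an exclusion could be set from the command line. It assumes that "balooctl config add excludeFolders <path>" is accepted by your balooctl version (worth checking against its help text first); the helper name and the example path are placeholders:

```shell
# Add a directory to baloo's excluded folders.
# "exclude_from_baloo" is a hypothetical helper name; the "config add excludeFolders"
# subcommand is assumed from balooctl's help and may differ between versions.
exclude_from_baloo() {
    dir=$1
    if ! command -v balooctl >/dev/null 2>&1; then
        echo "balooctl not found" >&2
        return 1
    fi
    balooctl config add excludeFolders "$dir"
}

# Example (placeholder path, not run here):
#   exclude_from_baloo "$HOME/src/myproject/build"
```

The same setting is also reachable through the File Search page in System Settings.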
Comment 8 Alberto Salvia Novella 2022-11-02 16:14:44 UTC
Created attachment 153407
monitor-processes.png
Comment 9 Alberto Salvia Novella 2022-11-02 16:15:08 UTC
Created attachment 153408
monitor-resources.png
Comment 10 Alberto Salvia Novella 2022-11-02 16:43:58 UTC
"iotop" shows no I/O activity from baloo_file, except from time to time. It seems that most of its processing happens in RAM.

It doesn't seem to be CPU core, RAM or I/O saturation. What happens is that when "baloo_file" uses 25% of CPU, one core of four, the graphical interface sees a drop in framerate, going from 120 to 15 and staying there until "baloo_file" stops.

This is probably related to the fact that in APUs the CPU and GPU share RAM. So most likely some memory shared between them is saturated, not in amount but in frequency. Context switching.
Comment 11 tagwerk19 2022-11-03 20:06:52 UTC
(In reply to Alberto Salvia Novella from comment #10)
> It doesn't seem to be CPU core, RAM or I/O saturation. What happens is that
> when "baloo_file" uses 25% of CPU, one core of four, the graphical interface
> sees a drop in framerate, going from 120 to 15 and staying there until
> "baloo_file" stops.
I had a go at installing a Manjaro system, just in case there is anything out of the ordinary with it, and did the test of creating 50,000 files. I ran it as a guest VM under KVM with a similar number of CPUs and amount of RAM to your system and didn't hit any surprises. I didn't see any changes in frame rate (it was stable at 60 fps), but the setups are probably far too different to show anything.

> This is probably related to the fact that in APUs the CPU and GPU share RAM.
> So most likely some memory shared between them is saturated, not in amount
> but in frequency. Context switching.
One alternative thought, assuming the slowdown / speedup happens a little after baloo_file starts or stops working: might the CPU throttle back when it has to work hard and heats up? (See Bug 453968)
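One way to look for throttling is to read the per-core clock from sysfs before and while baloo_file is busy. A sketch, assuming the kernel exposes the cpufreq interface under /sys/devices/system/cpu (a VM may not); the helper name cpu_mhz is made up:

```shell
# Print the current clock of each core in MHz, read from the cpufreq sysfs files.
# "cpu_mhz" is a hypothetical helper; the optional argument overrides the sysfs base.
cpu_mhz() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue          # skip if cpufreq is not available
        cpu=${f#"$base"/}                # strip the base directory
        cpu=${cpu%%/*}                   # keep only "cpuN"
        # scaling_cur_freq is reported in kHz
        echo "$cpu $(( $(cat "$f") / 1000 )) MHz"
    done
}

cpu_mhz
```

If the clocks drop sharply while baloo_file is indexing and recover when it stops, throttling would be worth a closer look.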
Comment 12 Alberto Salvia Novella 2022-11-03 21:52:18 UTC
It happens instantly after Baloo starts indexing, and the performance drop stays stable.

Furthermore, my CPU's thermal grease has been replaced with a high-quality one, and the laptop fan barely switches on compared with before, with temperatures usually staying between 40 and 50 °C. Hence I don't think it is due to thermal issues.

My guess is that either the APU or the motherboard is the limiting factor. I have seen that unbuffered SSDs don't perform well on this laptop, whereas they do in older machines I have tried them on. Some path becomes too busy.
Comment 13 Alberto Salvia Novella 2022-11-04 05:05:50 UTC
Just to give you an idea: when I convert 1080p video using all cores, I cannot perceive any system slowdown. When I compile C++ code, I can only perceive a small one.

Baloo seems to be the sole program that causes a real slowdown, where even other highly demanding tasks don't.
Comment 14 tagwerk19 2022-11-04 06:47:12 UTC
(In reply to Alberto Salvia Novella from comment #13)
> My guess is that either the APU or the motherboard is the limiting factor.
> I have seen that unbuffered SSDs don't perform well on this laptop, whereas
> they do in older machines I have tried them on. Some path becomes too busy.
> ...
> Baloo seems to be the sole program that causes a real slowdown, where even
> other highly demanding tasks don't.
LMDB, the database engine that baloo is built on, uses memory-mapped I/O extensively. It's possible that this is what's unusual in this case; we would need some test code that creates and stresses an LMDB database to find out more...

Sorry. Beyond that, I fear I'm out of ideas.
Comment 15 Alberto Salvia Novella 2022-11-04 09:32:31 UTC
Thanks for the help. Let me know if you need more testing.
Comment 16 tagwerk19 2024-07-03 06:57:16 UTC
(In reply to Alberto Salvia Novella from comment #15)
> Thanks for the help. Let me know if you need more testing.
There was a patch here:
    https://invent.kde.org/frameworks/baloo/-/merge_requests/148
that fixes the problem of Baloo "running out" of memory when it starts up (or maybe after a "balooctl check"), when it is collecting information about all new/changed files anyway. Previously this info was collected and written to the index in one transaction; now it is split into multiple small transactions.

It does not directly match your situation, but maybe it has an impact.
Comment 17 postix 2025-02-20 15:48:07 UTC
*** This bug has been marked as a duplicate of bug 400704 ***