Bug 488446 - Baloo idles at 72% completion, using 37% CPU
Summary: Baloo idles at 72% completion, using 37% CPU
Status: RESOLVED MOVED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 6.2.0
Platform: NixOS Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-06-13 12:08 UTC by contact
Modified: 2024-06-17 11:44 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description contact 2024-06-13 12:08:53 UTC
SUMMARY
(I am going to skip the steps to reproduce, as that does not help with this)

In System Settings under File Search, it says the following:
Status: Indexing file content, 72% complete
Currently Indexing: "Idle"

Despite this, Baloo consistently uses 37% CPU.

SOFTWARE/OS VERSIONS
Operating System: NixOS 24.11
KDE Plasma Version: 6.0.5
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.1
Kernel Version: 6.9.3 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
Memory: 27.2 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Comment 1 contact 2024-06-13 12:57:54 UTC
Some ADDITIONAL CONTEXT:
 - I use an encrypted hard drive
 - I use btrfs
 - I use Syncthing
 - all Baloo settings are default; it just indexes my home directory
 - I synced a huge pile of files when I set up my device: 230 GB and 500K files

But a bulk of it is Steam games, which live in a hidden folder, so they should not be indexed.
I have cloned a few larger FOSS projects to make contributions, but that should not be an out-of-this-world use case. I think the worst one is nixpkgs or Godot.

Other than that, most of it is just Blender project files.

Oh, and I use Git LFS a lot... it does a lot of weird shenanigans with files in the background, but then again... Git LFS files should be hidden.
Comment 2 contact 2024-06-13 14:22:34 UTC
For some reason, the index it generated is 6.8 GB.

I will make a copy of it somewhere and then just rebuild with only file names.
Comment 3 contact 2024-06-13 14:30:42 UTC
Building the index with only file names finished in seconds, and the index is 84 MiB.

Hope this helps.
Comment 4 tagwerk19 2024-06-13 16:55:00 UTC
(In reply to contact from comment #1)
> some ADDITIONAL CONTEXT
>  - I use an encrypted hard drive
>  - I use btrfs
As a sanity check...
... if you search for a file you know you've indexed (with "baloosearch -i one-of-your-files"), do you get just a single or multiple results?

>  - I use syncthing
Does that watch for changes using inotify? Baloo also uses inotify, and there's a limit to the number of inotify watches that can be set up. That may or may not be an issue.

>  - all baloo settings are defaut. It just searches my home directory
Are the Steam games and cloned projects under your $HOME? If you don't want them indexed, you might need to explicitly exclude the folders.

As in Bug 488439, check "systemctl --user status kde-baloo" and see if it is hitting the memory limit.

Baloo's been given a 512 MB cap on RAM usage; it may be that that is not enough. If Baloo is trying to swap in pages from a 6.8 GB index into 512 MB of space, it might be reading, dropping, re-reading and dropping pages again or, worse, starting to swap when it is indexing.

You can override the unit file's RAM constraint with "systemctl --user edit kde-baloo" and add, for example, a "MemoryHigh=25%" line.
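For reference, "systemctl --user edit kde-baloo" opens an editor on a drop-in file (typically ~/.config/systemd/user/kde-baloo.service.d/override.conf) that is merged on top of the packaged unit. A minimal sketch of such an override, assuming the 25% value suggested above; note that the setting only takes effect inside a [Service] section, so a bare "MemoryHigh=25%" line on its own is not enough:

```ini
# Drop-in override for the kde-baloo user service (sketch; 25% is just an example value).
# MemoryHigh must be placed under [Service]; settings outside a section are not applied.
[Service]
# Above this threshold systemd throttles Baloo and reclaims its memory aggressively,
# replacing the 512 MB cap set by the packaged unit.
MemoryHigh=25%
```

systemctl edit reloads the unit definitions itself when the editor closes; restarting the service with "systemctl --user restart kde-baloo" applies the new limit, and "systemctl --user status kde-baloo" should then show it. On NixOS this file is managed declaratively instead, so the override has to go through the system configuration.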
Comment 5 contact 2024-06-14 07:29:21 UTC
(In reply to tagwerk19 from comment #4)
> ... if you search for a file you know you've indexed (with "baloosearch -i
> one-of-your-files"), do you get just a single or multiple results?
When I run this command, I get results along these lines:
Elapsed: 3.15746 msecs

When I search using my file explorer, I do not get multiple results.

> Are the Steam games and cloned projects under your $HOME? If you don't want
> them indexed, you might need to explicitly exclude the folders.

I mean, the Steam folder is hidden, so it should not be indexed, but yes, I know you can explicitly exclude directories.
 
> As in Bug 488439, check "systemctl --user status kde-baloo" and see if it is
> hitting the memory limit

Yes, it is hitting the memory limit.
Comment 6 contact 2024-06-14 08:00:31 UTC
> You can override the unit file RAM constraint with "systemctl --user edit
> kde-baloo" and add, for example, a "MemoryHigh=25%" line

Well, this is the joy of using stupid things like NixOS: no, you can't do it that way, and there is no documentation on how to actually configure it. I will be back when I have found out how to do that.
Comment 7 contact 2024-06-14 09:48:22 UTC
> You can override the unit file RAM constraint with "systemctl --user edit
> kde-baloo" and add, for example, a "MemoryHigh=25%" line

Okay, the effect of giving it 10 GB instead of 512 GB is that it is now stuck at 73% completion instead of 72%, and the index has grown from 8 GB to 12 GB.
Comment 8 contact 2024-06-14 09:49:15 UTC
*512 MB
Comment 9 contact 2024-06-14 09:56:14 UTC
I have a suspected culprit of the bloated index btw:
There are 9 GB of text-only files in my kitbash collection alone.
I also have several .svgs that are of considerable size.

If it is trying to index all text files no matter what, I think a lot of artists will sooner or later run into this issue.
Comment 10 tagwerk19 2024-06-14 19:34:31 UTC
(In reply to contact from comment #5)
> When  I run this command I get results along those lines:
> Elapsed: 3.15746 msecs
> 
> When I search using my file explorer, I do not get multiple results.
I'm a little confused...

... The baloosearch didn't give you a filename, just the elapsed time? That implies it's not found a match.

Dolphin can do its own "there and then" search (not asking Baloo for results) if you are searching "from" a folder that Baloo has not indexed. It is not particularly easy to see whether Dolphin is using Baloo or not :-/
Comment 11 tagwerk19 2024-06-14 19:40:09 UTC
(In reply to contact from comment #6)
> Well this is the joy of using stupid things like nixos, because no you can't
> do it that way and there is no documentation about how to actually configure
> it. I will be back when I found out how to do that.
Ah... I have a flake/module that's probably outdated (it won't be Plasma 5 now...):

{ config, pkgs, ... }:

{
  environment.systemPackages = with pkgs; [
    libsForQt5.baloo
    libsForQt5.baloo-widgets
  ];

  systemd.user.services.kde-baloo = {
    description = "Baloo File Indexer Daemon";
    partOf = [ "graphical-session.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.plasma5Packages.baloo}/libexec/baloo_file";
      ExecCondition = "${pkgs.plasma5Packages.plasma-workspace}/bin/kde-systemd-start-condition --condition \"baloofilerc:Basic Settings:Indexing-Enabled:true\"";
      BusName = "org.kde.baloo";
      Slice = "background.slice";
      CPUQuota = "100%";
      CPUWeight = 1;
      IOWeight = 1;
      MemoryHigh = "50%";
      MemorySwapMax = 0;
    };
    wantedBy = [ "graphical-session.target" ];
  };
}
Comment 12 tagwerk19 2024-06-14 19:45:53 UTC
(In reply to contact from comment #9)
> I have a suspected culprit of the bloated index btw:
> There are 9GB of text only files in my kitbash collection alone.
kitbash?

It is to be expected that if you are indexing a "shed load" of text, you're asking a lot of the index. I find myself wondering, though, whether the "73% completion" is real and how the maths is done...

> I also have several .svgs that are of considerable size.
That should not be an issue, assuming they have a MIME type of "image/svg+xml". You can see what Baloo has indexed for a file with:
    balooshow -x "one-of-your-images.svg"
This, of course, may be balooshow6 for your chosen system...

> If it is trying to index all text files no matter what, I think a lot of
> artists will sooner or later run into this issue. 
I think that's what Baloo thinks it ought to do - index the text files (and anything else it can extract the "plain text" from).

You can exclude folders from the indexing or turn off "content indexing" if you just want to index filenames (and tags).

If you have very specific file types that are not really text but give 'text/plain' when you ask for the MIME type, you could exclude the files "by extension" or, for serious magicians, create or override a MIME type description. See what kmimetypefinder says about some of your "kitbash" files.
Comment 13 contact 2024-06-15 10:25:35 UTC
> kitbash?
Using libraries of simple 3D shapes to cobble together complex models.
Kitbash libraries can be huge, and this one is OBJ, so plain-text 3D models.

> You can exclude folders from the indexing or turn off "content indexing" if
> you just want to index filenames (and tags)

Sorry, no. We have not gotten to the bottom of this yet. It might be a very weird thing I do with my files and if that is the case, I am happily going to resolve this on my side.
But if this turns out to be a matter of Baloo not being able to handle a kitbash library, that is your issue. I am not going to configure a utility whose name I would likely never even find without looking at the command line, just so that having files on my operating system does not waste tons of compute and storage on a bloated index. Having files on your operating system is something that just needs to work out of the box without causing a fuss.

Speaking of wasting compute and storage:

I have looked at my desktop computer. It turns out Baloo has been idling at 100% CPU and has so far created a 42 GB index covering 83% of my file system. This must have been wasting an entire core for years. And I have always wondered why my fans never stop spinning.

I will try to dig around with balooctl to see if I can make sense of this. And I will try to... create a test environment to replicate this issue.

And I mean... because I can simply turn off content-based search entirely and not have this issue, it is not too critical, but I think I will probably need some guidance here.
Comment 14 contact 2024-06-15 11:13:18 UTC
(In reply to tagwerk19 from comment #11)

Thanks.
This is the solution I found:

> systemd.user.services.kde-baloo.serviceConfig = {MemoryHigh = "40%";};
> systemd.user.services.kde-baloo.overrideStrategy = "asDropin";
Comment 15 contact 2024-06-15 11:14:52 UTC
I am in the call right now, so feel free to hop in and help troubleshoot if you want.
I feel really overwhelmed right now.

https://meet.ffmuc.net/baloo-baloon-hunt
Comment 16 contact 2024-06-15 12:05:12 UTC
(In reply to tagwerk19 from comment #4)
> As a sanity check...
> ... if you search for a file you know you've indexed (with baloosearch -i
> one-of-your-files"), do you get just a single or multiple results?

Okay, a lot of files I tried to find simply have not been indexed yet. But I have now found a bunch of files with multiple index entries.

> baloosearch6 -i logo-brainstorming.svg
> c3390000001c /home/betalars/Development/inkscape/divoc#r2r/logo-brainstorming.svg
> c33900000027 /home/betalars/Development/inkscape/divoc#r2r/logo-brainstorming.svg
> c33900000028 /home/betalars/Development/inkscape/divoc#r2r/logo-brainstorming.svg
> Elapsed: 116.79 msecs
Comment 17 contact 2024-06-15 12:10:41 UTC
> That should not be an issue, assuming they have a Mime type of
> "image/svg+xml". You can see what Baloo has indexed for a file with:
>     balooshow -x "one-of-your-images.svg"      

You are right, this does seem sensible.
Comment 18 contact 2024-06-15 12:14:43 UTC
> You are right, this does seem sensible.

* the output of this command when I look for .svgs does seem sensible.

This however is a small extract of the output I get from .obj files:

968788 968789 96879 968790 968791 968792 968793 968794 968795 968796 968797 968798 968799 9688 96880 968800 968801 968802 968803 968804 968805 968806 968807 968808 968809 96881 968810 968811 968812 968813 968814 968815 968816 968817 968818 968819 96882 968820 968821 968822 968823 968824 968825 968826 968827 968828 968829 96883 968830 968831 968832 968833 968834 968835 968836 968837 968838 968839 96884 968840 968841 968842 968843 968844 968845 968846 968847 968848 968849 96885 968850 968851 968852 968853 968854 968855 968856 968857 968858 968859 96886 968860 968861 968862 968863 968864 968865 968866 968867 968868 968869 96887 968870 968871 968872 968873 968874 968875 968876 968877 968878 968879 96888 968880 968881 968882 968883 968884 968885 968886 968887 968888 968889 96889 968890 968891 968892 968893 968894 968895 968896 968897 968898 968899 9689 96890 968900 968901 968902 968903 968904 968905 968906 968907 968908 968909 96891 968910 968911 968912 968913 968914 968915 968916 968917 968918 968919 96892 968920 968921 968922 968923 968924 968925 968926 968927 968928 968929 96893 968930 968931 968932 968933 968934 968935 968936 968937 968938 968939 96894 968940 968941 968942 968943 968944 968945 968946 968947 968948 968949 96895 968950 968951 968952 968953 968954 968955 968956 968957 968958 968959 96896 968960 968961 968962 968963 968964 968965 968966 968967 968968 968969 96897
Comment 19 contact 2024-06-15 12:41:47 UTC
As we have now isolated one of the underlying issues behind this, I have created a new bug for it.
https://bugs.kde.org/show_bug.cgi?id=488533
Comment 20 tagwerk19 2024-06-15 22:56:53 UTC
(In reply to contact from comment #16)
> (In reply to tagwerk19 from comment #4)
> > ... if you search for a file you know you've indexed (with "baloosearch -i
> > one-of-your-files"), do you get just a single or multiple results?
> 
> Okay, a lot of files I tried to find simply have not been indexed yet. But I
> have found a bunch of files with multiple indexes now.
In that case, it's probably best to kill the indexing, remove the index ("balooctl purge") and restart...

... But before you do that

(In reply to contact from comment #18)
> This however is a small extract of the output I get from .obj files:
> 
> 968788 968789 96879 968790 968791 968792 968793 968794 968795 968796 968797
> 968798 968799 9688 96880 968800 968801 968802 968803 968804 968805 968806
> 968807 968808 968809 96881 968810 968811 968812 968813 968814 968815 968816 ...
Probably not a lot of point indexing that and it's definitely hammering the index.

Summarising https://bugs.kde.org/show_bug.cgi?id=488533#c1

> ... Baloo can exclude mime types. Edit your ~/.config/baloofilerc and add a line under [General]:
>  
>    exclude mimetypes=model/obj
> 
> or with a command:
> 
>    balooctl config add excludeMimetypes model/obj
>
> This seems to let baloo index the filename but not index the content ...
If you do this, Baloo should stop content-indexing the .obj files. You can then kill Baloo, purge the index with "balooctl purge" and start again. You should find it indexes faster, both because of the extra memory and because you've got rid of all the old duplicates. It might still take a while with the amount of data you have, but you can watch progress with "balooctl monitor".

We probably need to exclude "model/obj" files by default.

Hope this makes sense and good luck!
Comment 21 contact 2024-06-15 23:55:17 UTC
> We probably need to exclude "model/obj" files by default
> 
> Hope this makes sense and good luck!

Okay. I will close this when the new index has been built with no issues.
Comment 22 contact 2024-06-17 11:03:34 UTC
Okay, the index is now done building. At 1 GB I think it is still quite considerable for a 300 GB drive, but as far as I can tell it did not double-index any files, and it also reached 100%.

Feel free to close this bug if you think all issues have been resolved. The one that we were able to pinpoint has been moved into its own thing.

I will also create an enhancement discussion on discuss.kde.org around this.
Comment 23 tagwerk19 2024-06-17 11:41:38 UTC
(In reply to contact from comment #22)
> Okay, the index is now done building. I think with 1GB it is still quite
> considerable for a 300GB drive, but as far as I can tell it did not
> double-index any files and also reached 100%.
Ohh! That was pretty quick, given what you were dealing with before...

The double indexing is "historical" and the result of the way BTRFS worked. The issue used to catch openSUSE especially, and more recently Fedora. BTRFS on an encrypted partition may add extra complexity. I will add NixOS to my mental list, though.

If you are content-indexing a load of files (you said 500,000?), I think a 1 GB index is quite fair.

> Feel free to close this bug if you think all issues have been resolved. The
> one that we were able to pinpoint has been moved into its own thing.
I think we can close this one and Bug 488446 but leave Bug 488533 open so we can follow the issue with "model/obj"

> I will also create an enhancement discussion on discuss.kde.org around this.
Thank you for your efforts and engagement in tracking this down.
Comment 24 tagwerk19 2024-06-17 11:44:46 UTC
(In reply to tagwerk19 from comment #23)
> I think we can close this one and Bug 488446 but leave Bug 488533 open ...
Too many bugs :-/

I meant: also close Bug 488439, where you were seeing unexpected battery drain...