SUMMARY
baloo stops indexing and hangs.

STEPS TO REPRODUCE
1. restart baloo (systemctl --user restart kde-baloo.service)
2. balooctl6 status reports (very fast) that 27 files have still to be examined
3. balooctl6 monitor shows that some files get indexed
4. balooctl6 status takes very long time to answer
5. repeat whole process, yields same result

OBSERVED RESULT
baloo does not manage to index all remaining files

EXPECTED RESULT
baloo finishes all files, does not hang

SOFTWARE/OS VERSIONS
Operating System: Gentoo Linux 2.17
KDE Plasma Version: 6.2.5
KDE Frameworks Version: 6.10.0
Qt Version: 6.8.2
Kernel Version: 6.12.21-gentoo-dist (64-bit)
Graphics Platform: X11
Processors: 12 × Intel® Core™ i7-10750H CPU @ 2.60GHz
Memory: 46.7 GiB of RAM
Graphics Processor: Mesa Intel® UHD Graphics

ADDITIONAL INFORMATION
The list of files which are reported by balooctl6 monitor does not help: Even if I remove the last files in this list, baloo stops working on some file further to the front of the list. I would like to debug this problem but I do not know how.
(In reply to peer.griebel from comment #0)
> 1. restart baloo (systemctl --user restart kde-baloo.service)
> 2. balooctl6 status reports (very fast) that 27 files have still to be
> examined
> 3. balooctl6 monitor shows that some files get indexed
> 4. balooctl6 status takes very long time to answer
> 5. repeat whole process, yields same result
> ....
> ADDITIONAL INFORMATION
> The list of files which are reported by balooctl6 monitor does not help:
> Even if I remove the last files in this list, baloo stops working on some
> file further to the front of the list.

I've not seen exactly that behaviour but something close.

When running Baloo under systemd it is limited to using 512 MB RAM. You can see the memory use with

    $ systemctl --user status kde-baloo

Look for a "Memory:" line.

If you are indexing a lot of files, Baloo might be fighting to work within the constraints. Initially, you see the impact as Baloo reads and drops, rereads and drops pages from the database: it has to drop pages it has read in order to read others. If Baloo is content indexing and building a large transaction, it will not be able to "drop" those pages and will have to ask for more memory. Once it reaches the 512MB limit, there will be a gradually increasing delay in allocating that memory. Potentially Baloo will start swapping, which is bad news...

If this is the problem, a potential "quick fix" is to allow Baloo more RAM; try 25% rather than a fixed 512MB.

    $ systemctl --user edit kde-baloo

and add these lines to the "override" file (this change is just for your logged-on user):

    [Service]
    MemoryHigh=25%

and restart:

    $ systemctl --user restart kde-baloo

> I would like to debug this problem but I do not know how.

If you want to check whether Baloo is hitting the limits, you can use iotop and see whether Baloo has started to read and reread pages and whether the CPU use is dropping as a result of the delays in allocating more memory.
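
The "Memory:" check described above can be scripted. What follows is only a minimal sketch and not part of Baloo or systemd: the function name baloo_memory_summary is made up for illustration, and it assumes the status line has the shape "Memory: <current> (high: <limit> ...)".

```shell
#!/bin/sh
# Hypothetical helper (not a Baloo or systemd tool).
# Reads "systemctl status" text on stdin and prints "<current> <high>"
# taken from the "Memory:" line. Prints nothing if no such line exists
# or if the unit has no "high:" limit configured.
baloo_memory_summary() {
    sed -n 's/^[[:space:]]*Memory:[[:space:]]*\([^[:space:]]*\)[[:space:]]*(high:[[:space:]]*\([^[:space:],)]*\).*/\1 \2/p'
}

# Usage, assuming the unit is called kde-baloo as in this report:
#   systemctl --user status kde-baloo | baloo_memory_summary
```

If the first value sits at or above the second, Baloo is probably being throttled by the limit.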

An alternative is to manually index the remaining files individually with "balooctl6 index ....". That indexing is done in the foreground and without the systemd limits on RAM usage. Maybe this will give error messages for files...

Good luck and let us know how the tests went...
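
Indexing the files one at a time can be sketched as a small loop. This is an illustration only: the function name index_one_by_one and the INDEX_CMD override are hypothetical (the override exists so the loop can be dry-run with a harmless command), and it is an assumption that "balooctl6 index" signals a failure through a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: run the indexing command once per file, in the
# foreground, and report every file for which the command fails.
# INDEX_CMD defaults to "balooctl6 index"; set it to e.g. "echo" for a dry run.
index_one_by_one() {
    any_failed=0
    for f in "$@"; do
        # INDEX_CMD is deliberately unquoted so "balooctl6 index" splits
        # into command and subcommand; the file name itself stays quoted.
        if ! ${INDEX_CMD:-balooctl6 index} "$f"; then
            printf 'indexing failed: %s\n' "$f" >&2
            any_failed=1
        fi
    done
    return "$any_failed"
}

# Usage (the file names are placeholders):
#   index_one_by_one /path/to/first-file /path/to/second-file
```

Running one file per invocation makes it easier to see which file the indexer gets stuck on.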
(In reply to tagwerk19 from comment #1)

Thank you very much for your detailed suggestions! They did the trick!

The initial status was:
    Memory: 536M (high: 512M available: 0B peak: 536.3M)
But even 2G is not enough:
    Memory: 1.9G (high: 2G available: 32K peak: 2G)
So I set it to 4G.

I think I should report it to Gentoo's bugzilla.
(In reply to peer.griebel from comment #2)
> The initial status was: Memory: 536M (high: 512M available: 0B peak: 536.3M)
> But even 2G is not enough: Memory: 1.9G (high: 2G available: 32K peak: 2G)
> So I set it to 4G.

Baloo will use the memory as cache. It will of course be happier with more but may not need it; what you see with systemctl status on a quiet system is memory that Baloo has used and wants to keep around. You might find that 2GB is quite OK...

Good that the issue is sorted :-)
This is insane. Presumably Baloo project set that MemoryHigh=512M value, so, if it's known to be a problem, please raise it to a reasonable value. With modern systems in mind you could easily do MemoryHigh=10% and it would still result in a considerable increase for almost everyone except the few recluses still running on 4 GiB or less.

I don't know the nature of the Baloo cache that's consuming the large amount of RAM, but modern systems do have low-latency, high-bandwidth solid state storage as well as, I think, reasonably good OS-level caching, so maybe Baloo can dial back its own caching to avoid the high memory usage in the first place?
(In reply to Niklāvs Koļesņikovs from comment #4)
> This is insane. Presumably Baloo project set that MemoryHigh=512M value, so, if
> it's known to be a problem, please raise it to a reasonable value.

Previously there was no limit, and the 512MB was a reaction to the problems that caused. Those problems correlated with trouble with BTRFS, where files were indexed multiple times and the index therefore grew dramatically. The two together caused havoc.

> I don't know the nature of the Baloo cache

The index is a memory mapped file, so "close to the metal". A record can just be pulled off disk and used, so lookups are *fast*, but that means more work when indexing.

I don't think Baloo can determine which pages are in memory, how many of them are clean and how many are dirty. The systemd/cgroups limit applies external pressure on the memory and seems to work pretty well. In a way it is good to have something external to Baloo enforcing limits; it takes any arguments away from Baloo itself.

I really agree that the default should be changed. I have been recommending MemoryHigh=25%, which would be no worse on a 2GB Guest VM and give far better headroom elsewhere.
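
The "no worse on a 2GB Guest VM" arithmetic can be checked with a short sketch. The function name percent_of_kib_in_mib is made up for illustration. It relies on systemd taking a percentage MemoryHigh relative to the installed physical memory, and it uses MemTotal from /proc/meminfo as an approximation of that figure (MemTotal is usually slightly below the installed amount).

```shell
#!/bin/sh
# Hypothetical helper: print PERCENT of a memory size given in KiB,
# expressed in MiB (integer division, rounds down).
#   $1 = total memory in KiB, $2 = percentage
percent_of_kib_in_mib() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# 25% of a 2 GiB guest (2097152 KiB) equals the old fixed limit:
#   percent_of_kib_in_mib 2097152 25    # prints 512
#
# Approximate value for the current machine:
#   percent_of_kib_in_mib "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 25
```

On a 32 GiB machine the same call gives 8192 MiB, which is the "far better headroom" case.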
Okay, so the TL;DR is that Baloo should:
1) deal with excessively large indexes, since there's a long history of that happening not due to huge filesystems but rather Baloo bugs
2) set a reasonable MemoryHigh value based on percentage as a backstop against OOM'ing the system
3) be more resource aware and either avoid using too much RAM (or even default to off) when resource constrained or, on the contrary, go all out and lock its index into memory for best indexing performance when there's headroom for that.

Yeah, but if the expected behavior is to consume a lot more than 512M, then I'm not sure it does anyone any good to slow Baloo to a crawl when there are plenty of resources available. Case in point, I have 32GiB of RAM and I honestly and literally *do not care* about 512M here or there, because on a typical day I have about 20G free and Baloo is more than welcome to use a few G if it needs to. Meanwhile, on our circular economy "TV" with 2 GiB RAM and integrated graphics, perhaps Baloo should either switch to a memory conserving mode or just straight up default to off, since that system barely plays YouTube without dropping frames and there's no memory, CPU or I/O budget for extras such as Baloo, which I only use on my main system for tagging family photos. Of course, disabling Baloo is the first thing I did with that system, but the point of my example is that this probably should be the default on resource constrained systems.

Regarding that Btrfs issue, it's supposed to be fixed, although I did catch Baloo earlier this month with, again, an almost 1G index and stuck indexing some random file while consuming __half__ of the RAID array bandwidth for a *month* (yikes, and mea culpa for ignoring the unusual drive activity for so long). Presumably it was this same issue at heart, but I was able to "solve" it by just banning the folders where it got stuck and purging, since I'll never need Baloo to find anything in them.

This is outside my area of expertise, but perhaps it would make sense to use an established database that is meant for storing large data sets and is already optimized for good performance on modern hardware?

Furthermore, a file being memory mapped does not guarantee that it's actually in memory. The only real guarantee is to lock that index into memory, but that requires either CAP_IPC_LOCK or a large enough memlock limit (`ulimit -l`), and Baloo had better make sure that the system has enough resources to spare at all times to avoid a silly OOM situation, because the locked memory will not get paged out.