SUMMARY
I was encountering a separate issue with Baloo, probably https://bugs.kde.org/show_bug.cgi?id=442453

I tried to check on Baloo's status - both in the System Settings UI and with `balooctl status` - and it claimed to be idle. But it was clearly not idle, see below.

STEPS TO REPRODUCE
1. Disable content indexing - only file indexing is relevant here.
2. Get into a situation where Baloo will choke on some files and start thrashing the disk. In this case I had opened a large file with Audacity and then closed the program, probably resulting in the deletion of some temporary files. (See the linked bug above describing heavy disk write caused by file deletion.)
3. Try to check on Baloo's status.

OBSERVED RESULT
`balooctl` takes a very long time (30 seconds or so) to respond (as does the System Settings UI)... this System Settings freeze probably deserves a separate bug report...

Eventually it responds and claims to be idle:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 788,267
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 3.01 GiB

According to htop, however, `baloo_file` was using quite a lot of memory and an increasing amount of CPU. I remembered to try `strace` on the `baloo_file` process while this was happening, and it showed an endless stream of seeks and writes on a single file descriptor. Almost certainly this file was the Baloo database, given the concurrent index bloating.

EXPECTED RESULT
1. Baloo should respond quickly to status requests. The controller application should never (?) be deadlocked.
2. Baloo should describe itself as "writing to database" or something similar when it is performing these heavy database writes.
3. Baloo should allow the user to kill it from the System Settings UI when it is in *any* running state, even if it is not actively indexing.

In my case, Baloo bloated its database from 0.5 GB to over 3 GB in the space of about 2 minutes, so being able to kill it quickly is critically important.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.26.0
KDE Frameworks Version: 5.99.0
Qt Version: 5.15.6
Kernel Version: 6.0.1-arch1-1 (64-bit)
Graphics Platform: X11
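For anyone trying to reproduce this, here is a minimal sketch (not part of Baloo) of how one might time a status query and give up if it hangs. The `timed_status` helper and the 10 second timeout are illustrative assumptions; only the `balooctl status` command comes from the report above.

```python
import subprocess
import time


def timed_status(cmd, timeout_s=10.0):
    """Run a status command and measure how long it takes to answer.

    Returns (elapsed_seconds, stdout_text), or (elapsed_seconds, None)
    if the command did not finish within timeout_s and was killed.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        output = result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        output = None
    return time.monotonic() - start, output


# Example call (requires balooctl to be installed):
#   elapsed, output = timed_status(["balooctl", "status"], timeout_s=10.0)
#   print(f"answered after {elapsed:.1f}s" if output else "no answer, gave up")
```

Logging the elapsed time alongside the reported "Indexer state" would make it easier to attach concrete numbers to a report like this one.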
(In reply to Adam Fontenot from comment #0)
> 2... probably resulting in the deletion of some
> temporary files. (See the linked bug above describing heavy disk write
> caused by file deletion.)
> 3. Try to check on Baloo's status...

Maybe Bug 437754.

Deleting entries does seem to be hard work; as far as I can tell, baloo deletes entries one by one (rather than working with 40 files at a time as when content indexing). A "balooctl status" opens the database for reading and while this is happening, any updates are done as "appends". The database might easily balloon in size when caught in this situation.

I'm guessing you had been doing content indexing previously and had disabled it - otherwise 0.5 Gbyte seems a little large. I'd need to check to see whether baloo "cleans up" content records in this situation or just leaves them "orphaned".

Maybe Bug 440802

Second thought is that if baloo_file sees a large number of unindexed files, it first collects the metadata for all of them, then writes all the results to disc. This can give a sharp peak in memory usage and, in extreme cases, bog down the system.

Note in this case, the work that baloo_index is doing is invisible to "balooctl status".

Off-chance thought...

Can you find out what temporary files Audacity creates? Might baloo want to index them? (Are they hidden or not? What does kmimetypefinder say about them?) It is an off-chance, since if you are not content indexing, this shouldn't affect you.

Finally...

What filesystem are you using? If you are using BTRFS, there'll be a follow-on set of questions....
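To illustrate the "appends" effect described above, here is a toy copy-on-write model. It is a sketch under stated assumptions, not Baloo's or its database library's actual code: the `ToyCowStore` class, the page counts and the reuse rule are all hypothetical. The idea is that pages freed by a write can only be recycled once no open reader can still see them, so a long-lived read forces later writes to grow the file instead.

```python
class ToyCowStore:
    """Toy copy-on-write store: writes never overwrite live pages in place."""

    def __init__(self, live_pages):
        self.file_pages = live_pages   # total pages allocated in the file
        self.txn_id = 0                # id of the last committed write
        self.freed = []                # (freed_by_txn, page_count) entries
        self.reader_snapshot = None    # txn id pinned by an open reader

    def begin_read(self):
        # A reader pins the state as of the last committed write.
        self.reader_snapshot = self.txn_id

    def end_read(self):
        self.reader_snapshot = None

    def write(self, pages_touched):
        """Rewrite `pages_touched` pages: reuse free pages or append."""
        self.txn_id += 1
        needed = pages_touched
        still_free = []
        for freed_by, count in self.freed:
            # Pages freed after the reader's snapshot are still visible
            # to that reader, so they must not be recycled yet.
            reusable = (self.reader_snapshot is None
                        or freed_by <= self.reader_snapshot)
            if reusable and needed > 0:
                take = min(count, needed)
                needed -= take
                count -= take
            if count:
                still_free.append((freed_by, count))
        self.freed = still_free
        self.file_pages += needed      # whatever is left is appended
        # The old copies of the rewritten pages become free (eventually).
        self.freed.append((self.txn_id, pages_touched))


store = ToyCowStore(live_pages=100)
for _ in range(50):
    store.write(10)
size_without_reader = store.file_pages

store.begin_read()                     # e.g. a slow status query
for _ in range(50):
    store.write(10)
size_with_reader = store.file_pages
store.end_read()

print(size_without_reader, size_with_reader)
```

In this model the file stays near its original size while no reader is open and grows with almost every write once one is, which matches the shape of the ballooning described above. How closely the real database follows these rules is not something this sketch establishes.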
(In reply to tagwerk19 from comment #1)
> (In reply to Adam Fontenot from comment #0)
> > 2... probably resulting in the deletion of some
> > temporary files. (See the linked bug above describing heavy disk write
> > caused by file deletion.)
> > 3. Try to check on Baloo's status...
> Maybe Bug 437754.

So in this bug, the act of checking on Baloo's status is what triggers both memory use and i/o thrashing followed by bloating the database? That's ... insidious. Lines up pretty well with my experience, given that I was sitting at around 0.5 GB used for months before I had this problem.

> I'm guessing you had been doing content indexing previously and had disabled
> it - otherwise 0.5 Gbyte seems a little large. I'd need to check to see
> whether baloo "cleans up" content records in this situation or just leaves
> them "orphaned".

No, I've had content indexing disabled for a while (due to the innumerable bugs I've encountered with it) and previously deleted the database. So that 0.5 GB was whatever it was previously using. Unfortunately no way to check that now, as I was forced to kill the process and delete the database because of this issue. For reference, my home folder has about 500k items, of which about 180k are supposedly excluded (being in cache or source code directories I have marked as "not indexed" in the settings).

> Second thought is that if baloo_file sees a large number of unindexed files,
> it first collects the metadata for all of them, then writes the all results
> to disc. This can give a sharp peak in memory usage and, in extreme cases,
> bog down the system.
>
> Note in this case, the work that baloo_index is doing is invisible to
> "balooctl status".

This is part of what I'm advocating here; there should be no work done by Baloo that is invisible to `balooctl` or is unkillable by the user. (There may be work that we strongly prefer the user not interrupt, but ultimately if we don't provide a way for them to do it nicely they're just going to `kill -9` the Baloo processes.)

> Off-chance thought...
>
> Can you see out what temporary files Audacity creates? Might baloo want to
> index them? (Are they hidden or not? what does kmimetypefinder say about
> them?). It is an off-chance as if you are not content indexing, this
> shouldn't affect you.

Checked on this, the files are actually created in /var/tmp, so not relevant here. Most likely what happened is that just before opening Audacity I decided to clear up some free space in order to be able to export files. These were mostly a bunch of large source code directories that I had extracted in my downloads folder (which is indexed) while I was debugging something. It's entirely possible that Baloo was still hung on those by the time I closed Audacity.

> What filesystem are you using? If you are using BTRFS, there'll be a
> follow-on set of questions....

I am indeed using BTRFS.
(In reply to Adam Fontenot from comment #2)
> ... the act of checking on Baloo's status is what triggers both
> memory use and i/o thrashing followed by bloating the database? That's ...
> insidious ...

I tend to agree :-)

"balooctl status" locks the file while it tries to enumerate the files it should index; while the file is locked, updates are committed as "appends"; the database balloons in size and memory use goes up with it. It's a vicious circle and particularly bad when baloo's trying to delete loads of records.

> ... This is part of what I'm advocating here; there should be no work done by Baloo
> that is invisible to `balooctl` or is unkillable by the user ...

... and agree here too.

When it starts up "baloo_file" scans through the filesystem looking for unindexed or updated files. At the moment it does that in memory, only committing the results at the end of the process. That means "balooctl" doesn't see the work done when it is looking in the database file.

I'd say "baloo_file" ought to write its updates to the database periodically - every 15 seconds seems OK to me - so that "balooctl" can show what's happening.

> ... I decided to clear up some free space in order to be able to export files. These were mostly
> a bunch of large source code directories ...

That definitely sounds like the trigger.

> ... content indexing disabled for a while (due to the innumerable bugs
> I've encountered with it) ...
> I am indeed using BTRFS.

If you are interested, have a look at:

    https://bugs.kde.org/show_bug.cgi?id=402154#c12

With BTRFS and multiple subvols, you may see a different "minor device number" after a reboot. Baloo tracks files in its index by "DocID", which is a concatenation of the device number and inode; if the device number changes on reboot, baloo thinks it has a completely new set of files to index.

Does it make sense to close this as a duplicate of Bug 437754?
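A rough sketch of why a changed device number makes every file look new. The 32/32-bit packing in `make_doc_id` is an assumption for illustration; the comment above only says the DocID is a concatenation of the device number and inode, and the helper names here are hypothetical.

```python
import os


def make_doc_id(device: int, inode: int) -> int:
    """Pack a device number and an inode into one 64-bit identifier.

    Assumed layout (illustrative only): inode in the high 32 bits,
    device number in the low 32 bits.
    """
    return ((inode & 0xFFFFFFFF) << 32) | (device & 0xFFFFFFFF)


def doc_id_for_path(path: str) -> int:
    """Build the identifier for a real file from its stat() result."""
    st = os.stat(path)
    return make_doc_id(st.st_dev, st.st_ino)


# Same file (same inode), but the device number differs between two boots:
id_before_reboot = make_doc_id(device=0x2A, inode=123456)
id_after_reboot = make_doc_id(device=0x2B, inode=123456)

# The identifiers no longer match, so an index keyed on them would treat
# the file as never having been seen before.
print(id_before_reboot == id_after_reboot)
```

Any scheme that folds the device number into the key has this property, whatever the exact bit layout is; the identifier is only stable as long as both halves are.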
(In reply to tagwerk19 from comment #3)
> Does it make sense to close this as a duplicate of Bug 437754?

Probably not? I tried to keep this bug report focused on the user-facing issue in Baloo. I think you're probably right that the cause of my specific issue was Bug 437754. However, there are a million Baloo bugs for i/o thrashing, memory use, hanging on specific files, incorrect indexing, etc, and one thing that unites them is that the user is put in a situation where they will probably want to kill Baloo. The problem that users regularly encounter is that Baloo claims to be idle and can't be killed. (In my case it even hung the System Settings UI for 30 seconds as well.)

Under the assumption that we probably won't have fixed all the issues with Baloo in a year (or five), I think making sure the user is given correct information when it's misbehaving and the ability to stop it is very important.

> When it starts up "baloo_file" scans through the filesystem looking for
> unindexed or updated files. At the moment it does that in memory, only
> committing the results at the end of the process. That means "balooctl" doesn't
> see the work done when it is looking in the database file.

It's hard to say for sure what Baloo was doing in my case, but from the `strace` it appeared to be seeking and writing a file. If that's the case then I assume it wasn't merely updating the list of changed files in memory, so that can't be the explanation for why it claimed to be idle.

> If you are interested, have a look at:
>
> https://bugs.kde.org/show_bug.cgi?id=402154#c12

Fortunately I don't mount any subvolumes explicitly, I have separate real partitions (e.g. for root and home) each formatted with BTRFS, and only the default BTRFS subvolume on each file system is mounted in my fstab.
(In reply to Adam Fontenot from comment #4)
> (In reply to tagwerk19 from comment #3)
> > Does it make sense to close this as a duplicate of Bug 437754?
> Probably not? I tried to keep this bug report focused on the user-facing
> issue in Baloo.

That's fine. I'll flag as Confirmed and add a "See Also" cross reference.

> ... there are a million Baloo bugs for i/o thrashing, memory
> use, hanging on specific files, incorrect indexing, etc...

OK :-/

I'd say though that my experience triaging reports is that the number reported has settled down *remarkably* over the last two years. Partly that is older versions of baloo disappearing as people update their distros. Maybe also that I/O load is not so visible with SSDs.

Obviously still stuff to do, as you can see in Comment 1 :-)
(In reply to tagwerk19 from comment #5)
> (In reply to Adam Fontenot from comment #4)
> > (In reply to tagwerk19 from comment #3)
> > ... there are a million Baloo bugs for i/o thrashing, memory
> > use, hanging on specific files, incorrect indexing, etc...
> OK :-/
>
> I'd say though that my experience triaging reports is that the number
> reported has settled down *remarkably* over the last two years. Older
> versions of baloo that are disappearing as people update their distro's.
> Maybe also that I/O load is not so visible with SSD's.

Sorry, I didn't mean to be saucy about it. I certainly appreciate the amount of effort that's gone into improving Baloo and understand that it's not easy work. "Millions" was obviously an exaggeration.

It's just that Baloo is the one component of KDE in which I encounter show-stopping bugs, over and over again. Other components are at least working well enough that I can "dogfood" and improve them. Baloo I'm forced to keep disabled; once a year or so I give it a try to see if things are better.

Judging by user reports on places like the KDE subreddit, I don't think my experience is too far from the norm: https://old.reddit.com/r/kde/search?q=baloo&restrict_sr=on

You'll find reports here that are hair-raising to say the least. I do think file name search (Baloo with content indexing disabled) has gotten better; I managed to have it enabled the last few months with not too much trouble... before the combination of Bug 460509 and Bug 437754 forced me to disable it again.
(In reply to Adam Fontenot from comment #6)
> ... Sorry, I didn't mean to be saucy about it ...

Chuckle! No problem and no offence taken :-)

I do watch the kde subreddit but find it difficult to get a measure of the issues reported there. Some people are prompted to create bugs.kde.org reports and when they do I can *generally* match these up with known issues. I tend not to deal with crash reports but these are now relatively rare... You just have to look at Bug 389848 to see how far we've come 8-)

A lot of the discussion around the bugs sounds like people have been burnt - sometime previously, but how long previously is not said. There is an undercurrent of "first thing I do (you should do!) is disable baloo" which I think is not doing it justice.

Not to say there are not still *major* hurdles to jump over; the behaviour with BTRFS and multiple subvols is something that will require major replumbing. It's there somewhere on the horizon and not going to go away: OpenSUSE users see awful behaviour and we could easily find other distributions choosing a similar filesystem setup (Bug 400704 shows some of the pain, https://bugs.kde.org/show_bug.cgi?id=400704#c31 is an attempt at separating out contributing issues).

Slow jobs, shifting sands but maybe quieter waters...
(In reply to tagwerk19 from comment #7)
> Slow jobs, shifting sands but maybe quieter waters...

I think a lot of dust has been kicked up and settled again in the last year with:

    https://invent.kde.org/frameworks/baloo/-/merge_requests/131

for KF6 and cherrypicked for KF5

    https://invent.kde.org/frameworks/baloo/-/merge_requests/169

Together with:

    https://invent.kde.org/frameworks/baloo/-/merge_requests/121

and

    https://invent.kde.org/frameworks/baloo/-/merge_requests/148

The combination of the BTRFS fix and a "batching up" of the baloo_file indexing (like the baloo_file_extractor indexing was batched up) has made a lot of difference elsewhere, particularly after a reboot.

The limits on memory use may make the indexing slower but they should mean that indexing is less noticeable (they do depend on the system running systemd).

You can stop baloo with:

    $ systemctl --user stop kde-baloo

restart it with

    $ systemctl --user start kde-baloo

See its status (and memory usage) with

    $ systemctl --user status kde-baloo

and set up/edit an override to change the memory limits with

    $ systemctl --user edit kde-baloo

The default limit is "MemoryHigh=512M", I tend to set a percentage such as "MemoryHigh=25%"

I'll leave this open for now, but will set a "WaitingForInfo" at some point. No rush...
(In reply to tagwerk19 from comment #8)
> (In reply to tagwerk19 from comment #7)
> > Slow jobs, shifting sands but maybe quieter waters...
> I think a lot of dust has been kicked up and settled again in the last year
> with:
> https://invent.kde.org/frameworks/baloo/-/merge_requests/131
> for KF6 and cherrypicked for KF5
> https://invent.kde.org/frameworks/baloo/-/merge_requests/169
>
> Together with:
> https://invent.kde.org/frameworks/baloo/-/merge_requests/121
> and
> https://invent.kde.org/frameworks/baloo/-/merge_requests/148
>
> I'll leave this open for now, but will set a "WaitingForInfo" at some point.
> No rush...

I'm inclined to agree that the resource constraining fixes along with my fix https://invent.kde.org/frameworks/baloo/-/merge_requests/96 should have a great impact on Baloo's performance. I'm reenabling content indexing and will submit any issues I encounter.

I wonder, though, whether the specific issue reported here has been fixed? That is, *if* baloo_file has hung on some file, does `balooctl` still hang, and / or misreport its status as "idle"? Granted, these hangs should be much less common now, but if one does occur, it would be nice to know the situation is correctly reported to the user and can be stopped by pausing / disabling the indexer from the UI.

There's also the related bug 437754 where the act of checking on Baloo's status *causes* high memory use and database bloating. If the UI issues discussed here are mostly downstream of the performance problem, this issue could be closed as a duplicate of that one.
(In reply to Adam Fontenot from comment #9)
> I wonder, though, whether the specific issue reported here has been fixed?
> That is, *if* baloo_file has hung on some file, does `balooctl` still hang,
> and / or misreport its status as "idle"? Granted, these hangs should be much
> less common now, but if one does occur, it would be nice to know the
> situation is correctly reported to the user and can be stopped by pausing /
> disabling the indexer from the UI.

There's been this...

    https://invent.kde.org/frameworks/baloo/-/merge_requests/174

I'm not sure whether it's talking about exactly your problem but it seems to catch one of the edge cases.

My experience is that the limit on RAM (and stopping Baloo swapping) is a game changer... People were focused on CPU use whereas really a lot of the performance hit was memory related.