Bug 460460 - Baloo lies about its status when writing to its database
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.99.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-15 08:54 UTC by Adam Fontenot
Modified: 2022-11-30 17:13 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Adam Fontenot 2022-10-15 08:54:11 UTC
SUMMARY
I was encountering a separate issue with Baloo, probably https://bugs.kde.org/show_bug.cgi?id=442453

I tried to check on Baloo's status, both in the System Settings UI and with `balooctl status`, and it claimed to be idle. But it was clearly not idle; see below.

STEPS TO REPRODUCE
1. Disable content indexing - only file indexing is relevant here.
2. Get into a situation where Baloo will choke on some files and start thrashing the disk. In this case I had opened a large file with Audacity and then closed the program, probably resulting in the deletion of some temporary files. (See the linked bug above describing heavy disk write caused by file deletion.)
3. Try to check on Baloo's status.

OBSERVED RESULT
`balooctl` takes a very long time (30 seconds or so) to respond, as does the System Settings UI. This System Settings freeze probably deserves a separate bug report.

Eventually it responds and claims to be idle:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 788,267
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 3.01 GiB

According to htop, however, `baloo_file` was using quite a lot of memory and an increasing amount of CPU.

I remembered to try `strace` on the `baloo_file` process while this was happening, and it was an endless stream of seeking and writing on a single file descriptor. Almost certainly this file was the Baloo database, given the concurrent index bloating.
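As a complement to `strace`, the kernel's per-process I/O counters can show whether a process that claims to be idle is in fact writing. The following is a minimal sketch, assuming a Linux system where `/proc/<pid>/io` is readable for the target process; the helper names (`parse_proc_io`, `write_rate`, `sample`) are hypothetical and not part of Baloo or `balooctl`.

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def write_rate(before: dict[str, int], after: dict[str, int], seconds: float) -> float:
    """Bytes per second sent to storage between two samples."""
    return (after["write_bytes"] - before["write_bytes"]) / seconds


def sample(pid: int) -> dict[str, int]:
    """Read the current I/O counters of a process (Linux only, needs permission)."""
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())
```

Taking two calls to `sample(pid)` a few seconds apart and passing them to `write_rate` gives a rough figure for how hard the process is writing, independent of what the indexer reports about itself.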

EXPECTED RESULT
1. Baloo should respond quickly to status requests. The controller application should never (?) be deadlocked.
2. Baloo should describe itself as "writing to database" or something similar when it is performing these heavy database writes.
3. Baloo should allow the user to kill it from the System UI when it is in *any* running state, even if it is not actively indexing.

In the situation in question, Baloo had bloated its database from 0.5 GB to over 3 GB in the space of about 2 minutes, so being able to kill it quickly is critically important in this kind of situation.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.26.0
KDE Frameworks Version: 5.99.0
Qt Version: 5.15.6
Kernel Version: 6.0.1-arch1-1 (64-bit)
Graphics Platform: X11
Comment 1 tagwerk19 2022-10-15 14:18:57 UTC
(In reply to Adam Fontenot from comment #0)
> 2... probably resulting in the deletion of some
> temporary files. (See the linked bug above describing heavy disk write
> caused by file deletion.)
> 3. Try to check on Baloo's status...
Maybe Bug 437754.

Deleting entries does seem to be hard work. As far as I can tell, baloo deletes entries one by one (rather than working with 40 files at a time, as it does when content indexing). A "balooctl status" opens the database for reading and, while this is happening, any updates are done as "appends". The database might easily balloon in size when caught in this situation.
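The ballooning effect can be illustrated with a toy model. This is a minimal sketch, assuming a copy-on-write page store in which pages freed by a write may only be reused once no open reader holds a snapshot older than that write (similar in spirit to how MVCC stores such as LMDB behave). `ToyCowStore` and its methods are hypothetical names, and the model is not Baloo's actual database code.

```python
class ToyCowStore:
    """Toy copy-on-write page store that only models the page-reuse rule."""

    def __init__(self, initial_pages: int) -> None:
        self.txn_id = 0                 # id of the last committed write
        self.file_pages = initial_pages # pages ever allocated (file size)
        self.freed = []                 # (id of freeing write, page count)
        self.readers = []               # snapshot ids held by open readers

    def begin_read(self) -> int:
        """Open a reader pinned to the current committed state."""
        self.readers.append(self.txn_id)
        return self.txn_id

    def end_read(self, snapshot: int) -> None:
        self.readers.remove(snapshot)

    def write(self, pages: int) -> None:
        """Rewrite `pages` pages: new copies are written, old ones freed."""
        new_id = self.txn_id + 1
        oldest = min(self.readers, default=new_id)
        # Pages freed by write t are invisible to readers with snapshot >= t.
        reusable = sum(n for t, n in self.freed if t <= oldest)
        self.freed = [(t, n) for t, n in self.freed if t > oldest]
        reused = min(reusable, pages)
        if reusable > reused:
            self.freed.append((0, reusable - reused))  # still free for later
        self.file_pages += pages - reused  # whatever cannot be reused is appended
        self.freed.append((new_id, pages)) # old versions released by this write
        self.txn_id = new_id
```

In this model a store that starts at 100 pages settles at 110 pages under repeated 10-page writes, but grows by roughly 10 pages per write for as long as a reader stays open, and the file does not shrink again once the reader closes.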

I'm guessing you had been doing content indexing previously and had disabled it - otherwise 0.5 Gbyte seems a little large. I'd need to check to see whether baloo "cleans up" content records in this situation or just leaves them "orphaned".

Maybe Bug 440802

Second thought is that if baloo_file sees a large number of unindexed files, it first collects the metadata for all of them, then writes all the results to disc. This can give a sharp peak in memory usage and, in extreme cases, bog down the system.

Note in this case, the work that baloo_index is doing is invisible to "balooctl status".

Off-chance thought...

Can you find out what temporary files Audacity creates? Might baloo want to index them? (Are they hidden or not? What does kmimetypefinder say about them?) It is an off-chance because, if you are not content indexing, this shouldn't affect you.

Finally...

What filesystem are you using? If you are using BTRFS, there'll be a follow-on set of questions....
Comment 2 Adam Fontenot 2022-10-15 21:11:09 UTC
(In reply to tagwerk19 from comment #1)
> (In reply to Adam Fontenot from comment #0)
> > 2... probably resulting in the deletion of some
> > temporary files. (See the linked bug above describing heavy disk write
> > caused by file deletion.)
> > 3. Try to check on Baloo's status...
> Maybe Bug 437754.
So in this bug, the act of checking on Baloo's status is what triggers both memory use and i/o thrashing followed by bloating the database? That's ... insidious. Lines up pretty well with my experience, given that I was sitting at around 0.5 GB used for months before I had this problem.

> I'm guessing you had been doing content indexing previously and had disabled
> it - otherwise 0.5 Gbyte seems a little large. I'd need to check to see
> whether baloo "cleans up" content records in this situation or just leaves
> them "orphaned".
No, I've had content indexing disabled for a while (due to the innumerable bugs I've encountered with it) and previously deleted the database. So that 0.5 GB was whatever it was previously using. Unfortunately no way to check that now, as I was forced to kill the process and delete the database because of this issue.

For reference, my home folder has about 500k items, of which about 180k are supposedly excluded (being in cache or source code directories I have marked as "not indexed" in the settings).

> Second thought is that if baloo_file sees a large number of unindexed files,
> it first collects the metadata for all of them, then writes all the results
> to disc. This can give a sharp peak in memory usage and, in extreme cases,
> bog down the system.
> 
> Note in this case, the work that baloo_index is doing is invisible to
> "balooctl status".
This is part of what I'm advocating here; there should be no work done by Baloo that is invisible to `balooctl` or is unkillable by the user. (There may be work that we strongly prefer the user not interrupt, but ultimately if we don't provide a way for them to do it nicely they're just going to `kill -9` the Baloo processes.)

> Off-chance thought...
> 
> Can you find out what temporary files Audacity creates? Might baloo want to
> index them? (Are they hidden or not? what does kmimetypefinder say about
> them?). It is an off-chance as if you are not content indexing, this
> shouldn't affect you.
Checked on this, the files are actually created in /var/tmp, so not relevant here.

Most likely what happened is that just before opening Audacity I decided to clear up some free space in order to be able to export files. These were mostly a bunch of large source code directories that I had extracted in my downloads folder (which is indexed) while I was debugging something. It's entirely possible that Baloo was still hung on those by the time I closed Audacity.

> What filesystem are you using? If you are using BTRFS, there'll be a
> follow-on set of questions....
I am indeed using BTRFS.
Comment 3 tagwerk19 2022-10-16 07:12:27 UTC
(In reply to Adam Fontenot from comment #2)
> ... the act of checking on Baloo's status is what triggers both
> memory use and i/o thrashing followed by bloating the database? That's ...
> insidious ...
I tend to agree :-)

"balooctl status" locks the file while it tries to enumerate the files it should index; while the file is locked, updates are committed as "appends"; the database balloons in size and memory use goes up with it. It's a vicious circle and particularly bad when baloo's trying to delete loads of records.

> ... This is part of what I'm advocating here; there should be no work done by Baloo
> that is invisible to `balooctl` or is unkillable by the user ...
... and agree here too.

When it starts up, "baloo_file" scans through the filesystem looking for unindexed or updated files. At the moment it does that in memory, only committing the results at the end of the process. That means "balooctl" doesn't see the work done when it is looking in the database file. I'd say "baloo_file" ought to write its progress to the database periodically (every 15 seconds seems OK to me) so that "balooctl" can show what's happening.
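The idea of committing on a timer rather than only at the end can be sketched as follows. This is an illustration only, assuming a `commit` callback that writes a batch to the database; `PeriodicCommitter`, its parameters, and the injectable `clock` are hypothetical and do not reflect how baloo_file is actually structured.

```python
import time


class PeriodicCommitter:
    """Buffer scan results and commit them at least every `interval` seconds."""

    def __init__(self, commit, interval=15.0, clock=time.monotonic):
        self._commit = commit      # callable taking a list of pending items
        self._interval = interval  # seconds between commits
        self._clock = clock        # injectable so the timing can be tested
        self._pending = []
        self._last = clock()

    def add(self, item):
        """Queue one result; commit if the interval has elapsed."""
        self._pending.append(item)
        if self._clock() - self._last >= self._interval:
            self.flush()

    def flush(self):
        """Commit whatever is pending and restart the timer."""
        if self._pending:
            self._commit(self._pending)
            self._pending = []
        self._last = self._clock()
```

A scanner would call `add()` for each file it finds and `flush()` once at the end, so a status query reading the database sees progress no more than one interval old, and the in-memory backlog stays bounded by what can be found in that interval.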

> ... I decided to clear up some free space in order to be able to export files. These were mostly
> a bunch of large source code directories ...
That definitely sounds like the trigger.

> ... content indexing disabled for a while (due to the innumerable bugs
> I've encountered with it) ...
> I am indeed using BTRFS.
If you are interested, have a look at:

    https://bugs.kde.org/show_bug.cgi?id=402154#c12

With BTRFS and multiple subvols, it can be that you see a different "minor device number" after a reboot. Baloo tracks files in its index by "DocID", which is a concatenation of the device number and inode. If the device number changes on reboot, then baloo thinks it has a completely new set of files to index.
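Why a changed device number invalidates the whole index can be seen from a small sketch of such an identifier. The exact bit layout used here (32 bits of inode above 32 bits of device number) is an assumption made for illustration, and `make_doc_id` and `doc_id_for_path` are hypothetical names rather than Baloo's API.

```python
import os


def make_doc_id(device: int, inode: int) -> int:
    """Pack device number and inode into one 64-bit id (layout is assumed)."""
    return ((inode & 0xFFFFFFFF) << 32) | (device & 0xFFFFFFFF)


def doc_id_for_path(path: str) -> int:
    """Derive the id of an existing file from its stat() result."""
    st = os.stat(path)
    return make_doc_id(st.st_dev, st.st_ino)


# Same file (same inode), but the filesystem shows up under a different
# device number after a reboot: the id no longer matches the indexed one.
before_reboot = make_doc_id(device=0x2B, inode=1234)
after_reboot = make_doc_id(device=0x2C, inode=1234)
```

Because the device number is part of the key, every file on the affected filesystem gets a new id at once, so none of the existing index entries match and everything looks unindexed.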

Does it make sense to close this as a duplicate of Bug 437754?
Comment 4 Adam Fontenot 2022-10-16 07:53:27 UTC
(In reply to tagwerk19 from comment #3)
> Does it make sense to close this as a duplicate of Bug 437754?
Probably not? I tried to keep this bug report focused on the user-facing issue in Baloo.

I think you're probably right that the cause of my specific issue was Bug 437754. However, there are a million Baloo bugs for i/o thrashing, memory use, hanging on specific files, incorrect indexing, etc, and one thing that unites them is that the user is put in a situation where they will probably want to kill Baloo. The problem that users regularly encounter is that Baloo claims to be idle and can't be killed. (In my case it even hung the System Settings UI for 30 seconds as well.)

Under the assumption that we probably won't have fixed all the issues with Baloo in a year (or five), I think making sure the user is given correct information when it's misbehaving and the ability to stop it is very important.

> When it starts up "baloo_file" scans through the filesystem looking for 
> unindexed or updated files. At the moment it does that in memory, only 
> committing the results at the end of the process. That means "balooctl" doesn't 
> see the work done when it is looking in the database file.
It's hard to say for sure what Baloo was doing in my case, but from the `strace` it appeared to be seeking and writing a file. If that's the case then I assume it wasn't merely updating the list of changed files in memory, so that can't be the explanation for why it claimed to be idle.

> If you are interested, have a look at:
> 
>     https://bugs.kde.org/show_bug.cgi?id=402154#c12
Fortunately I don't mount any subvolumes explicitly, I have separate real partitions (e.g. for root and home) each formatted with BTRFS, and only the default BTRFS subvolume on each file system is mounted in my fstab.
Comment 5 tagwerk19 2022-10-16 08:18:24 UTC
(In reply to Adam Fontenot from comment #4)
> (In reply to tagwerk19 from comment #3)
> > Does it make sense to close this as a duplicate of Bug 437754?
> Probably not? I tried to keep this bug report focused on the user-facing
> issue in Baloo.
That's fine. I'll flag as Confirmed and add a "See Also" cross reference.

> ... there are a million Baloo bugs for i/o thrashing, memory
> use, hanging on specific files, incorrect indexing, etc...
OK :-/

I'd say though that my experience triaging reports is that the number reported has settled down *remarkably* over the last two years. Older versions of baloo are disappearing as people update their distros. Maybe also that I/O load is not so visible with SSDs.

Obviously still stuff to do, as you can see in Comment 1 :-)
Comment 6 Adam Fontenot 2022-10-16 21:34:14 UTC
(In reply to tagwerk19 from comment #5)
> (In reply to Adam Fontenot from comment #4)
> > (In reply to tagwerk19 from comment #3)
> > ... there are a million Baloo bugs for i/o thrashing, memory
> > use, hanging on specific files, incorrect indexing, etc...
> OK :-/
> 
> I'd say though that my experience triaging reports is that the number
> reported has settled down *remarkably* over the last two years. Older
> versions of baloo are disappearing as people update their distros.
> Maybe also that I/O load is not so visible with SSDs.
Sorry, I didn't mean to be saucy about it. I certainly appreciate the amount of effort that's gone into improving Baloo and understand that it's not easy work. "Millions" was obviously an exaggeration.

It's just that Baloo is the one component of KDE in which I encounter show-stopping bugs, over and over again. Other components are at least working well enough that I can "dogfood" and improve them. Baloo I'm forced to keep disabled; once a year or so I give it a try to see if things are better. Judging by user reports on places like the KDE subreddit, I don't think my experience is too far from the norm: https://old.reddit.com/r/kde/search?q=baloo&restrict_sr=on You'll find reports here that are hair-raising to say the least.

I do think file name search (Baloo with content indexing disabled) has gotten better; I managed to have it enabled the last few months with not too much trouble... before the combination of Bug 460509 and Bug 437754 forced me to disable it again.
Comment 7 tagwerk19 2022-10-17 06:28:00 UTC
(In reply to Adam Fontenot from comment #6)
> ...  Sorry, I didn't mean to be saucy about it ...
Chuckle!

No problem and no offence taken :-)

I do watch the kde subreddit but find it difficult to get a measure of the issues reported there. Some people are prompted to create bugs.kde.org reports and when they do I can *generally* match these up with known issues. I tend not to deal with crash reports but these are now relatively rare... You just have to look at Bug 389848 to see how far we've come 8-)

A lot of the discussion around the bugs sounds like people have been burnt some time ago, though how long ago is not said. There is an undercurrent of "first thing I do (you should do!) is disable baloo", which I think is not doing it justice.

That's not to say there are no *major* hurdles still to jump over: the behaviour with BTRFS and multiple subvols is something that will require major replumbing. It's there somewhere on the horizon and not going to go away. OpenSUSE users see awful behaviour and we could easily find other distributions choosing a similar filesystem setup (Bug 400704 shows some of the pain, https://bugs.kde.org/show_bug.cgi?id=400704#c31 is an attempt at separating out contributing issues).

Slow jobs, shifting sands but maybe quieter waters...