477068 – DB must be purged whenever the DB scheme/contents change in an incompatible way

Bug 477068 - DB must be purged whenever the DB scheme/contents change in an incompatible way

Summary: DB must be purged whenever the DB scheme/contents change in an incompatible way

Status:	REPORTED

Alias:	None

Product:	frameworks-baloo
Classification:	Frameworks and Libraries
Component:	Baloo File Daemon (show other bugs)
Version:	5.112.0
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	baloo-bugs-null

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-11-16 00:48 UTC by Stefan Brüns
Modified:	2024-07-07 15:56 UTC (History)
CC List:	3 users (show)

See Also:	474973 475919
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Stefan Brüns 2023-11-16 00:48:33 UTC

Up to now code changes were done to no cause any incompatibility issues with existing DB contents, but unfortunate https://invent.kde.org/frameworks/baloo/-/merge_requests/131 introduced such a change.

To be able to deal with such issues, the DB should contain a version to notify about incompatible changes of the DB scheme etc. The DB should be recreated in such a case, and the library should check if the DB is compatible (which could happen with e.g. KF5 and KF6 installed at the same time).

Comment 1 tagwerk19 2023-11-21 12:10:11 UTC

(In reply to Stefan Brüns from comment #0)
> ... unfortunate https://invent.kde.org/frameworks/baloo/-/merge_requests/131 introduced such a change ...
That was my bad, I should have noticed the impact on normal "real world" systems. A problem of intensively watching unstable branches :-(

However the positive was the change did not *force* a structure change on the DB. A change to use a 64 bit FSID would have forced that; one step further and very likely necessary at some point.

The issue was there earlier with BTRFS and multiple subvolumes, affecting predominantly OpenSUSE but latterly Fedora (Bug 402154). I've also found it possible to recreate if you have mounted a different filesystem somewhere under your $HOME.

It would be nice if the "search" could be proof against ambiguous parent folders. It would be a "belt and braces" check, something the code ought not to meet but could deal with in case it does.

There is a test DB showing the issue here:
    https://bugs.kde.org/attachment.cgi?id=163168

Comment 2 Stefan Brüns 2023-11-23 07:12:21 UTC

@ngraham: How about notifying the users about the regression caused by the ID change, to keep the impact as small as possible?

This is currently not fixed, and even if it were,  the next KF5 release (KF5.113) is still some weeks in the future (+ distribution delay). 

More and more people will be hit by this. Most notably, it will affect also users with Ext* filesystems, which as of now had no problems with the filesystem/device ID.

Comment 3 tagwerk19 2023-11-23 09:47:01 UTC

I'm a bit more sanguine here; what might people see?

    Doubling up of search results, Bug 475919, might catch them unawares
    (but that is something that is not "out of the ordinary" where your search
    results contains files with the same name)

    Confusing "empty" results when searching from a particular directory (as
    opposed to "Your Files"). That's the Bug 474973

    They might notice baloo reindexing in the background but the systemd limit
    on RAM usage really helps here (even if one could argue about where that
    limit should be set). Possibly the increase in index size but BTRFS systems
    such as openSUSE have suffered far, far worse.

Of the above, I think the "empty" search results is the most pernicious. That's the one it would be nice to have a fix/workround for in a new release...

    ... A change in the DB structure would worry me :-]

Comment 4 Stefan Brüns 2023-11-23 10:56:21 UTC

The problems are:
- Sudden misbehavior seen by users not affected so for (Ext3/4, probably others)
- Doubling of index size for users on e.g. Ext4


Btw, the systemd limit does not actually help so much (if at all), it are the fixes which went in afterwards which made the index behavior significantly better.

Comment 5 tagwerk19 2023-11-23 16:28:27 UTC

(In reply to Stefan Brüns from comment #4)
> - Doubling of index size for users on e.g. Ext4
I'm not sure whether many people would see a difference, at least those with an SSD. This is where we could discuss systemd limits but that's probably better elsewhere :-)

If I look at openSUSE:
    
    $ baloosearch -i fred
     ...
    9caf4a24446 /home/test/fred 
    9ca00000028 /home/test/fred
    9ca00000029 /home/test/fred
    9ca0000002a /home/test/fred
    9ca0000002b /home/test/fred
    9ca0000002c /home/test/fred
    9ca0000002d /home/test/fred
    9ca0000002e /home/test/fred
    9ca0000002f /home/test/fred
    9ca00000027 /home/test/fred
     ...
    Elapsed: 0.465656 msecs
    
This is an oft-rebooted Tumbleweed with BTRFS and I've got 9 copies (as indexed with different minor device numbers) plus a "final" copy with the folded FSID.

> - Sudden misbehavior seen by users not affected so for (Ext3/4, probably others)
I have been on the lookout for comments about misbehaviour, it's been quieter than I expected. 

There is a "heads up" here:
    https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348/1

Comment 6 Stefan Brüns 2023-11-23 19:52:39 UTC

https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348/1

> Purging and reindexing as above is not necessary but probably sensible…

That's obviously incorrect, it is necessary for BTRFS and Ext to have it working correctly. It is better to err on the safe side here.

And while I appreciate you have mentioned it on Discuss, the reach is fairly small, thats the reason I asked Nate to get the message out.

And the number of bugs filed or other reports  is not a good measure of systems affected - people may simply not want to invest the time (and contrary to the automated kcrash reports it takes significantly more time to do so). 

Also, Debian and Ubuntu are both currently shipping older KF5 version (5.107/5.110) not yet affected.

Comment 7 tagwerk19 2023-11-23 21:31:53 UTC

(In reply to Stefan Brüns from comment #6)
> ... It is better to err on the safe side here ...
As I read it, we are saying the same thing, better to be on the safe side.

Good news is Bug 474973 has been around for a while (and is annoying) but we now have an explanation for it.

Comment 8 Stefan Brüns 2023-11-25 07:02:35 UTC

So, although I had asked Nate *twice* about notifying users about the breaking change, he chose to ignore it.

Comment 9 tagwerk19 2023-11-25 07:47:24 UTC

(In reply to Stefan Brüns from comment #6)
> https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348/1
I've added a heads up:
    https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348/2
describing the issue with searching "From Here" as opposed to "Your Files"

Comment 10 tagwerk19 2023-11-28 09:11:05 UTC

What is somewhat strange behaviour with the test DB (as per attachment) is that

    balooctl6 check

doesn't notice/heal the missing parent (and I don't seem to be able to force it with combinations of balooctl6 clear and balooctl6 index)

Maybe when baloo_file does its initial scan through the filestructure to set up iNotify watches, it could do a sanity check of the folder name against the DocId.

(It's possible to "balooctl6 clear" a just-indexed entry, but seems not possible to clear one indexed earlier where the DocId no longer matches).

Comment 11 tagwerk19 2023-12-02 14:56:00 UTC

(In reply to tagwerk19 from comment #10)
> Maybe when baloo_file does its initial scan through the filestructure to set
> up iNotify watches, it could do a sanity check of the folder name against
> the DocId.
OK, I think that would catch this particular issue.
        
It seems that the FSID change triggers the issue of (comment #3):
> Confusing "empty" results when searching from a particular directory (as
> opposed to "Your Files"). That's the Bug 474973       
but there's no guarantee that this is the only cause. Doing a sanity check of ensuring the folder is indexed before creating an iNotify watch would catch any similar issues. Whatever the root cause.

Comment 12 Stefan Brüns 2023-12-02 16:49:56 UTC

(In reply to tagwerk19 from comment #11)
> (In reply to tagwerk19 from comment #10)
> > Maybe when baloo_file does its initial scan through the filestructure to set
> > up iNotify watches, it could do a sanity check of the folder name against
> > the DocId.
> OK, I think that would catch this particular issue.

I mentioned already in the FSID change that there is no guarantee a document in the DB is currently reachable by its path, see network mounts, see thumbdrives, etc. But unfortunately everyone ignored my reservations, as everyone else apparently knows better how Baloo works than me ...
         
> It seems that the FSID change triggers the issue of (comment #3):
> > Confusing "empty" results when searching from a particular directory (as
> > opposed to "Your Files"). That's the Bug 474973       
> but there's no guarantee that this is the only cause. Doing a sanity check
> of ensuring the folder is indexed before creating an iNotify watch would
> catch any similar issues. Whatever the root cause.

You fall into the same trap as others have before, these are not synchronous. First, inotify works with paths, and second, it works with a sequence of events. You will have intermediate states which may no longer be valid, e.g. when doing renames. Trying to "synchronize" inotify and DB state have been the source of inconsistent DB states in the past, this will only make things worse. The DB must be self-consistent, but may temporarily be in a different state than the FS.

But actually, these comments are off-topic here. This BR is about DB inconsistencies caused by code changes, nothing else.

Comment 13 tagwerk19 2023-12-03 19:29:02 UTC

(In reply to Stefan Brüns from comment #12)
> ... everyone else apparently knows better how Baloo works than me ...
I think when troubleshooting, you get a view of problems from the "outside, in" and that comes with a fair amount of guesswork and assumptions. If you know the code, you see things from the "inside, out".

That's two different ways of "knowing" - it's a question of getting them to meet in the middle.

> ... no guarantee a document in the DB is currently reachable by its path, see network mounts, see thumbdrives, etc ...
That's quite clear.

In this instance though I create a test file, that triggers an iNotify, and baloo indexes it - without the parent directory being in the index (with the right DocID anyway).

> ... First, inotify works with paths, and second, it works with a sequence of events ...
Yes. I've seen, with inotifywait, some of the holes you can fall into (for example when a move is done with a "link/unlink") and can see that you could get synchronisation issues. 

I am pretty sure that is not happening here though. For what it is worth, I did the tests with and without content indexing enabled.

> ... This BR is about DB inconsistencies caused by code changes, nothing else.
The thread has drifted, apologies for that. I think I posted to the wrong thread to start with.