Bug 437019 - Baloo re-indexes content of files moved to another folder
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.82.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
Reported: 2021-05-13 09:48 UTC by David Kredba
Modified: 2021-11-12 09:56 UTC
Description David Kredba 2021-05-13 09:48:50 UTC
SUMMARY
Files moved to another folder are fully (content) re-indexed after KDE session re-logon.

STEPS TO REPRODUCE
1. Create a new folder, copy a few dozen larger files into it, and wait for them to be indexed
2. Create another folder and move those files to it
3. (Check the .xsession-errors file for baloo messages noting each file move)
4. Run balooctl status to see that no files are pending indexing
5. Log off your KDE session
6. Log in to your KDE session
7. Run balooctl status to see that the moved files are having their content re-indexed

In my opinion, baloo's index file already holds the file content data, so it should only update the paths instead of re-indexing the files.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.12.3-gentoo
KDE Plasma Version: 5.21.5
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2
Comment 1 tagwerk19 2021-05-13 11:33:57 UTC
You are saying you see the count of "total files" in balooctl status go up?

    Can you see the indexing happen if you run "balooctl monitor"?

    You might have to disable baloo (balooctl disable) before you log
    out, log back on, run "balooctl monitor" in a separate window and
    then "balooctl enable" (and maybe a second time to get it going)

A possible thing to check is the number of "inotify" watches:

    sysctl fs.inotify.max_user_watches

    This can be as low as 8192. If you have a lot of folders you
    might hit that limit, and baloo will then not see filesystem
    changes. It should not be a problem with a recent kernel; however,
    you can change this value as per
    https://bugs.kde.org/show_bug.cgi?id=433204#c12

A further thing to check (as baloo is somewhat dependent on the underlying filesystem)

    Run "stat oneofyourfiles" before moving the file and compare the
    values to a "stat" after the file's been moved. If the device and
    inode numbers are the same, then baloo ought to be able to match up
    the file (even if it missed the inotify msg)

    If you are running BTRFS (with subvolumes) you might find baloo
    reindexing files after a reboot - not necessarily after a logon

Others may be able to give more definitive answers, these are just things I've picked up as I've gone along...
Comment 2 David Kredba 2021-05-13 11:53:59 UTC
Total files count is constant; I tested moves of existing files. I found it when I needed to run rsync over ssh in the morning while short on time, saw how slow it was, and went to check what was slowing the disks down. I found file content being indexed, although there had been zero files left to index the day before at session log-off (shutdown of the machine).

What grows from zero across log-off/log-on is the count of files waiting to be indexed (if there was a file move, and a rename too). And it is exactly the count of moved files; it can be seen because I am using it on top of an EXT4 fs on top of LUKS on top of an MD RAID 1 on a pair of rotating hard disks.

Yes, I can see it during "monitor" session.

I set fs.inotify.max_user_watches = 2048000 a long time before I started using Baloo.
find . -type f | wc -l returns 503013, so there is a reserve for inotify.

When I tried "balooctl check" after the files were moved, it did not start indexing and the count of files waiting for indexing stayed at 0. This happened each time I tested it.

I will test the stat case.

Is noatime fine for baloo please? (maybe it can be that simple)
LABEL=HOME              /home                   ext4            noatime

Thank you.
Comment 3 tagwerk19 2021-05-13 13:02:04 UTC
(In reply to David Kredba from comment #2)
> Total files count is constant, I tested moves of existing files. Found it
> when I needed to run rsync over ssh in the morning being short on time and
> saw its slowness so went to check what is slowing the disks down. Found
> indexing of content of files where there was zero to be indexed the day
> before session log-off (shut down of the machine).
Well, baloo is only "there" when you have logged on. If you connect to your $HOME with rsync/scp and change things then baloo won't have got any inotify msgs and it will discover changes when you log on and it starts up. In the case that rsync updates modification times, baloo thinks it should reindex.

Whether this has an impact on your 'mv' issue, not sure...

> find . -type f | wc -l returns 503013, so there is a reserve for inotify.
Luckily the inotify watches are there for folders, not individual files.
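
Since the watches are per folder, the number to compare against the limit is the directory count, not the file count. A rough sketch, assuming a Linux system; count_dirs and watch_headroom are hypothetical helper names, and other applications also consume watches, so this only gives an estimate:

```shell
#!/bin/sh
# Count directories under a root (the root itself included).
# Directory names containing newlines would be miscounted.
count_dirs() {
    find "$1" -type d | wc -l | tr -d ' '
}

# Compare the directory count under ROOT with a watch LIMIT.
# Prints "ok (N of LIMIT)" when N is below the limit, otherwise "over (N of LIMIT)".
watch_headroom() {
    dirs=$(count_dirs "$1")
    limit=$2
    if [ "$dirs" -lt "$limit" ]; then
        echo "ok ($dirs of $limit)"
    else
        echo "over ($dirs of $limit)"
    fi
}

# Example (Linux):
#   watch_headroom "$HOME" "$(sysctl -n fs.inotify.max_user_watches)"
```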

> When I tried "balooctl check" after files moved it did not start to index and
> the count of files waiting for indexing stayed at 0. Each time I tested it.
> 
> I will test the stat case.
Gut feeling is that you are OK. There's another test tool you can use though:
    balooshow -x oneofyourfiles
which looks up the entry for the file in the search index. It knows which file is which based on the device number/inode (as per the "stat" command)

You can have a look at
    https://bugs.kde.org/show_bug.cgi?id=402154#c12

> Is noatime fine for baloo please? (maybe it can be that simple)
> LABEL=HOME              /home                   ext4            noatime
My "Neon Testing" machine has
    /dev/vda1 on / type ext4 (rw,noatime)
so you are not alone :-)
Comment 4 David Kredba 2021-05-13 13:06:40 UTC
The rsync goes to a folder outside my $HOME and runs as another user; it is started in a KDE session using su -.

There are no changes to files stored at $HOME made outside the KDE session.
Comment 5 tagwerk19 2021-05-17 09:06:37 UTC
Did you get any clues from "stat"? Interested in whether the device number and/or inode changes. ref:
    https://bugs.kde.org/show_bug.cgi?id=402154#c12
Comment 6 David Kredba 2021-05-17 09:15:23 UTC
The inode does not change on file move:

  File: hudba_hd/s.mp3
  Size: 24181479        Blocks: 47232      IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 208700903   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/   )   Gid: ( 1000/   )
Access: 2021-02-10 21:11:19.402467627 +0100
Modify: 2011-03-21 14:34:04.000000000 +0100
Change: 2021-02-10 21:11:19.555469669 +0100
 Birth: 2021-02-10 21:11:19.402467627 +0100

  File: ./s.mp3
  Size: 24181479        Blocks: 47232      IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 208700903   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/   )   Gid: ( 1000/   )
Access: 2021-02-10 21:11:19.402467627 +0100
Modify: 2011-03-21 14:34:04.000000000 +0100
Change: 2021-05-17 11:11:01.279266326 +0200
 Birth: 2021-02-10 21:11:19.402467627 +0100
Comment 7 tagwerk19 2021-05-17 18:51:14 UTC
(In reply to David Kredba from comment #6)
> The inode does not change on file move...
And I assume they are the same if you check again after the logout/logon...

If that's the case, I've more or less run out of ideas.

I see your 'inode' is pretty high. Could it be that you've been running KDE/Baloo for a long time? There was an index corruption that was mentioned as having been fixed in 5.68 (Bug 431664), but you need to purge and reindex to get rid of it.

It's quite possible to mv the config file and index folder 'out of the way' if you try reindexing.
Comment 8 tagwerk19 2021-05-17 18:54:30 UTC
Of course, disable baloo, move the .config/baloofilerc file and .local/share/baloo directory and then re-enable...
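
A sketch of that sequence, for illustration only. The helper name stash_baloo_state and the suffix argument are made up here; the two paths are the ones named above, and the balooctl calls are left as comments so the function can be tried against a scratch directory first:

```shell
#!/bin/sh
# Move baloo's config file and index directory aside under HOME_DIR,
# appending ".SUFFIX" to each name. Paths that do not exist are skipped.
stash_baloo_state() {
    home_dir=$1
    suffix=$2
    for path in "$home_dir/.config/baloofilerc" "$home_dir/.local/share/baloo"; do
        if [ -e "$path" ]; then
            mv -- "$path" "$path.$suffix" || return 1
        fi
    done
}

# Intended sequence (not run here):
#   balooctl disable
#   stash_baloo_state "$HOME" old
#   balooctl enable
```

Moving rather than deleting keeps the old index around, so it can be put back if the fresh index behaves no better.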
Comment 9 David Kredba 2021-05-18 06:27:07 UTC
Yes, it is the same after log off/on.

I have been testing baloo for years, but each time I end up with
balooctl disable
balooctl purge
and creating a new KDE profile to be sure that nothing is left over for a new indexing run.

It is getting better and better, but it has never survived yet.

I have a 4 TB EXT4 FS created with -b 4096 and not optimized for big files; I think that could be the source of the high inode count.

I sped up the last indexing attempt by symlinking the .local/share/baloo folder to an SSD drive. When it was done, I moved it back.
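
The symlink trick might look roughly like the sketch below. The function names relocate_index and restore_index are hypothetical, the target should be an absolute path, and doing this with baloo disabled is an assumption (the comment does not say how it was done):

```shell
#!/bin/sh
# Move INDEX_DIR to FAST_DIR and leave a symlink at the old location.
# Refuses to run if INDEX_DIR is not a real directory or FAST_DIR exists.
relocate_index() {
    index_dir=$1
    fast_dir=$2
    [ -d "$index_dir" ] && [ ! -L "$index_dir" ] || return 1
    [ ! -e "$fast_dir" ] || return 1
    mv -- "$index_dir" "$fast_dir" || return 1
    ln -s -- "$fast_dir" "$index_dir"
}

# Undo: remove the symlink and move the directory back.
restore_index() {
    index_dir=$1
    fast_dir=$2
    [ -L "$index_dir" ] || return 1
    rm -- "$index_dir" || return 1
    mv -- "$fast_dir" "$index_dir"
}

# Example (paths are placeholders):
#   relocate_index "$HOME/.local/share/baloo" /mnt/ssd/baloo
#   ... wait for indexing to finish ...
#   restore_index "$HOME/.local/share/baloo" /mnt/ssd/baloo
```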


Can someone reproduce it please?
Moving existing files indexed already to another folder and log off/on KDE session is sufficient for me.

Thank you!
Comment 10 tagwerk19 2021-05-18 08:00:37 UTC
(In reply to David Kredba from comment #9)
> Can someone reproduce it please?
> Moving existing files indexed already to another folder and log off/on KDE
> session is sufficient for me.
I'm afraid I can only offer hints and suggestions from my experience...

If you run balooshow for the files before/after a relogon, do they have the same device number/inode as per "stat" (and the correct filename as given in the square brackets)?

At times, I have seen baloo "recognise" a new/changed file but, because of inode reuse (with Ext4), it still has "earlier" data in the index. Bug 435434.

Baloo recognises the mismatch after a relogon (and you see "renaming" reports in the logs). In this case "balooctl monitor" did not show the file being indexed before the relogon. If I meet it again, I'll check the "balooctl status".

You are not using hard/soft links anywhere? Anything that might confuse baloo's 'one to one' inode to filename ideal?

If you purge and reindex, do you see the same behaviour if you reindex just a subset of your disk?

> Last indexing try I speeded up using a symlink of .local/share/baloo folder
> to a SSD drive. When done I moved it back.
It's a good trick :-)

Wishing you good luck...
Comment 11 David Kredba 2021-11-12 09:56:59 UTC
After fresh re-indexing it follows directory move(s) without re-indexing file content.