Bug 471289 - automount/x-systemd-automount NFS resource changes inode on every mount, so a reindex is triggered
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Engine
Version: 5.102.0
Platform: openSUSE Linux
Importance: NOR major
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-21 11:39 UTC by Diego Ercolani
Modified: 2023-09-11 13:24 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Description Diego Ercolani 2023-06-21 11:39:21 UTC
I want to index files that are shared via an NFS resource exported by a TrueNAS appliance.
Every day the log file shows that baloo triggers a reindex of resources that are already indexed. This has two drawbacks:

1. the index file size is exploding
2. the reindex is triggered without a "real" need.

This is what I'm talking about: a new reindex is triggered even after a file has already been correctly indexed:

diego@pc-diego:~: balooshow -x "/net/fileserver4/Segreteria/XXXX.pdf"
1879000000fb 251 6265 /net/fileserver4/Segreteria/XXXX.pdf
        Mtime: 1583398981 2020-03-05T10:03:01
        Ctime: 1672915819 2023-01-05T11:50:19
        Cached properties:
                Author: xxx.yyy
                Title: Spett
                Document Generated By: Microsoft® Word 2016
                Page Count: 1
                Creation Date: 2020-03-05T09:03:01.000Z

Internal Info
Terms: x x x x
pageCount: 1
generator: 2016 microsoft word ®
author: xxx yyy
title: spett
creationDate: 2020-03-05T09:03:01Z

After a while, systemd unmounts the directory.
If I ask baloo again:
diego@pc-diego:~: balooshow -x "/net/fileserver4/Segreteria/XXXX.pdf"
18790010000c 1048588 6265 /net/fileserver4/Segreteria/XXXX.pdf: No index information found

which tells me that baloo has completely forgotten the file.
But if I run:
baloosearch -i "XXXX.pdf"
baloo finds it correctly:
187900000084 /net/fileserver4/Segreteria/XXXX.pdf
187900100010 /net/fileserver4/Segreteria/XXXX.pdf
18790010000f /net/fileserver4/Segreteria/XXXX.pdf
1879000000fb /net/fileserver4/Segreteria/XXXX.pdf
1879000000eb /net/fileserver4/Segreteria/XXXX.pdf
1879000000d8 /net/fileserver4/Segreteria/XXXX.pdf
1879000000d4 /net/fileserver4/Segreteria/XXXX.pdf
1879000000d3 /net/fileserver4/Segreteria/XXXX.pdf
1879000000c0 /net/fileserver4/Segreteria/XXXX.pdf

but, as you can see, it also finds several other IDs for the same path.
Comment 1 tagwerk19 2023-06-21 12:25:15 UTC
(In reply to Diego Ercolani from comment #0)
> ...
> 187900000084 /net/fileserver4/Segreteria/XXXX.pdf
> 187900100010 /net/fileserver4/Segreteria/XXXX.pdf
> 18790010000f /net/fileserver4/Segreteria/XXXX.pdf
> 1879000000fb /net/fileserver4/Segreteria/XXXX.pdf
> ...
That looks like the inode is OK but the device number is changing each reboot/remount. This is something that's been affecting BTRFS mounts, interesting to see that it's catching NFS as well.
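The IDs from the report can be split to check this. A minimal Python sketch, assuming each 64-bit Baloo ID stores the inode in the high 32 bits and the device number in the low 32 bits, as the balooshow line "1879000000fb 251 6265" suggests (0xfb = 251, 0x1879 = 6265). split_baloo_id is a hypothetical helper, not part of Baloo:

```python
# Sketch: split the Baloo document IDs from the report into (device, inode).
# ASSUMPTION: the inode sits in the high 32 bits and the st_dev value in the
# low 32 bits, matching the balooshow output "1879000000fb 251 6265".

def split_baloo_id(doc_id: int):
    """Return (device_id, inode) for a 64-bit Baloo document ID."""
    device_id = doc_id & 0xFFFFFFFF  # low 32 bits
    inode = doc_id >> 32             # high 32 bits
    return device_id, inode


# IDs returned by baloosearch -i "XXXX.pdf" in the report
ids = ["187900000084", "187900100010", "18790010000f", "1879000000fb",
       "1879000000eb", "1879000000d8", "1879000000d4", "1879000000d3",
       "1879000000c0"]

for hex_id in ids:
    dev, ino = split_baloo_id(int(hex_id, 16))
    print(f"{hex_id}: device={dev:>8} inode={ino}")
```

Every entry decodes to inode 6265 with a different device number; the ID 18790010000c from the second balooshow call decodes to device 1048588, which is what balooshow printed for it.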

There is a merge request drafted, specifically for the BTRFS case, that might also deal with this:
    https://invent.kde.org/frameworks/baloo/-/merge_requests/131

I might wave a flag for a better (but likely too difficult) solution: servers index the content they host locally, and clients that mount such resources forward search queries to those hosts for resolution there. I can see this would need too many pieces to be in place and working together.
Comment 2 Tomas Trnka 2023-07-04 08:35:46 UTC
(In reply to tagwerk19 from comment #1)
> That looks like the inode is OK but the device number is changing each
> reboot/remount. This is something that's been affecting BTRFS mounts,
> interesting to see that it's catching NFS as well.

Yes, of the most common filesystems on Linux, Btrfs, NFS, and CIFS (a.k.a. SMB) all use dynamically allocated device numbers, so they are all affected the same way.
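The dynamic allocation is visible in the device numbers balooshow printed for the same file (251 before the unmount, 1048588 after). A minimal sketch that splits them into major and minor numbers, assuming the 32-bit values follow the Linux kernel's new_decode_dev() layout; decode_dev is a hypothetical helper:

```python
# Sketch: decode a 32-bit Linux device number into (major, minor), following
# the bit layout of the kernel's new_decode_dev().
# ASSUMPTION: the device values reported by balooshow are st_dev values in
# this encoding, as on a typical Linux system.

def decode_dev(dev: int):
    """Return (major, minor) for a 32-bit Linux device number."""
    major = (dev & 0xFFF00) >> 8
    minor = (dev & 0xFF) | ((dev >> 12) & 0xFFF00)
    return major, minor


# Device numbers balooshow reported for the same NFS file
for dev in (251, 1048588):
    print(dev, decode_dev(dev))
```

Both decode to major 0, which Linux reserves for unnamed devices (non-device mounts); only the minor number (251, then 268) differs, which fits a number handed out when the filesystem is mounted.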

> There is a merge request drafted, specifically for the BTRFS case, that
> might also deal with this:
>     https://invent.kde.org/frameworks/baloo/-/merge_requests/131

It will probably help, depending on server configuration. The Linux NFS server will by default use the FSID of the underlying filesystem when presenting an export to clients, so the Linux NFS client should expose a unique FSID. However, the server could be running something like XFS as the underlying filesystem (which does not have a stable FSID), or the exported FSID can be overridden in server configuration, possibly making it non-unique across servers. If server A exports a custom fsid=123 and server B exports a different filesystem with the same custom fsid, a client mounting both filesystems will see two unrelated trees with the same FSID.
Comment 3 Nate Graham 2023-09-07 15:52:51 UTC
Git commit c735faf5a6a3ef3d29882552ad0a9264a294e038 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:52.
Pushed by ngraham into branch 'master'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many
cases. Btrfs subvolumes or partitions on NVMe devices are assigned
device numbers dynamically, so the resulting device ID is typically
different after every reboot, forcing Baloo to repeatedly reindex all
files.

Fortunately, filesystems like Btrfs or ext4 return a persistent
unique filesystem ID as f_fsid from statvfs(), so we can use that when
available. Other filesystems like XFS derive the FSID from the device
number of the underlying block device, so switching to the FSID does not
change anything.
Related: bug 402154

M  +21   -1    src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/c735faf5a6a3ef3d29882552ad0a9264a294e038
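The idea behind this commit, preferring the filesystem ID over st_dev, can be sketched in Python. This is not the code in src/engine/idutils.h: the helper names, the XOR fold of the FSID down to 32 bits, and treating a zero FSID as "not available" are assumptions made for illustration:

```python
import os

# Sketch: prefer f_fsid from statvfs() over st_dev from stat() when building
# the device part of a document ID.
# ASSUMPTIONS (hypothetical, not Baloo's implementation): a zero FSID means
# "no FSID available", and a 64-bit FSID is folded to 32 bits with XOR.

def device_id(st_dev: int, fsid: int) -> int:
    """Pick a 32-bit device identifier, preferring a non-zero FSID."""
    if fsid:
        # Fold both 32-bit halves of the FSID together so it fits the ID layout
        return (fsid ^ (fsid >> 32)) & 0xFFFFFFFF
    # Fall back to the device number, which may change between mounts
    return st_dev & 0xFFFFFFFF


def device_id_for_path(path: str) -> int:
    st = os.stat(path)
    vfs = os.statvfs(path)  # f_fsid is exposed since Python 3.7
    return device_id(st.st_dev, vfs.f_fsid)


print(device_id_for_path("/"))
```

On filesystems such as Btrfs or ext4 that report a persistent FSID, the resulting identifier stays the same across reboots and remounts; as comment 2 points out, on NFS it is only as unique as the server's export configuration makes it.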
Comment 4 Bug Janitor Service 2023-09-07 15:54:47 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/169
Comment 5 Nate Graham 2023-09-07 15:56:45 UTC
Git commit 7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:54.
Pushed by ngraham into branch 'kf5'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many
cases. Btrfs subvolumes or partitions on NVMe devices are assigned
device numbers dynamically, so the resulting device ID is typically
different after every reboot, forcing Baloo to repeatedly reindex all
files.

Fortunately, filesystems like Btrfs or ext4 return a persistent
unique filesystem ID as f_fsid from statvfs(), so we can use that when
available. Other filesystems like XFS derive the FSID from the device
number of the underlying block device, so switching to the FSID does not
change anything.
Related: bug 402154


(cherry picked from commit c735faf5a6a3ef3d29882552ad0a9264a294e038)

M  +21   -1    src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6
Comment 6 Maximilian Böhm 2023-09-11 13:24:40 UTC
Can this important fix be backported to Frameworks 5.x?