I want to index files that are shared over NFS by a TrueNAS appliance. According to the log, Baloo triggers a reindex of already indexed files every day. This has two drawbacks:

1. the index file size is exploding
2. the reindex is triggered without any real need

This is what I'm talking about. A new reindex is triggered. After the file has been correctly reindexed:

diego@pc-diego:~: balooshow -x "/net/fileserver4/Segreteria/XXXX.pdf"
1879000000fb 251 6265 /net/fileserver4/Segreteria/XXXX.pdf
    Mtime: 1583398981 2020-03-05T10:03:01
    Ctime: 1672915819 2023-01-05T11:50:19
    Cached properties:
        Autore: xxx.yyy
        Titolo: Spett
        Documento generato da: Microsoft® Word 2016
        Conto delle pagine: 1
        Data di creazione: 2020-03-05T09:03:01.000Z
    Informazioni interne
    Termini: x x x x
    pageCount: 1
    generator: 2016 microsoft word ®
    author: xxx yyy
    title: spett
    creationDate: 2020-03-05T09:03:01Z

(The labels are localized in Italian: Autore = author, Titolo = title, Documento generato da = document generated by, Conto delle pagine = page count, Data di creazione = creation date, Informazioni interne = internal information, Termini = terms.)

After a while the directory is unmounted by systemd. If I then ask Baloo again:

diego@pc-diego:~: balooshow -x "/net/fileserver4/Segreteria/XXXX.pdf"
18790010000c 1048588 6265 /net/fileserver4/Segreteria/XXXX.pdf: nessuna informazione trovata nell'indice

The message is Italian for "no information found in the index", so Baloo has completely forgotten the file. But if I run:

baloosearch -i "XXXX.pdf"

Baloo finds it correctly:

187900000084 /net/fileserver4/Segreteria/XXXX.pdf
187900100010 /net/fileserver4/Segreteria/XXXX.pdf
18790010000f /net/fileserver4/Segreteria/XXXX.pdf
1879000000fb /net/fileserver4/Segreteria/XXXX.pdf
1879000000eb /net/fileserver4/Segreteria/XXXX.pdf
1879000000d8 /net/fileserver4/Segreteria/XXXX.pdf
1879000000d4 /net/fileserver4/Segreteria/XXXX.pdf
1879000000d3 /net/fileserver4/Segreteria/XXXX.pdf
1879000000c0 /net/fileserver4/Segreteria/XXXX.pdf

but, as you can see, it also finds the same file under several other IDs.
(In reply to Diego Ercolani from comment #0)
> ...
> 187900000084 /net/fileserver4/Segreteria/XXXX.pdf
> 187900100010 /net/fileserver4/Segreteria/XXXX.pdf
> 18790010000f /net/fileserver4/Segreteria/XXXX.pdf
> 1879000000fb /net/fileserver4/Segreteria/XXXX.pdf
> ...

That looks like the inode is OK but the device number is changing on each reboot/remount. This is something that has been affecting BTRFS mounts; interesting to see that it is catching NFS as well.

There is a merge request drafted, specifically for the BTRFS case, that might also deal with this:

https://invent.kde.org/frameworks/baloo/-/merge_requests/131

I might wave a flag for a better (but likely too difficult) solution: servers index the content they host locally, and clients that mount such resources forward search queries to the hosts for resolution there. I can see this would need too many bits to be in place and work together.
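The pattern can be checked directly from the IDs quoted above. A minimal sketch, assuming that the hexadecimal document ID packs the inode into the upper 32 bits and the device number into the lower 32 bits. That layout is inferred from the balooshow output in comment #0, not taken from the Baloo source, and `split_doc_id` is a hypothetical helper name:

```python
def split_doc_id(hex_id: str) -> tuple:
    """Split a Baloo document ID (hex string) into (device, inode).

    Assumption: the inode sits in the upper 32 bits and the device
    number in the lower 32 bits of the ID.
    """
    value = int(hex_id, 16)
    device = value & 0xFFFFFFFF  # lower 32 bits
    inode = value >> 32          # upper 32 bits
    return device, inode


# Some of the IDs reported for the same file in comment #0
ids = ["187900000084", "187900100010", "18790010000f", "1879000000fb"]
for doc_id in ids:
    device, inode = split_doc_id(doc_id)
    print(f"{doc_id}: device={device} inode={inode}")
```

Under that assumption, 1879000000fb decodes to device 251 and inode 6265, and 18790010000c to device 1048588 and inode 6265. Both match the numbers balooshow printed next to those IDs in comment #0: same inode, different device number.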
(In reply to tagwerk19 from comment #1)
> That looks like the inode is OK but the device number is changing each
> reboot/remount. This is something that's been affecting BTRFS mounts,
> interesting to see that it's catching NFS as well.

Yes. Of the most common filesystems on Linux, Btrfs, NFS, and CIFS (a.k.a. SMB) all use dynamically allocated device numbers, so they are all affected the same way.

> There is a merge request drafted, specifically for the BTRFS case, that
> might also deal with this:
> https://invent.kde.org/frameworks/baloo/-/merge_requests/131

It will probably help, depending on server configuration. The Linux NFS server will by default use the FSID of the underlying filesystem when presenting an export to clients, so the Linux NFS client should expose a unique FSID. However, the server could be running something like XFS as the underlying filesystem (which does not have a stable FSID), or the exported FSID can be overridden in the server configuration, possibly making it non-unique across servers. If server A exports a custom fsid=123 and server B exports a different filesystem with the same custom fsid, a client mounting both filesystems will see two unrelated trees with the same FSID.
Git commit c735faf5a6a3ef3d29882552ad0a9264a294e038 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:52.
Pushed by ngraham into branch 'master'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many cases. Btrfs subvolumes or partitions on NVMe devices are assigned device numbers dynamically, so the resulting device ID is typically different after every reboot, forcing Baloo to repeatedly reindex all files.

Fortunately, filesystems like Btrfs or ext4 return a persistent unique filesystem ID as f_fsid from statvfs(), so we can use that when available. Other filesystems like XFS derive the FSID from the device number of the underlying block device, so switching to the FSID does not change anything.
Related: bug 402154

M +21 -1 src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/c735faf5a6a3ef3d29882552ad0a9264a294e038
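The idea in the commit message can be tried outside of Baloo. This is a rough sketch in Python, not the actual change to src/engine/idutils.h: it prefers f_fsid from statvfs() and falls back to st_dev from stat(). Treating a zero FSID as "not available" and the function name `device_identifier` are assumptions made for this illustration only.

```python
import os


def device_identifier(path: str) -> int:
    """Return an identifier for the filesystem that holds `path`.

    Prefers the filesystem ID (f_fsid) and falls back to the device
    number (st_dev). Treating a zero f_fsid as "not available" is an
    assumption of this sketch, not something the commit message states.
    """
    fsid = os.statvfs(path).f_fsid  # Unix only, exposed since Python 3.7
    if fsid != 0:
        return fsid
    return os.stat(path).st_dev


if __name__ == "__main__":
    path = "."
    print("st_dev:", os.stat(path).st_dev)
    print("f_fsid:", os.statvfs(path).f_fsid)
    print("chosen:", device_identifier(path))
```

Running it on a path below an NFS or Btrfs mount before and after a remount shows whether st_dev changes while f_fsid stays the same on a given setup.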
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/169
Git commit 7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:54.
Pushed by ngraham into branch 'kf5'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many cases. Btrfs subvolumes or partitions on NVMe devices are assigned device numbers dynamically, so the resulting device ID is typically different after every reboot, forcing Baloo to repeatedly reindex all files.

Fortunately, filesystems like Btrfs or ext4 return a persistent unique filesystem ID as f_fsid from statvfs(), so we can use that when available. Other filesystems like XFS derive the FSID from the device number of the underlying block device, so switching to the FSID does not change anything.
Related: bug 402154

(cherry picked from commit c735faf5a6a3ef3d29882552ad0a9264a294e038)

M +21 -1 src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6
Can this important fix be backported to Frameworks 5.x?