SUMMARY
Baloo creates multiple entries for files that reside on multi-device btrfs file systems. In my case I keep media files on a btrfs RAID1 file system (consisting of 2 HDDs) which contains several subvolumes that are mounted into subdirectories of my users' $HOME (e.g., ~/Music, ~/Media). Files that reside in ~/Media show up in baloosearch in duplicate, and sometimes show up as a third entry under ~/Music, even though they don't exist there (I expect had I waited long enough, entries would show up in duplicate per subvolume).

STEPS TO REPRODUCE
1. Have a multi-device btrfs file system (e.g., RAID1) with multiple subvolumes.
2. Mount the subvolumes directly in $HOME.
3. Enable baloo.

OBSERVED RESULT
Baloo will create duplicate entries for files on the multi-device btrfs file system.

EXPECTED RESULT
Baloo creates unique entries for files on the multi-device btrfs file system.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Gentoo Linux (4.19.7-gentoo)
KDE Plasma Version: 5.14.3
KDE Frameworks Version: 5.52.0
Qt Version: 5.11.1

ADDITIONAL INFORMATION
As a concrete example, I had ~/Media/marcec_backup_btrfs_image.img show up twice, and also show up as ~/Music/marcec_backup_btrfs_image.img (no copy-paste because I forgot to save the output before recreating the baloo database).

BTW, while researching this bug I found https://phabricator.kde.org/T9805, which looks to me like a way to fix this problem.
It turns out that the duplicates are not limited to the multi-device file system, but also happen on my root file system (btrfs on a single SSD). For example:

% ls -lh Sync/svn_notes.org
-rw-r--r-- 1 marcec users 7,7K 12. Dez 2014 Sync/svn_notes.org
% baloosearch svn_notes.org
/home/marcec/Sync/Notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
Elapsed: 4.86556 msec

(OT: The first search result confuses me, because it does not mention svn_notes.org anywhere.)

It also shows up three times if I perform the same search in krunner. In Dolphin, however, it only shows up once. Does Dolphin perform some result deduplication?

(I performed this test for a few other files, too, with the same result.)
I am experiencing the same behaviour. Baloo returns all results threefold, regardless of whether I search in the shell or via Dolphin and Ctrl+F. It does _not_ return triple results via krunner, though.

I have three partitions mounted to /home or parts of my home directory. Actually four, but the fourth is excluded from the search. The total number of partitions is seven. The file system on all partitions is ext4.

- the number of returned finds equals the number of partitions in the search path
- the file system does not seem to be involved

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux (5.1.3.arch2-1)
KDE Plasma Version: 5.15.5-1
KDE Frameworks Version: 5.58.0.1
Qt5 Version: 5.12.3-2
I also see this on an up-to-date openSUSE Tumbleweed (single SSD in a laptop, / and /home on btrfs):

% baloosearch svn_notes
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
Elapsed: 0,410997 msec

Unlike my original report, I don't see the multiple results in krunner on this system. However, I found out that I do get duplicate results in Dolphin if I search by content instead of filename!
*** Bug 413524 has been marked as a duplicate of this bug. ***
(In reply to Marc Joliet from comment #3)
> Unlike my original report, I don't see the multiple results in krunner on
> this system. However, I found out that I do get duplicate results in
> Dolphin if I search by content instead of filename!

I can confirm the same behavior on Arch. Single result per actual file when searching for a filename, triple or quadruple results when searching for content. Both via Dolphin and CLI.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux (5.3.13.arch1-1)
KDE Plasma Version (plasma-workspace): 5.17.3-1
KDE Frameworks Version: 5.64.0.1
Qt5 Version: 5.13.2-3
baloo: 5.64.0-1
Filesystem: ext4
Same issue on openSUSE Tumbleweed with baloo 5.68.0 and KDE Frameworks 5.68.0.
Maybe have a look at:
https://bugs.kde.org/show_bug.cgi?id=402154#c12

If you do the test, it would be interesting to know whether the device number has changed, and whether the balooshow details have also changed...
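For anyone who wants to check the device number without digging through stat output by hand, here is a minimal sketch in Python. It only reports what stat() returns for a path (device major:minor and inode); the helper name device_and_inode is made up for this example and is not part of baloo. Running it before and after a reboot or remount on one of the affected files should show whether the device number moved.

```python
import os
import sys


def device_and_inode(path):
    """Return (major, minor, inode) for path, as reported by stat().

    Hypothetical helper for this thread, not a baloo API.
    """
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


if __name__ == "__main__":
    # Check every path given on the command line, or the current
    # directory if none was given.
    for p in sys.argv[1:] or ["."]:
        major, minor, inode = device_and_inode(p)
        print(f"{p}: device {major}:{minor}, inode {inode}")
```

If the inode stays the same but the device number differs between runs, that would fit the explanation discussed in the linked comment.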
*** Bug 438434 has been marked as a duplicate of this bug. ***
*** Bug 425000 has been marked as a duplicate of this bug. ***
*** Bug 419302 has been marked as a duplicate of this bug. ***
*** Bug 429283 has been marked as a duplicate of this bug. ***
*** Bug 461820 has been marked as a duplicate of this bug. ***
In https://bugs.kde.org/show_bug.cgi?id=419302, Martin commented that:

> Neil Brown clearly said that no userspace component can rely on device numbers since kernel 2.4. Luckily he recommended an alternative:
>
> "That is really hard to provide in general. Possibly the best approach
> is to use the statfs() systemcall to get the "f_fsid" field. This is
> 64bits. It is not supported uniformly well by all filesystems, but I
> think it is at least not worse than using the device number. For a lot
> of older filesystems it is just an encoding of the device number.
>
> For btrfs, xfs, ext4 it is much much better."
>
> https://lore.kernel.org/linux-block/1769070.0rzTUBzp5V@ananda/T/#m28b8c889c9289ad1ec76cbf040938ea883e3f375

So if this would help, is there already work ongoing on a change from device id to this f_fsid?
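To make the difference between the two identifiers concrete, here is a rough sketch in Python that prints both for a path. It is only an illustration, not how baloo does or should do it. Two assumptions: Python exposes statvfs() rather than statfs(), so the f_fsid shown here is the value the C library reports through statvfs() (available as an attribute since Python 3.7); and the function names filesystem_ids and same_filesystem are invented for this example.

```python
import os
import sys


def filesystem_ids(path):
    """Return the device number and filesystem ID reported for path.

    st_dev comes from stat(); f_fsid comes from statvfs(), which is
    an assumption of this sketch (the quote above talks about statfs()).
    """
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {"st_dev": st.st_dev, "f_fsid": vfs.f_fsid}


def same_filesystem(path_a, path_b):
    """Compare two paths by f_fsid instead of by device number."""
    return os.statvfs(path_a).f_fsid == os.statvfs(path_b).f_fsid


if __name__ == "__main__":
    for p in sys.argv[1:] or ["."]:
        ids = filesystem_ids(p)
        print(f"{p}: st_dev={ids['st_dev']:#x} f_fsid={ids['f_fsid']:#x}")
```

Comparing the output across reboots for the same file would show which of the two values is stable on a given setup.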
Realised that the invent MRs to solve this have not been mentioned here...

For KF6:
https://invent.kde.org/frameworks/baloo/-/merge_requests/131

and cherry-picked for KF5:
https://invent.kde.org/frameworks/baloo/-/merge_requests/169

Worth noting that this can cause duplicated results to be listed on ext4 systems; see Bug 475919.