Bug 401863 - baloo creates multiple entries for files residing on multi-device btrfs file systems
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Gentoo Packages Linux
Importance: HI normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Duplicates: 413524 419302 425000 429283 438434 461820
Depends on:
Blocks:
 
Reported: 2018-12-07 15:21 UTC by Marc Joliet
Modified: 2024-03-03 16:08 UTC
CC List: 19 users

See Also:
Latest Commit:
Version Fixed In:


Description Marc Joliet 2018-12-07 15:21:18 UTC
SUMMARY

Baloo creates multiple entries for files that reside on multi-device btrfs file systems.  In my case I keep media files on a btrfs RAID1 file system (consisting of 2 HDDs) which contains several subvolumes that are mounted into subdirectories of my users' $HOME (e.g., ~/Music, ~/Media).  Files that reside in ~/Media show up in baloosearch in duplicate, and sometimes show up as a third entry under ~/Music, even though they don't exist there (I expect that, had I waited long enough, entries would have shown up in duplicate for each subvolume).

STEPS TO REPRODUCE
1. Have a multi-device btrfs file system (e.g., RAID1) with multiple subvolumes.
2. Mount the subvolumes directly in $HOME.
3. Enable baloo.

OBSERVED RESULT

Baloo will create duplicate entries for files on the multi-device btrfs file system.

EXPECTED RESULT

Baloo creates unique entries for files on the multi-device btrfs file system.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Gentoo Linux (4.19.7-gentoo)
KDE Plasma Version: 5.14.3
KDE Frameworks Version: 5.52.0
Qt Version: 5.11.1

ADDITIONAL INFORMATION

As a concrete example, I had ~/Media/marcec_backup_btrfs_image.img show up twice, and also show up as ~/Music/marcec_backup_btrfs_image.img (no copy-paste because I forgot to save the output before recreating the baloo database).

BTW, while researching this bug I found https://phabricator.kde.org/T9805, which looks to me like a way to fix this problem.
Comment 1 Marc Joliet 2018-12-09 15:25:33 UTC
It turns out that the duplicates are not limited to the multi-device file system, but also happen on my root file system (btrfs on a single SSD).  For example:

% ls -lh Sync/svn_notes.org
-rw-r--r-- 1 marcec users 7,7K 12. Dez 2014  Sync/svn_notes.org
% baloosearch svn_notes.org
/home/marcec/Sync/Notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
Verstrichen: 4.86556 msec

(OT: The first search result confuses me, because it does not mention svn_notes.org anywhere.)

It also shows up three times if I perform the same search in krunner.  In Dolphin, however, it only shows up once.  Does Dolphin perform some result deduplication?  (I performed this test for a few other files, too, with the same result.)
Comment 2 Johannes Tiemer 2019-05-23 23:42:18 UTC
I am experiencing the same behaviour. Baloo returns every result three times, regardless of whether I search in the shell or via Dolphin and Ctrl+F. It does _not_ return triple results via krunner, though. I have three partitions mounted at /home or at parts of my home directory. Actually four, but the fourth is excluded from the search. The total number of partitions is seven. The file system on all partitions is ext4.

- The number of returned results equals the number of partitions in the search path.
- The file system type does not seem to be involved.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux (5.1.3.arch2-1)
KDE Plasma Version: 5.15.5-1
KDE Frameworks Version: 5.58.0.1
Qt5 Version: 5.12.3-2
Comment 3 Marc Joliet 2019-11-05 11:14:07 UTC
I also see this on an up-to-date openSUSE Tumbleweed (single SSD in a laptop, / and /home on btrfs):

% baloosearch svn_notes
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
/home/marcec/Sync/svn_notes.org
Verstrichen: 0,410997 msec

Unlike my original report, I don't see the multiple results in krunner on this system.  However, I found out that I do get duplicate results in Dolphin if I search by content instead of filename!
Comment 4 Nate Graham 2019-11-09 16:51:14 UTC
*** Bug 413524 has been marked as a duplicate of this bug. ***
Comment 5 Johannes Tiemer 2019-12-28 23:33:30 UTC
(In reply to Marc Joliet from comment #3)
> Unlike my original report, I don't see the multiple results in krunner on
> this system.  However, I found out that I do get duplicate results in
> Dolphin if I search by content instead of filename!

I can confirm the same behavior on Arch. Single result per actual file when searching for a filename, triple or quadruple results when searching for content. Both via Dolphin and CLI.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux (5.3.13.arch1-1)
KDE Plasma Version (plasma-workspace): 5.17.3-1
KDE Frameworks Version: 5.64.0.1
Qt5 Version: 5.13.2-3
baloo: 5.64.0-1
Filesystem: ext4
Comment 6 Robert Riemann 2020-03-31 16:39:32 UTC
Same issue on openSUSE Tumbleweed with baloo 5.68.0 and KDE Frameworks 5.68.0.
Comment 7 tagwerk19 2021-04-26 22:53:57 UTC
Maybe have a look at:
    https://bugs.kde.org/show_bug.cgi?id=402154#c12

If you do the test, it would be interesting to know whether the device number has changed, and whether the balooshow details have also changed...
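
A minimal sketch of why a changed device number would matter, assuming the index keys each file on a combination of device number and inode. That keying is an assumption made for illustration; `make_doc_id` and `ToyIndex` are hypothetical names and this is not Baloo's code. If the device number reported for a file system differs between two indexing runs, the same path ends up stored under a second key instead of replacing the first:

```python
def make_doc_id(device_id, inode):
    """Pack a device number and an inode into one 64-bit key (hypothetical layout)."""
    return ((inode & 0xFFFFFFFF) << 32) | (device_id & 0xFFFFFFFF)


class ToyIndex:
    """Toy stand-in for a file index keyed on (device number, inode)."""

    def __init__(self):
        self.entries = {}  # maps document ID -> path

    def add(self, device_id, inode, path):
        self.entries[make_doc_id(device_id, inode)] = path

    def search(self, term):
        return [p for p in self.entries.values() if term in p]


index = ToyIndex()
path = "/home/user/Sync/svn_notes.org"

index.add(device_id=40, inode=1234, path=path)
index.add(device_id=40, inode=1234, path=path)  # same key: entry is overwritten
index.add(device_id=43, inode=1234, path=path)  # device number changed: new key

results = index.search("svn_notes")
print(len(results), "entries for one file")
```

In this toy model, re-indexing with an unchanged device number is harmless, while every new device number adds one more entry for the same file.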
Comment 8 Nate Graham 2022-03-05 14:47:30 UTC
*** Bug 438434 has been marked as a duplicate of this bug. ***
Comment 9 Nate Graham 2022-03-05 14:48:04 UTC
*** Bug 425000 has been marked as a duplicate of this bug. ***
Comment 10 Nate Graham 2022-03-05 14:50:05 UTC
*** Bug 419302 has been marked as a duplicate of this bug. ***
Comment 11 Nate Graham 2022-03-05 14:50:06 UTC
*** Bug 429283 has been marked as a duplicate of this bug. ***
Comment 12 tagwerk19 2022-11-20 23:02:16 UTC
*** Bug 461820 has been marked as a duplicate of this bug. ***
Comment 13 Robert Riemann 2023-02-22 10:55:52 UTC
In https://bugs.kde.org/show_bug.cgi?id=419302, Martin commented that:

> Neil Brown clearly said that no userspace component can rely on device numbers since kernel 2.4. Luckily he recommended an alternative:
>
> "That is really hard to provide in general.  Possibly the best approach
> is to use the statfs() systemcall to get the "f_fsid" field.  This is
> 64bits.  It is not supported uniformly well by all filesystems, but I
> think it is at least not worse than using the device number.  For a lot
> of older filesystems it is just an encoding of the device number.
> 
> For btrfs, xfs, ext4 it is much much better."
>
> https://lore.kernel.org/linux-block/1769070.0rzTUBzp5V@ananda/T/#m28b8c889c9289ad1ec76cbf040938ea883e3f375

So if this would help: is work already under way to switch from the device ID to this f_fsid?
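
For anyone who wants to compare the two identifiers on their own system, here is a small sketch. It rests on assumptions: Python only exposes `statvfs()` (not `statfs()`), and the `f_fsid` it reports (available since Python 3.7) may not be bit-for-bit identical to the 64-bit `statfs()` field quoted above. `filesystem_ids` is a hypothetical helper name, and the code is Unix-only:

```python
import os


def filesystem_ids(path):
    """Return the device number and file system ID reported for `path`."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        "st_dev": st.st_dev,           # device number from stat()
        "major": os.major(st.st_dev),  # major part of the device number
        "minor": os.minor(st.st_dev),  # minor part of the device number
        "f_fsid": vfs.f_fsid,          # file system ID from statvfs()
    }


ids = filesystem_ids("/")
print(ids)
```

Running this before and after a reboot (or after adding and removing devices) for a path on the affected file system would show whether `st_dev` changes while `f_fsid` stays the same.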
Comment 14 tagwerk19 2024-03-03 16:08:32 UTC
I realised that the merge requests on invent.kde.org that solve this have not been mentioned here...

For KF6
    https://invent.kde.org/frameworks/baloo/-/merge_requests/131
and cherry-picked for KF5
    https://invent.kde.org/frameworks/baloo/-/merge_requests/169

Worth noting that this can cause duplicated results to be listed on ext4 systems; see Bug 475919.