Bug 419302 - Dolphin search/baloo shows a lot of duplicates
Summary: Dolphin search/baloo shows a lot of duplicates
Status: RESOLVED DUPLICATE of bug 401863
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.69.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
 
Reported: 2020-03-27 17:17 UTC by nuc
Modified: 2022-03-05 22:29 UTC

Attachments
dolphin search duplicates (50.86 KB, image/png)
2020-03-27 17:17 UTC, nuc

Description nuc 2020-03-27 17:17:03 UTC
Created attachment 127049
dolphin search duplicates

When searching for a file with Dolphin's search, I get a ton of duplicates.

The 32.png in the attachment, for example, exists in only about 5 folders, but the search shows 34 results.

Something is not right here.
Comment 1 Nate Graham 2020-04-15 04:41:03 UTC
What does running `baloosearch 32.png` in a terminal window show you?
Comment 2 nuc 2020-04-15 13:50:34 UTC
It basically shows 2-3 duplicates for each entry
Comment 3 Nate Graham 2020-04-15 13:59:06 UTC
Moving to Baloo, then. There seems to be an issue with your database.
Comment 4 nuc 2020-04-15 21:32:34 UTC
Steps to reproduce:

1. Enter any directory
2. `touch baloon`
3. `baloosearch baloon`
4. `rm baloon`
5. `baloosearch baloon`

Repeat an infinite amount of times to create an infinite number of duplicates :)
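
A minimal sketch for quantifying the duplicates that the steps above produce. It assumes that `baloosearch` prints one absolute path per result line and that any other line (for example a timing summary) can be ignored. `count_duplicates` and `search` are hypothetical helper names, not part of Baloo.

```python
import subprocess
from collections import Counter


def count_duplicates(lines):
    """Return {path: hits} for every path that is listed more than once."""
    # Assumption: result lines are absolute paths; anything else is skipped.
    paths = [line.strip() for line in lines if line.strip().startswith("/")]
    return {path: hits for path, hits in Counter(paths).items() if hits > 1}


def search(term):
    """Run `baloosearch <term>` and return its output lines (empty if missing)."""
    try:
        result = subprocess.run(
            ["baloosearch", term], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # baloosearch is not installed on this machine.
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    for path, hits in count_duplicates(search("baloon")).items():
        print(f"{hits}x {path}")
```

Running it after each round of the steps above should show whether the hit count for the same path keeps growing.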
Comment 5 Helgi 2020-11-17 16:05:46 UTC
I can confirm this bug.

Operating System: openSUSE Leap 15.2
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.71.0
Qt Version: 5.12.7
Kernel Version: 5.3.18-lp152.50-default
OS Type: 64-bit
Processors: 4 × Intel® Core™ i3-7100U CPU @ 2.40GHz
Memory: 7.2 GiB
Comment 6 Nate Graham 2020-11-18 21:01:28 UTC
*** Bug 429283 has been marked as a duplicate of this bug. ***
Comment 7 Dave 2021-03-10 20:16:57 UTC
Every single file is duplicated twice in Dolphin search and in baloosearch.

KDE Frameworks 5.78.0
Qt 5.15.2 (built against 5.15.2)
Comment 8 tagwerk19 2021-04-27 12:37:56 UTC
Do you see this on openSUSE or another distribution with BTRFS and multiple subvolumes?

See Bug 402154; specifically, check whether the device number of your home directory changes on reboot. If that happens, it seems that baloo reindexes your files and shows multiple hits...
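
One way to do that check, as a minimal sketch: print the device number of the home directory, reboot, and compare. `device_number` is a hypothetical helper name; the sketch only reads what `stat()` reports and makes no claim about how Baloo itself uses the value.

```python
import os


def device_number(path):
    """Return the (major, minor) device number of the filesystem holding path."""
    st_dev = os.stat(path).st_dev
    return os.major(st_dev), os.minor(st_dev)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    major, minor = device_number(home)
    print(f"{home}: device {major}:{minor}")
```

If the printed pair differs between two boots, the device number is not stable on that system.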
Comment 9 Dave 2021-04-27 13:42:13 UTC
It's interesting that while Dolphin and Baloo search show duplicate files, Milou, the Plasma search widget, doesn't. Does it deduplicate the search results?
Comment 10 tagwerk19 2021-04-28 06:48:03 UTC
(In reply to David Palacio from comment #9)
> It's interesting that while Dolphin and Baloo search show duplicate files,
> Milou, the Plasma search widget, doesn't. Does it deduplicate the search
> results?
I can say that I get just one hit with KRunner and the search widget, so it seems likely that some sanitising is happening...
Comment 11 Massimiliano L 2021-12-28 10:29:55 UTC
It could be nice to have confirmation by the OP that this problem occurs with btrfs, but it seems highly likely that this is a duplicate of bug 402154. Since the latter is not very findable (misleading title, long discussion), I guess this could stay open.

A few comments / considerations:

- I can confirm the file search in the menu is not affected by duplicated results, so at least there should be a way to fix the appearance in Dolphin even if duplicates are present in the index;

- I am not up to date on how btrfs adoption is evolving in the wild, but with major distros such as openSUSE and Fedora on board the user base is becoming pretty large. A warning message in the File Search config module about support for btrfs being "experimental" could be welcome, but I am not sure what the KDE policy is about this kind of thing;

- is it possible to think of a "sanitization" routine for the file index, dedicated to the detection / elimination of duplicate file entries?
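
To illustrate the first point, display-level filtering could be as simple as the sketch below. It is an assumption about one possible approach, not a description of what KRunner or Dolphin actually do, and `unique_results` is a hypothetical name. It only hides repeated hits; it does not repair the index.

```python
import os


def unique_results(paths):
    """Drop repeated hits, keeping the first occurrence of each path in order."""
    seen = set()
    unique = []
    for path in paths:
        # Normalise so that "/a//x" and "/a/x" count as the same entry.
        key = os.path.normpath(path)
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```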
Comment 12 tagwerk19 2022-01-05 13:27:48 UTC
(In reply to Massimiliano L from comment #11)
> It could be nice to have confirmation by the OP that this problem occurs
> with btrfs, but it seems highly likely that this is a duplicate of bug
> 402154. Since the latter is not very findable (misleading title, long
> discussion), I guess this could stay open.
It's certainly the case that the openSUSE config, BTRFS with multiple subvols, causes this symptom (and is still the case).

However, I know I've also encountered this elsewhere. It's something that happened frequently for me a couple of years back but quite rarely now; that was with Fedora (BTRFS) but also, I suspect, with Neon (ext4).
Comment 13 Nate Graham 2022-03-05 14:50:05 UTC
Assuming BTRFS since everything else fits.

*** This bug has been marked as a duplicate of bug 401863 ***
Comment 14 nuc 2022-03-05 19:54:57 UTC
Actually I am on ext4, and I am pretty sure I was on ext4 back then, too.

However, I do not use Plasma anymore, so I cannot comment further on the issue.

Trying my repro steps from https://bugs.kde.org/show_bug.cgi?id=419302#c4 might indicate whether the bugs are connected :)
Comment 15 Martin Steigerwald 2022-03-05 22:29:10 UTC
In one of those bug reports we already established that this is not just about BTRFS.

This is about kernel device major:minor numbers not guaranteed to be stable in various circumstances.

We discussed all of this before. I even asked kernel developers. See here:

https://bugs.kde.org/show_bug.cgi?id=438434#c14

In there I wrote:

Neil Brown clearly said that no userspace component can rely on device numbers since kernel 2.4. Luckily he recommended an alternative:

"That is really hard to provide in general.  Possibly the best approach
is to use the statfs() systemcall to get the "f_fsid" field.  This is
64bits.  It is not supported uniformly well by all filesystems, but I
think it is at least not worse than using the device number.  For a lot
of older filesystems it is just an encoding of the device number.

For btrfs, xfs, ext4 it is much much better."

https://lore.kernel.org/linux-block/1769070.0rzTUBzp5V@ananda/T/#m28b8c889c9289ad1ec76cbf040938ea883e3f375
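
A minimal sketch for comparing the two identifiers on a given system. Note the assumption: Python's `os.statvfs` wraps `statvfs(3)` rather than `statfs(2)`, so its `f_fsid` (available since Python 3.7) is used here only as a stand-in for the field named in the quote. `filesystem_ids` is a hypothetical helper name.

```python
import os


def filesystem_ids(path):
    """Return (st_dev, f_fsid) for the filesystem that holds path."""
    st_dev = os.stat(path).st_dev
    # statvfs(3) exposes f_fsid as a single integer; some filesystems
    # may report 0 here.
    f_fsid = os.statvfs(path).f_fsid
    return st_dev, f_fsid


if __name__ == "__main__":
    home = os.path.expanduser("~")
    st_dev, f_fsid = filesystem_ids(home)
    print(f"{home}: st_dev={os.major(st_dev)}:{os.minor(st_dev)} f_fsid={f_fsid:#x}")
```

Recording both values across a few reboots would show which of them stays constant for the filesystem in question.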