Bug 425000 - Baloo tag search results contain duplicates
Summary: Baloo tag search results contain duplicates
Status: RESOLVED DUPLICATE of bug 401863
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Tags
Version: 5.68.0
Platform: Kubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-04 12:19 UTC by Fonkle
Modified: 2022-03-05 14:48 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Description Fonkle 2020-08-04 12:19:57 UTC
When searching for a tag using baloosearch or Dolphin, every file has a duplicate entry without a thumbnail. For instance, _DSC0003.MOV appears once without a thumbnail and once with a thumbnail. The paths of the two entries are identical.


STEPS TO REPRODUCE
1. Configure baloo to include wanted folders (balooctl config add includeFolders <folder>)
2. baloosearch tag:Tagname
3. In Dolphin, use baloosearch://?query=tag:Tagname
 
OBSERVED RESULT
The command in 2. returns two entries (same filename) for each file.
The command in 3. returns all entries twice: once without a thumbnail and another with thumbnail.

EXPECTED RESULT
In both cases, each indexed file should be returned exactly once, with its thumbnail.
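
A minimal sketch for confirming the duplicates from step 2 on the command line, assuming baloosearch prints one result path per line. The helper name duplicate_results is hypothetical and not part of Baloo:

```shell
# Hypothetical helper (not part of Baloo): read result lines on stdin
# and print each line that occurs more than once, one time each.
duplicate_results() {
    # uniq only compares adjacent lines, so sort first.
    sort | uniq -d
}
```

It could be used as "baloosearch tag:Tagname | duplicate_results". Empty output would mean that no result line was repeated.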


SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: Kubuntu 20.04 Desktop x64
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.68.0
Qt Version: 5.12.8

ADDITIONAL INFORMATION
Comment 1 Stefan Brüns 2020-08-04 23:57:03 UTC
Which filesystem are you using?
Comment 2 Stefan Brüns 2020-08-04 23:57:20 UTC
.
Comment 3 Bug Janitor Service 2020-08-19 04:33:12 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 4 Fonkle 2020-08-20 14:46:32 UTC
I use Ext4 filesystems.

I have noticed that right after purging Baloo's database the index appears as it should for a while (without duplicates) and then after a few hours or days it shows the duplicates without thumbnails again.
Comment 5 Stefan Brüns 2020-08-24 14:17:28 UTC
As you say "filesystem_s_", are these external drives?

Please provide the output of "stat <filename>" for one of the affected files, once with the DB in a pristine state without duplicates, and once after duplicates appear.
Comment 6 Fonkle 2020-08-24 19:15:45 UTC
Yes, the indexed files are on an external RAID10 drive.

Stat WITHOUT duplicates:

stat ./Bronze03.png
  File: ./Bronze03.png
  Size: 115382          Blocks: 232        IO Block: 4096   regular file
Device: 842h/2114d      Inode: 216404781   Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/username)   Gid: ( 1000/username)
Access: 2019-12-09 20:17:28.000000000 +0100
Modify: 2020-01-10 11:13:36.000000000 +0100
Change: 2020-06-19 23:07:34.892218577 +0200
 Birth: -

I will add the stat for this file as soon as the duplicates appear again.
Comment 7 Fonkle 2020-08-24 19:27:35 UTC
Stat WITH duplicates:

stat ./Bronze03.png
  File: ./Bronze03.png
  Size: 115382          Blocks: 232        IO Block: 4096   regular file
Device: 842h/2114d      Inode: 216404781   Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/username)   Gid: ( 1000/username)
Access: 2019-12-09 20:17:28.000000000 +0100
Modify: 2020-01-10 11:13:36.000000000 +0100
Change: 2020-06-19 23:07:34.892218577 +0200
 Birth: -

Not sure what this would have to do with the DB though, since the file should not be modified by baloo I assume?
Comment 8 Fonkle 2020-08-24 20:13:16 UTC
Update:

I have installed KDE Neon User 5.19.4 (based on Ubuntu 20.04) on the same system. Baloo seemed to work fine that way, until I ran "balooctl monitor" (from a Yakuake terminal). Instantly, the duplicates started to appear.

After I suspended Baloo ("balooctl suspend") and restarted dolphin, the duplicates disappeared. Resuming Baloo using "balooctl resume" caused the duplicates to reappear in Dolphin.

Even after purging and rebooting the system, an infinite number of duplicates appear as soon as I navigate to a saved query inside Dolphin. CPU load is ~28% for Dolphin and ~1% baloo_file_extractor.
Comment 9 Fonkle 2020-08-25 14:19:34 UTC
Update:

If I add *.MOV and *.NEF to excludeFilters:

  balooctl config add excludeFilters *.MOV
  balooctl config add excludeFilters *.NEF

The problem disappears. It seems to affect .MOV and .NEF files only.

If I then reenable .MOV files (remove it from excludeFilters) it spits out a lot of errors:

  [mov,mp4,m4a,3gp,3g2,mj2 @ 0x5586dcd9a740] st: 0 edit list: 1 Missing key frame while searching for timestamp: 1001
  [mov,mp4,m4a,3gp,3g2,mj2 @ 0x5586dcd9a740] st: 0 edit list 1 Cannot find an index entry before timestamp: 1001.

However, the duplicate problem does NOT occur.

Only as soon as I remove *.NEF from excludeFilters again, the duplicate entries reappear in Dolphin, though not just for .NEF files but for .MOV files as well.

It seems the problem is related to .NEF (Nikon RAW) files. Their MIME type is image/tiff and a JPG preview can be embedded.
Comment 10 Helgi 2020-11-17 18:48:47 UTC
This seems similar to bug https://bugs.kde.org/show_bug.cgi?id=419302

I can confirm it
Operating System: openSUSE Leap 15.2
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.71.0
Qt Version: 5.12.7
Kernel Version: 5.3.18-lp152.50-default
OS Type: 64-bit
Processors: 4 × Intel® Core™ i3-7100U CPU @ 2.40GHz
Memory: 7.2 GiB
Comment 11 tagwerk19 2021-08-16 20:12:27 UTC
(In reply to Fonkle from comment #9)
> Update:
> 
> If I add *.MOV and *.NEF to excludeFilters:
> 
>   balooctl config add excludeFilters *.MOV
>   balooctl config add excludeFilters *.NEF
> 
> The problem disappears. It seems to affect .MOV and .NEF files only.
Interesting...

> It seems the problem is related to .NEF (Nikon RAW) files. Their MIME type
> is image/tiff and a JPG preview can be embedded.
Could be that things have moved on...

Looking up the mimetype of .nef files:
    $ kmimetypefinder5 sample1.nef
gives
    image/x-nikon-nef

I've tried testing baloo indexing with files from
    https://filesamples.com/formats/nef
and
    https://filesamples.com/formats/mov
and the indexing seems to work - as checked with
    balooshow -x ...
and
    baloosearch filename:mov
    baloosearch filename:nef

That said, I certainly remember times when Dolphin behaved as you described.

I can also point to Bug 431664, which refers to a set of issues fixed in 5.68.0, although the fixes depend on the Baloo index being purged and rebuilt.
Comment 12 tagwerk19 2021-08-16 20:23:32 UTC
(In reply to Helgi from comment #10)
> This seems similar to bug https://bugs.kde.org/show_bug.cgi?id=419302
> 
> I can confirm it
> Operating System: openSUSE Leap 15.2
For openSUSE Leap, the issue is likely to be that baloo depends on "stable" device and inode numbers and the BTRFS filesystem, with multiple subvols, doesn't give that. See:
    https://bugs.kde.org/show_bug.cgi?id=402154#c12
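
Whether a file's device and inode numbers stay stable can be checked by hand. This is a rough sketch, assuming GNU stat from coreutils; the helper names dev_inode and same_identity are hypothetical and not part of Baloo:

```shell
# Hypothetical helpers, not part of Baloo. Requires GNU stat (coreutils).

# Print "<device> <inode>" for a file; %d and %i are GNU stat format
# fields for the decimal device number and the inode number.
dev_inode() {
    stat -c '%d %i' -- "$1"
}

# Succeed if the file's current device/inode pair equals a saved one.
same_identity() {
    [ "$(dev_inode "$1")" = "$2" ]
}
```

For example, run saved=$(dev_inode ./Bronze03.png) while the index is clean, then later run same_identity ./Bronze03.png "$saved" || echo "device or inode changed". In the stat output from comments 6 and 7, both values were identical before and after the duplicates appeared.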
Comment 13 Nate Graham 2022-03-05 14:48:04 UTC

*** This bug has been marked as a duplicate of bug 401863 ***