Bug 473449 - Duplicate search restriction is sometimes ignored
Summary: Duplicate search restriction is sometimes ignored
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Searches-Similarity
Version: 8.1.0
Platform: Microsoft Windows Microsoft Windows
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-16 12:49 UTC by Michael
Modified: 2025-04-11 18:13 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Faulty result (238.87 KB, image/png)
2023-08-16 12:49 UTC, Michael
Config for finding duplicates (22.10 KB, image/png)
2023-11-17 17:33 UTC, Michael

Description Michael 2023-08-16 12:49:04 UTC
Created attachment 161004
Faulty result

SUMMARY
Searching for duplicates with restrictions does not always work properly and returns unexpected results.

I tested with 2 albums (A and B) and 3 different images (1, 2 and 3). The albums are structured as follows:

Root
  + A ( 1, 1, 3 )
  + B ( 1, 1, 2, 2, 3 )


STEPS TO REPRODUCE
1. Search both albums for duplicates with the restriction "Exclude Reference Album".
2. Additionally, configure the search so that the reference must be in an album other than A.

OBSERVED RESULT
Duplicates are now found in both albums, but in A only 1 of the 2 copies is found, while in B both the reference and the other duplicate are listed.

EXPECTED RESULT
I expected both duplicates to be found in A, and only the reference to be present in B.

See also the attached screenshot.

SOFTWARE/OS VERSIONS
Windows: Windows 11

ADDITIONAL INFORMATION
The search might also be easier to use and implement if it were redesigned so that one can directly specify where to search for the reference images and where duplicates may be deleted.

The reference set R and the set D of duplicates to be found or deleted could then be defined as follows:

R = D (general search for duplicates in the whole database)
R ≠ D (e.g. matching an import against another specific image set)
S ∩ D or S ∪ D (restriction to certain sets S)
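To make the proposed R/D model concrete, here is a minimal Python sketch of how such a search could group results. It is not digiKam code: `Item`, `make_album` and `find_duplicates` are hypothetical names, each image carries a hypothetical `key` that stands in for real image similarity (exact matches only), and the rule "the first matching image in R becomes the reference" is an assumption chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    album: str  # album the file lives in
    name: str   # file name, unique within its album
    key: str    # hypothetical stand-in for a similarity fingerprint


def make_album(album, keys):
    """Build one Item per key; names are derived from album and position."""
    return [Item(album, f"{album}-{n}", k) for n, k in enumerate(keys)]


def find_duplicates(items, reference_albums, search_albums):
    """Group items by key. In each group, the first item from a reference
    album (R) becomes the reference; every other item from a search album (D)
    is reported as a duplicate. Groups without a reference are skipped."""
    groups = defaultdict(list)
    for item in items:
        groups[item.key].append(item)

    result = {}
    for key, group in groups.items():
        refs = [i for i in group if i.album in reference_albums]
        if not refs:
            continue  # nothing in R matches, so there is no reference
        reference = refs[0]
        dups = [i for i in group if i.album in search_albums and i != reference]
        if dups:
            result[key] = (reference, dups)
    return result


# Bug example: Root/A holds images 1, 1, 3 and Root/B holds 1, 1, 2, 2, 3.
items = make_album("A", ["1", "1", "3"]) + make_album("B", ["1", "1", "2", "2", "3"])

# R != D: references come only from B, duplicates only from A.
for key, (ref, dups) in find_duplicates(items, {"B"}, {"A"}).items():
    print(key, ref.name, [d.name for d in dups])
# prints:
# 1 B-0 ['A-0', 'A-1']
# 3 B-4 ['A-2']

# R = D: a general search over both albums.
general = find_duplicates(items, {"A", "B"}, {"A", "B"})
```

Under these assumptions, R = {B} and D = {A} reproduces one reading of the expected result above: both copies of image 1 in A are reported as duplicates, while B only supplies the references. A restriction S could be modelled by passing `search_albums & S` or `search_albums | S` as the search set.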
Comment 1 Maik Qualmann 2023-08-16 18:24:35 UTC
I think you misunderstand the album restriction; I also had to look at the source code a bit. It essentially does not refer to the reference albums chosen above. Instead, the album restriction applies to the album of the reference image that was found.
We can argue about whether all these functions make sense. Selecting the reference album and determining the reference image are new and good features.
I find the restriction options of the album selector (Only Tags, Current Tab, ...) unnecessary. They just caused problems for a user on the mailing list.

Maik
Comment 2 Michael 2023-08-17 07:11:09 UTC
Then I guess I'm the next one to have problems with this. I usually move large numbers of images. To give you a sense of the scale, I have attached my database statistics below. The database already contains duplicates that I have to keep because they have different metadata. On the other hand, I have a few thousand images to import again, and I need to find the images in those imports that are already in the database.

So my requirement is to find all images in the import that are already in the database, but to delete duplicates only from the import. In practice, the whole existing database is the reference. Sometimes I am also asked to keep a duplicate from the import and add it to the current database with new metadata.

It would therefore help me if I could define which albums contain the reference images and which albums are searched for duplicates. Whether an album is then just a reference, a search target, or both would be entirely up to me. Regarding the quality and information content of an image, I have so far been able to cover everything with the advanced search. So in principle, I don't need this function to select the reference automatically, because I usually do that in advance with other functions and filter out the required qualities.

So my question regarding the duplicate search: if this is not a bug, do you have any advice on how I can achieve my goal?

How shall we proceed with this ticket?

Thanks and many greetings

Michael

digikam version 8.1.0
Images: 
AVIF: 1
BMP: 6695
EPS: 6
GIF: 23135
ICNS: 6
ICO: 94
JP2: 2
JPEG: 141
JPG: 1339786
KRA: 2
PCX: 956
PNG: 34809
PPM: 26
PSD: 17
RAW-ARW: 8
RAW-CR2: 30737
RAW-CRW: 903
RAW-DNG: 34
RAW-HDR: 1
RAW-RAW: 1
TGA: 157
TIFF: 340
WEBP: 2817
WMF: 6
XCF: 97
XPM: 10
total: 1440787
: 
Videos: 
3GP: 223
AVI: 1591
MOV: 2395
MP4: 5994
MPEG: 55
VOB: 47
WMV: 240
total: 10545
: 
Audio: 
AAC: 85
M4A: 150
MP3: 17302
MP4: 3
MPC: 7
OGG: 340
WAV: 1313
WMA: 6
total: 19206
: 
Total Items: 1470538
Albums: 86485
Tags: 2618
: 
Database backend: QSQLITE
Database Path: F:/digikam_db/
Database locale: UTF-8
Comment 3 caulier.gilles 2023-10-15 03:07:38 UTC
@Michael,

Is this problem still reproducible with the new digiKam 8.2.0 pre-release Windows installer, available at the usual place:

https://files.kde.org/digikam/

This new bundle is based on the latest Qt framework 5.15.11 and KDE Frameworks 5.110.

Thanks in advance

Gilles Caulier
Comment 4 Michael 2023-11-17 17:31:57 UTC
I have tested the current development version as you suggested. I created two albums, album 1 and 2, and used three images, A, B and C.

The albums contained the following images:

1: A B B C
2: A B C C

The search was configured as shown in the screenshot.

The result was 3 sets of duplicates.

Set 1:
Album 1: A (Reference)
Album 2: A

Set 2:
Album 1: B (Reference) B <- I don't want this
Album 2: B

Set 3:
Album 1: C (Reference)
Album 2: C C

Clicking "Remove duplicates" also removes images from album 1. However, my aim is for only images from album 2 to be removed.

The expected result is the following 3 sets of duplicates:

Set 1:
Album 1: A (Reference)
Album 2: A

Set 2:
Album 1: B (Reference)
Album 2: B

Set 3:
Album 1: C (Reference)
Album 2: C C
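The expected result in Comment 4 boils down to a simple set rule: with album 1 as the reference set R and album 2 as the search set D, "Remove duplicates" would delete exactly the items of album 2 whose content also occurs in album 1, and never touch album 1 (including its second copy of B). A minimal Python sketch, assuming letters stand in for image content and that R and D are disjoint as in the import scenario; `deletable` is a hypothetical helper, not part of digiKam.

```python
def deletable(albums, reference_albums, search_albums):
    """List (album, position, key) for every item in a search album whose
    key also occurs in a reference album. Reference albums are never touched.
    Assumes R and D are disjoint, as in the import scenario."""
    if reference_albums & search_albums:
        raise ValueError("reference and search albums must be disjoint")
    # Every piece of content that exists somewhere in the reference set R.
    ref_keys = {k for a in reference_albums for k in albums[a]}
    return [
        (a, pos, k)
        for a in sorted(search_albums)
        for pos, k in enumerate(albums[a])
        if k in ref_keys
    ]


# Comment 4 example: album 1 is the reference, album 2 is searched.
albums = {"1": ["A", "B", "B", "C"], "2": ["A", "B", "C", "C"]}
to_delete = deletable(albums, {"1"}, {"2"})
print(to_delete)
# [('2', 0, 'A'), ('2', 1, 'B'), ('2', 2, 'C'), ('2', 3, 'C')]
```

The disjointness check is a deliberate simplification: if an album were both reference and search target, every item in it would match itself, so that case needs per-group reference selection instead of a plain key lookup.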
Comment 5 Michael 2023-11-17 17:33:32 UTC
Created attachment 163248
Config for finding duplicates
Comment 6 caulier.gilles 2025-04-11 18:13:43 UTC
Hi,

The 8.7.0 pre-release Windows installer from today has been rebuilt from scratch with Qt 6.8.3, KDE 6.12, OpenCV 4.11 + CUDA support, Exiv2 0.28.5, ExifTool 13.27, ffmpeg 7, and all image codecs updated to the latest version (jxl, avif, heif, aom, etc.).

Please try this version to see if your problem is still reproducible...

https://files.kde.org/digikam/

Thanks in advance
Best regards

Gilles Caulier