Bug 261831

Summary: Auto-Process results from duplicates search
Product: [Applications] digikam Reporter: matt.muchowski
Component: Searches-Similarity Assignee: Digikam Developers <digikam-bugs-null>
Status: REPORTED ---    
Severity: wishlist CC: caulier.gilles, ingot, jrr45, mario.frank, mrvoidman, piotergmoter, rene, stievenard.david, sven.schickedanz
Priority: NOR    
Version: 7.2.0   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: All   
Latest Commit: Version Fixed In:

Description matt.muchowski 2011-01-02 00:58:49 UTC
Version:           1.2.0 (using KDE 4.4.2) 
OS:                Linux

Hi, I think the find duplicates feature in digiKam is great, but it is very slow to go through my 20,000 photos, almost all of which have duplicates, and delete the duplicates manually.

So it would be great to have a batch program to delete the duplicates, including a feature to keep the largest of the duplicates.

Reproducible: Didn't try
Comment 1 ingot 2014-11-16 20:54:43 UTC
I would like to request the same feature. Just add a button "auto-delete duplicates". For a more comfortable version, one could add criteria for which one to keep:

1) largest one as suggested above (highest resolution)
2) latest one
3) keep the one from a preferred folder (hierarchy) list
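
The three criteria above could be expressed as interchangeable "keep" rules. This is only a minimal sketch: the `Candidate` record, its fields, and the function names are hypothetical and do not correspond to anything in digiKam.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Candidate:
    """One image inside a group of duplicates (hypothetical record)."""
    path: str     # file path of the image
    width: int    # pixel width
    height: int   # pixel height
    mtime: float  # modification time, seconds since the epoch


def keep_largest(group):
    """Rule 1: keep the image with the highest resolution."""
    return max(group, key=lambda c: c.width * c.height)


def keep_latest(group):
    """Rule 2: keep the most recently modified image."""
    return max(group, key=lambda c: c.mtime)


def keep_preferred_folder(group, preferred):
    """Rule 3: keep the image from the best-ranked folder.

    `preferred` lists folders from most to least preferred; images
    outside every listed folder rank last.
    """
    def rank(candidate):
        parents = PurePosixPath(candidate.path).parents
        for position, folder in enumerate(preferred):
            if PurePosixPath(folder) in parents:
                return position
        return len(preferred)

    return min(group, key=rank)


def to_delete(group, keeper):
    """Everything in the group except the chosen keeper."""
    return [c for c in group if c is not keeper]
```

For example, `to_delete(group, keep_largest(group))` would list every copy except the highest-resolution one.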
Comment 2 Mario Frank 2017-02-28 15:12:53 UTC
I am using this bug as a container for bugs with equivalent wishlist requests.
Comment 3 Mario Frank 2017-02-28 15:13:13 UTC
*** Bug 372378 has been marked as a duplicate of this bug. ***
Comment 4 Mario Frank 2017-02-28 15:14:09 UTC
From https://bugs.kde.org/show_bug.cgi?id=372378 :

 piotergmoter@hotmail.com 2016-11-12 09:24:38 UTC

Digikam has a powerful search function, which finds duplicates in albums. A nice feature to have would be to *do* something with the images after the search. The most obvious action would be to delete the duplicates, but the list could go on to different scenarios.

This is a real example, which occurred after I imported 1000 photos from the mobile camera that may already have been imported years ago. They have different file names of course, so from the filesystem's point of view they are different. I would like to clean up my albums, but in an automated way, and I wonder what is possible.
P.
----

 Wolfgang Scheffner 2016-11-12 18:14:05 UTC

Seems a bit difficult to me. How can an automated process decide which one of two identical images to process (delete or whatever)? Of course you could set the threshold to 100% and then say it doesn't matter, just process one of them. But 1. your search result gets very small with 100% and 2. the process would still need a rule to decide and that will most likely not match everybody's needs.
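
For the 100% case, where it does not matter which copy is processed, a rough sketch is shown below. Note that it swaps in a plain SHA-256 content hash, which only catches byte-identical files; it is not how digiKam's similarity search works, and the function names are made up for illustration. The deciding rule here is simply "keep the first path in sorted order".

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(root):
    """Group byte-identical files found under `root`, whatever their names."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def redundant_copies(groups):
    """Arbitrary but deterministic rule: keep the first path of each group."""
    return [path for paths in groups for path in paths[1:]]
```

The result of `redundant_copies` is only a list of candidates; nothing is deleted by this sketch.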
Comment 5 Zenopheus 2018-12-28 05:29:49 UTC
Similarity detection is great but completely useless for people starting out with DigiKam who have *lots* of duplicates. I bet this is a large majority of people, especially if they use PhotoMove to structure their files (it preserves duplicate images).

I don't understand why this is up for debate after 8 years. I have 100K images from merging multiple collections together. Thousands of them are 100% duplicates. It is impossible to delete them manually. I don't care what folder they are in, I just want them gone.

> Seems a bit difficult to me
This isn't difficult. Here are two possible solutions. Both would be trivial to implement. The last one would at least allow the issue to be addressed externally.

1. Two new similarity options:
[ ] Use largest image as reference image
[ ] Hide reference images from results
Checking these options would allow a person to select all "Ref. Images" on the left and then select all images on the right and then press delete. This doesn't work now because the reference image is displayed on the right side and may or may not be the largest (largest megapixel) image.

2. Allow the user to export the duplicate list of images to a csv file with some meta information and file path. This way I could write a script to delete duplicates myself. I could then choose to remove jpgs instead of cr2 or images with smaller megapixel sizes.
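
To illustrate the kind of external script option 2 would make possible, here is a minimal sketch. It assumes a purely hypothetical export with the columns `group_id`, `path`, `width` and `height`; digiKam does not produce such a file, which is the point of the request. Within each group it prefers cr2 over other formats, then the larger pixel count.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePath

RAW_EXTENSIONS = {".cr2"}  # formats to prefer over JPEG copies


def score(row):
    """Higher is better: raw files first, then larger pixel count."""
    is_raw = PurePath(row["path"]).suffix.lower() in RAW_EXTENSIONS
    pixels = int(row["width"]) * int(row["height"])
    return (is_raw, pixels)


def plan_deletions(csv_file):
    """Return the paths to remove, keeping the best-scored file per group.

    Expects the hypothetical columns group_id, path, width, height.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["group_id"]].append(row)

    doomed = []
    for rows in groups.values():
        keeper = max(rows, key=score)
        doomed.extend(row["path"] for row in rows if row is not keeper)
    return doomed


if __name__ == "__main__":
    sample = io.StringIO(
        "group_id,path,width,height\n"
        "1,/a/img.jpg,6000,4000\n"
        "1,/a/img.cr2,5184,3456\n"
    )
    # Only prints the plan; actual deletion is left to the user.
    for path in plan_deletions(sample):
        print(path)
```

The script only prints a deletion plan, so the list can be reviewed before anything is removed.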
Comment 6 Mario Frank 2019-01-09 08:49:00 UTC
*** Bug 377523 has been marked as a duplicate of this bug. ***
Comment 7 stievenard.david 2020-03-25 08:30:44 UTC
Same need for me. Starting with Digikam, I have a lot of duplicates coming from
- too many phone picture dumps
- dumps of pictures saved from chat apps that are duplicates (with wrong metadata)

My two needs are also:
- auto delete
- choose an album as a preferred, higher-priority source
Comment 8 stievenard.david 2020-03-26 06:35:27 UTC
Just found another bug report, https://bugs.kde.org/show_bug.cgi?id=388981, which is not really a duplicate but is closely related to this request.

I also found an open source project that did the job for me
https://dupeguru.voltaicideas.net/

Hope this can be useful
Comment 9 stievenard.david 2021-03-30 03:18:00 UTC
*** Bug 430975 has been marked as a duplicate of this bug. ***
Comment 10 stievenard.david 2021-03-30 03:22:22 UTC
> Wolfgang Scheffner 2016-11-12 18:14:05 UTC
>
> Seems a bit difficult to me. How can an automated process decide which one of two identical images to process (delete or whatever)? Of course you could set the threshold to 100% and then say it doesn't matter, just process one of them. But 1. your search result gets very small with 100% and 2. the process would still need a rule to decide and that will most likely not match everybody's needs.


Hi Wolfgang, the dupeGuru UI handles this problem by letting the user select which folder is the "master" one.
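
The "master" folder idea could look roughly like the sketch below. It is an illustration of the rule only, not dupeGuru's code, and the function names are hypothetical. Groups without any copy in the master folder are skipped, so they stay available for manual review.

```python
from pathlib import PurePosixPath


def in_folder(path, folder):
    """True if `path` lies somewhere below `folder`."""
    return PurePosixPath(folder) in PurePosixPath(path).parents


def deletions_with_master(groups, master):
    """Given groups of duplicate paths, list the copies outside `master`.

    Copies inside the master folder are never listed. A group with no
    copy in the master folder is left untouched.
    """
    doomed = []
    for group in groups:
        if not any(in_folder(path, master) for path in group):
            continue  # no master copy: nothing decides here
        doomed.extend(path for path in group if not in_folder(path, master))
    return doomed
```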