Bug 407049 - Import fails to detect duplicate photos
Summary: Import fails to detect duplicate photos
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Import-MainView
Version: 7.0.0
Platform: Microsoft Windows
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-29 12:15 UTC by Martynas Brijunas
Modified: 2023-10-15 12:35 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Fail case where only 2 of 3 duplicates were found. (71.14 KB, image/png)
2020-06-14 02:23 UTC, klaus
Success case where image search found all 3 duplicates. (119.57 KB, image/png)
2020-06-14 02:27 UTC, klaus

Description Martynas Brijunas 2019-04-29 12:15:29 UTC
SUMMARY

In digiKam 6.1.0 the duplicate detection fails under certain scenarios.

STEPS TO REPRODUCE
1. Create a new album from an existing directory structure of photos.
2. Run thumbnail and fingerprint creation routines.
3. Copy one of the directories from within the album to another location.
4. Run "Import", point it to the copy that was created in step 3 and choose "Download New", then point it to a test directory within the album directory tree.

OBSERVED RESULT

Despite the photos already being present in the album, the entire set gets imported into the test directory within the album. However, any subsequent attempts to re-import the same set do not add any additional photos (which is correct). It seems that digiKam only recognizes duplicates on import if digiKam itself was the tool that imported them initially. That is not the case when migrating to digiKam from other tools, such as Picasa.

EXPECTED RESULT

It should have detected duplicates, especially since the fingerprint database has been populated.

SOFTWARE/OS VERSIONS
Windows: 10

ADDITIONAL INFORMATION
Comment 1 Maik Qualmann 2019-04-29 12:26:36 UTC
digiKam does not use the fingerprint to check whether an image already exists. For images that have not been imported yet, there is no information in the DB. digiKam uses a mix of camera device or file path, file name, file size and date to recognize a file when importing.

Maik
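
The matching described in comment 1 can be illustrated with a minimal sketch. It assumes the import tool treats a file as known only when all four properties match a record from an earlier import. The `ImportKey` type, the `is_new` helper and the sample paths are hypothetical and are not taken from the digiKam source.

```python
from datetime import datetime
from typing import NamedTuple


class ImportKey(NamedTuple):
    """Hypothetical identity of a file as seen by an import tool."""
    source: str         # camera device or source folder path
    name: str           # file name
    size: int           # file size in bytes
    modified: datetime  # file date


def is_new(candidate: ImportKey, seen: set) -> bool:
    """A file counts as new unless every field matches an earlier import."""
    return candidate not in seen


stamp = datetime(2019, 4, 1, 10, 30, 0)

# One file that was imported earlier from a memory card (assumed path).
seen = {ImportKey("E:/DCIM", "IMG_0001.JPG", 2048000, stamp)}

# The same file offered again from the same source is recognized.
same_source = ImportKey("E:/DCIM", "IMG_0001.JPG", 2048000, stamp)

# A byte-identical copy offered from another folder is treated as new,
# because the source part of the key differs.
copied = ImportKey("D:/backup/DCIM", "IMG_0001.JPG", 2048000, stamp)

print(is_new(same_source, seen))
print(is_new(copied, seen))
```

Under these assumptions a copy in a different location is never matched, which is consistent with the behaviour reported in the description.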
Comment 2 Martynas Brijunas 2019-04-29 12:42:57 UTC
It sounds as if it might be the best practice to start with an empty album and "import" the entire old directory structure to "familiarize" digiKam with its content.
Comment 3 Maik Qualmann 2019-04-29 13:13:46 UTC
I do not know if you're just starting with digiKam, but you do not need to import images. For example, on Windows the images are in the Pictures directory. You then select this folder as a local collection. digiKam automatically scans this folder and its subfolders and builds up the album structure. digiKam uses the folder structure 1:1.

Maik
Comment 4 Martynas Brijunas 2019-04-29 17:04:58 UTC
(In reply to Maik Qualmann from comment #3)
> I do not know if you're just starting with digiKam, but you do not need to
> import images. For example, on Windows the images are in the Pictures
> directory. You then select this folder as a local collection. digiKam
> automatically scans this folder and its subfolders and builds up the album
> structure. digiKam uses the folder structure 1:1.

My album was created as you have described above. The issue I am trying to address is this: photos from my mobile phone get copied via the OneDrive app to my desktop. The directory is called "Camera roll" and new photos are only ever appended to it. As a result I have a mix of "old" (already imported, but not by digiKam) photos and "new" photos. I would like to be able to keep importing from the "Camera roll" directory without worrying about duplicates.

The scenario becomes more complex if there is a dedicated "Camera roll" per device (each family member).
Comment 5 Daniel 2020-01-19 15:49:02 UTC
I would really like to see the fuzzy search used to detect duplicates. I have multiple backups of all kinds of SD cards, phone DCIM folders etc. When I import them into digiKam to ensure that they are in the collection, they get added again.
Comment 6 Daniel 2020-01-24 22:21:40 UTC
Also, importing via MTP doesn't work properly (not a digiKam issue), so I copy the files using adb instead. However, duplicate photos aren't detected when importing. Maybe it would be possible to use the fuzzy search algorithm when importing from local storage?
Comment 7 klaus 2020-06-14 02:23:18 UTC
Created attachment 129333 [details]
Fail case where only 2 of 3 duplicates were found.

This search includes 2 separate folders. One folder contains the original. The other folder ("testSearch") contains 2 duplicates of the original. The search only found the duplicates within one folder. Unfortunately, it did not find duplicates across the 2 separate folders.
Comment 8 klaus 2020-06-14 02:27:01 UTC
Created attachment 129334 [details]
Success case where image search found all 3 duplicates.

Here you can see in the Image search tab that digiKam was able to find all 3 duplicates that exist across the 2 separate folders. This is the behaviour I expected from the Duplicates search, which only found 2 of the 3 duplicates.
Comment 9 klaus 2020-06-14 02:49:37 UTC
Eureka! I just found out that my Similarity min/max range of 100/100 was too strict.  

When I change the similarity range to 99/100, the search found all my duplicates!

Hopefully this helps!
Comment 10 klaus 2020-06-14 03:08:30 UTC
One more comment - There still seems to be some room for improvement with this Similarity/Duplicates Search. Even with a range of 99/100 the system couldn't find a file that is a straight duplicate in 2 separate folders.  Same filename, same everything (except minor metadata difference post copy operation, maybe).

When I set the range to 98/100 the image duplicate appears.

This file is an exact copy so it should appear with 100/100 range.  I can move forward with this, but hopefully it can still be fixed.

Great work btw, thank you very much!
Comment 11 Maik Qualmann 2020-06-14 07:18:56 UTC
I cannot confirm this. Identical images are found in different albums with the 100% setting.

Maik
Comment 12 klaus 2020-06-14 07:44:29 UTC
Maik, I can agree, right now it does appear to be working with 100% Similarity among albums/folders I'm working with now.  Strange how my initial test case was clearly not working.  Hopefully I can find something more reproducible.
I forgot to mention I'm using digiKam 6.4.0 Windows 10 64bit.

Thanks for your time and effort,
Klaus
Comment 13 Maik Qualmann 2020-06-14 08:42:01 UTC
Use the digiKam-7.0.0-RC version. The release will be in July, over 700 bugs have been fixed and we need fresh feedback.

https://files.kde.org/digikam/

Maik
Comment 14 Martynas Brijunas 2020-06-14 09:14:02 UTC
(In reply to Maik Qualmann from comment #13)
> Use the digiKam-7.0.0-RC version. The release will be in July, over 700 bugs
> have been fixed and we need fresh feedback.
> 
> https://files.kde.org/digikam/
> 
> Maik

Downloading now and will test. Out of curiosity - does digiKam use an image "signature", which would be independent of any file attributes or image metadata? I have seen this implemented in ImageMagick.
Comment 15 Martynas Brijunas 2020-06-22 19:03:40 UTC
Unfortunately, 7.0.0-RC seems to still have the same problem. The steps that I described on 2019-04-29 for version 6.1.0 still result in an incorrect behaviour. Photos are recognised as existing only if I allow digiKam to import them. In other words, on first attempt they are not seen as existing, on the second attempt they are.
Comment 16 Maik Qualmann 2020-06-22 21:21:35 UTC
If you add new images, you have to update the fingerprints. Did you do this? And yes, it is a signature of the image data, not the metadata.

Maik
Comment 17 Martynas Brijunas 2020-06-23 06:38:38 UTC
My test is based on the knowledge (manually checking) that the photos already exist and have been found by digiKam. By "found" I mean digiKam has run "Scan for new items" and generated hashes. I have also run "Similarity > Update fingerprints".

With both hashes and fingerprints present in the DB, once a subset of existing files is offered for an import and "Download New" is selected, all of the already existing images are added to digiKam for the second time. However, a subsequent attempt to do the same is correctly refused.
Comment 18 Maik Qualmann 2020-06-23 06:49:41 UTC
Again, the digiKam import tool does not use the fingerprint signatures. There are several reasons for this. Depending on the device used, we may not have the complete image data available when importing. And some devices are quite slow (PTP via gPhoto2). We use a mix of device, file name, file size and file date.

Maik
Comment 19 Martynas Brijunas 2020-06-23 07:07:42 UTC
@Maik - without trying to add fuel to the fire - I am merely reporting an issue from a user perspective, without fully knowing what goes on under the hood. It is a genuine problem when trying to reconcile several directories that may contain duplicates.

What remedy would you suggest? I honestly don't know what else I can do, short of writing my own python script or something. I was hoping that digiKam can help me here, and reported the issue in good faith.
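
A script of the kind mentioned above could look roughly like the following. It is a minimal sketch that works outside digiKam and only finds byte-identical files, by grouping on file size and then on a SHA-256 digest of the contents. The function names and the example folder paths are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Return groups of byte-identical files found under the given folders."""
    by_size = defaultdict(list)
    seen = set()
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            resolved = path.resolve()
            if resolved in seen:  # the same file reached via overlapping roots
                continue
            seen.add(resolved)
            by_size[resolved.stat().st_size].append(resolved)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:  # a unique size cannot have a byte-identical twin
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]


# Example call with assumed folder names:
# for group in find_duplicates(["D:/Pictures", "D:/OneDrive/Camera roll"]):
#     print([str(p) for p in group])
```

Copies whose metadata was rewritten after copying are no longer byte-identical and would not be grouped by this sketch. Catching those needs a comparison of the image data itself, which is what the similarity search does.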
Comment 20 Maik Qualmann 2020-06-23 07:26:43 UTC
Everything is fine, it's just an explanation that "Download New" in the import tool has nothing to do with the similarity search at the moment. If you want to add new images, you have to import them first and then update the fingerprints. After that you can use the similarity search to determine whether the images are duplicates.

Maik
Comment 21 Martynas Brijunas 2020-06-23 14:10:33 UTC
I think I have found one important detail. I looked at the DB to see how digiKam sees the files it chooses to import despite them already being present in one of the albums. It turns out that the timestamps of the existing file and the newly imported file were off by exactly one hour! I suspect it has to do with winter/summer time (daylight saving time). Here is a screenshot from my database: https://imgur.com/EduTKkR
Comment 22 Maik Qualmann 2020-06-23 16:31:01 UTC
The import tool does not look in the "Images" table, but in the "DownloadHistory" table. A file that was not imported with the import tool will not be recognized. A possible one-hour difference is taken into account when checking the "DownloadHistory" table.

Maik
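
How the one-hour difference is handled is not spelled out in this thread. A minimal sketch of the general idea, assuming two file dates count as equal when they are identical or exactly one hour apart, could look like this. The `same_timestamp` helper and its parameter are hypothetical and are not taken from the digiKam source.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def same_timestamp(a: datetime, b: datetime, allow_hour_shift: bool = True) -> bool:
    """Treat two file dates as equal if identical or exactly one hour apart."""
    delta = abs(a - b)
    if delta == timedelta(0):
        return True
    return allow_hour_shift and delta == ONE_HOUR


recorded = datetime(2019, 7, 14, 15, 2, 11)  # date stored at the first import
shifted = recorded + ONE_HOUR                # same file seen after a clock change

print(same_timestamp(recorded, shifted))
print(same_timestamp(recorded, shifted, allow_hour_shift=False))
```

A tolerance like this only helps for files that already have a record from an earlier import. It does not make files known that were added to the collection by other means.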
Comment 23 Martynas Brijunas 2020-06-23 17:54:41 UTC
@Maik - thank you for clarifying. Perhaps I fundamentally misunderstood how digiKam works (I am mostly used to Picasa). From what you are saying I am getting the impression that there is no way for digiKam to avoid duplicates *until* it has imported images. This may result in some duplicates, but for that we have "Similarity search", so the workflow would be:

1. Import
2. Look for duplicates and cleanup

Is this understanding more or less correct? Thank you, and sorry to be a pain.
Comment 24 Maik Qualmann 2020-06-23 18:29:34 UTC
All fine, yes, that's the only way to avoid duplicate images at the moment.

digiKam has grown over the years. The import tool was originally meant only to prevent images from the camera's memory card from being imported again. We are aware that users want imports to be checked against the collection for duplicate images.

Maik
Comment 25 caulier.gilles 2020-08-01 14:10:55 UTC
The digiKam 7.0.0 stable release has been published and is now available as a Flatpak:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Thanks in advance

Gilles Caulier
Comment 26 Daniel 2020-08-03 19:03:26 UTC
Still happens to me on 7.0.0. When importing the same picture from a different hardware medium than the one it was originally imported from (a file copy, from an HDD instead of an SD card, etc.), it isn't marked as already imported.
Comment 27 caulier.gilles 2023-04-29 12:51:13 UTC
Hi all,

digiKam 8.0.0 is out. Is this entry still valid with this release?

Best regards

Gilles Caulier
Comment 28 caulier.gilles 2023-10-15 12:35:42 UTC
@Daniel,


Is this problem still reproducible with the new digiKam 8.2.0 pre-release Windows
installer, available at the usual place?

https://files.kde.org/digikam/

This new bundle is based on the latest Qt framework 5.15.11 and KDE Frameworks 5.110.

Thanks in advance

Gilles Caulier