SUMMARY
In digiKam 6.1.0, duplicate detection fails in certain scenarios.

STEPS TO REPRODUCE
1. Create a new album from an existing directory structure of photos.
2. Run the thumbnail and fingerprint creation routines.
3. Copy one of the directories from within the album to another location.
4. Run "Import", point it to the copy created in step 3, choose "Download New", and select a test directory within the album directory tree as the destination.

OBSERVED RESULT
Although the photos are already present in the album, the entire set is imported into the test directory. Subsequent attempts to re-import the same set do not add any more photos (which is correct). It seems that digiKam only recognizes duplicates on import if it was the tool used to import them initially. That is never the case when migrating to digiKam from other tools, such as Picasa.

EXPECTED RESULT
It should have detected the duplicates, especially since the fingerprint database had been populated.

SOFTWARE/OS VERSIONS
Windows: 10
digiKam does not use the fingerprint to check whether an image already exists. For images that have not yet been imported, there is no information in the database. When importing, digiKam uses a combination of camera device or file path, file name, file size and date to recognize a file. Maik
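To illustrate why a file added by a collection scan, or a copy offered from a different location, is not recognized, here is a minimal sketch. It assumes, hypothetically, that the import history stores a key made of source (device or path), file name, size and date, and that the source is part of the key; the class, function and path names are illustrative, not digiKam's actual code or schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical key built from the attributes listed above; digiKam's real
# identifier format and storage are not described in this thread.
@dataclass(frozen=True)
class ImportKey:
    source: str      # camera device or file path
    name: str        # file name
    size: int        # file size in bytes
    date: datetime   # file date

def is_already_imported(key: ImportKey, history: set[ImportKey]) -> bool:
    """Return True only if this exact key was recorded by a previous import."""
    return key in history

# The history only contains files that went through the import tool.
original = ImportKey("E:/DCIM", "IMG_0001.JPG", 2_345_678, datetime(2019, 4, 1, 10, 0))
history = {original}

# The same photo copied elsewhere has a different source, so no key matches.
copied = ImportKey("D:/Backup/DCIM", "IMG_0001.JPG", 2_345_678, datetime(2019, 4, 1, 10, 0))

print(is_already_imported(original, history))  # True: re-import from the same source
print(is_already_imported(copied, history))    # False: copy from another location
```

Under these assumptions, the fingerprint database plays no role in the decision, which matches the behaviour reported above.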
It sounds as if it might be the best practice to start with an empty album and "import" the entire old directory structure to "familiarize" digiKam with its content.
I do not know whether you are just starting with digiKam, but you do not need to import images. For example, suppose the images are in the Pictures directory on Windows. You select this folder as a local collection, and digiKam automatically scans the folder and its subfolders and builds the album structure from them. digiKam mirrors the folder structure 1:1. Maik
(In reply to Maik Qualmann from comment #3)
> I do not know if you're starting with digiKam. But you do not need to import
> images. As an example the images are under Windows in the pictures
> directory. Then you select this folder as a lokal collection. DigiKam
> automatically scans this folder and subfolder and builds up the album
> structure. DigiKam uses the folder structure 1:1.

My album was created as you described above. The issue I am trying to address is this: photos from my mobile phone are copied to my desktop by the OneDrive app into a directory called "Camera roll", which is only ever appended to. As a result, I have a mix of "old" photos (already imported, but not by digiKam) and "new" photos. I would like to keep importing from the "Camera roll" directory without worrying about duplicates. The scenario becomes more complex if there is a dedicated "Camera roll" per device (one for each family member).
I would really like to see the fuzzy search used to detect duplicates on import. I have multiple backups of all kinds of SD cards, phone DCIM folders, etc. When I import them into digiKam to make sure they are in the collection, they get added again.
Also, importing via MTP doesn't work properly (not a digiKam issue), so I copy the files using adb instead. However, duplicate photos aren't detected when importing them. Perhaps the fuzzy search algorithm could be used when importing from local storage?
Created attachment 129333
Fail case where only 2 of 3 duplicates were found.

This search covers 2 separate folders. One folder contains the original; the other ("testSearch") contains 2 duplicates of the original. The search only found the duplicates within one folder; unfortunately, it did not find duplicates across the 2 separate folders.
Created attachment 129334
Success case where the image search found all 3 duplicates.

Here you can see that in the Image search tab, digiKam was able to find all 3 duplicates across the 2 separate folders. I expected the same result from the Duplicates search, which only found 2 of the 3.
Eureka! I just found out that my Similarity min/max range of 100/100 was too strict. When I changed the similarity range to 99/100, the search found all my duplicates! Hopefully this helps!
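A minimal sketch of how a min/max similarity range acts as a filter, assuming each candidate pair gets a percentage score; how digiKam actually computes and rounds that score is not described in this thread, and the file names and scores below are made up for illustration. With a 100/100 range, only pairs scoring exactly 100 survive, so a pair scoring even slightly below is dropped.

```python
def pairs_in_range(scores: dict[tuple[str, str], float],
                   min_sim: float, max_sim: float) -> list[tuple[str, str]]:
    """Keep candidate pairs whose similarity score lies within [min_sim, max_sim]."""
    return sorted(pair for pair, score in scores.items() if min_sim <= score <= max_sim)

# Hypothetical scores: one copy scores exactly 100, the other just under 100.
scores = {
    ("orig.jpg", "testSearch/copy1.jpg"): 100.0,
    ("orig.jpg", "testSearch/copy2.jpg"): 99.6,
}

print(pairs_in_range(scores, 100, 100))  # [('orig.jpg', 'testSearch/copy1.jpg')]
print(pairs_in_range(scores, 99, 100))   # both pairs
```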
One more comment: there still seems to be some room for improvement in the Similarity/Duplicates search. Even with a range of 99/100, the system couldn't find a file that is a straight duplicate in 2 separate folders: same file name, same everything (except perhaps a minor metadata difference after the copy operation). When I set the range to 98/100, the duplicate appears. Since this file is an exact copy, it should appear with a 100/100 range. I can move forward with this, but hopefully it can still be fixed. Great work, by the way, thank you very much!
I cannot confirm this. Identical images in different albums are found with the 100% setting. Maik
Maik, I agree: right now it does appear to work with 100% similarity across the albums/folders I'm currently working with. It is strange that my initial test case clearly did not work. Hopefully I can find something more reproducible. I forgot to mention that I'm using digiKam 6.4.0 on Windows 10 64-bit. Thanks for your time and effort, Klaus
Please use the digiKam-7.0.0-RC version. The release will be in July; over 700 bugs have been fixed and we need fresh feedback. https://files.kde.org/digikam/ Maik
(In reply to Maik Qualmann from comment #13)
> Use the digiKam-7.0.0-RC version. The release will be in July, over 700 bugs
> have been fixed and we need fresh feedback.
>
> https://files.kde.org/digikam/
>
> Maik

Downloading now and will test. Out of curiosity: does digiKam use an image "signature" that is independent of any file attributes or image metadata? I have seen this implemented in ImageMagick.
Unfortunately, 7.0.0-RC still seems to have the same problem. The steps I described on 2019-04-29 for version 6.1.0 still result in incorrect behaviour: photos are recognised as existing only if digiKam was allowed to import them. In other words, on the first attempt they are not seen as existing; on the second attempt they are.
If you add new images, you have to update the fingerprints. Did you do this? And yes, it is a signature of the image data, not the metadata. Maik
My test relies on having manually verified that the photos already exist and have been found by digiKam. By "found" I mean that digiKam has run "Scan for new items" and generated hashes. I have also run "Similarity > Update fingerprints". With both hashes and fingerprints present in the DB, when a subset of the existing files is offered for import and "Download New" is selected, all of the already existing images are added to digiKam a second time. A subsequent attempt, however, correctly refuses the import.
Again, the digiKam import tool does not use the fingerprint signatures, for several reasons. Depending on the device, the complete image data may not be available at import time, and some devices are quite slow (PTP via gPhoto2). Instead, we use a combination of device, file name, file size and file date. Maik
@Maik, without trying to add fuel to the fire: I am merely reporting an issue from a user's perspective, without fully knowing what goes on under the hood. It is a genuine problem when trying to reconcile several directories that may contain duplicates. What remedy would you suggest? I honestly don't know what else I can do, short of writing my own Python script or something similar. I was hoping that digiKam could help me here, and I reported the issue in good faith.
Everything is fine; I am just explaining that "Download New" in the import tool currently has nothing to do with the similarity search. When you add new images, you have to import them first, then update the fingerprints. After that, you can use the similarity search to determine whether the images are duplicates. Maik
I think I have found an important detail. I looked at the DB to see how digiKam sees the files it chooses to import even though they are already present in one of the albums. It turns out that the timestamps of the existing file and the newly imported file differ by exactly one hour! I suspect it has to do with winter/summer time. Here is a screenshot from my database: https://imgur.com/EduTKkR
The import tool does not look in the "Images" table but in the "DownloadHistory" table, so a file that was not imported with the import tool will not be recognized. A possible 1-hour difference is already taken into account for the "DownloadHistory" table. Maik
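A minimal sketch of a history lookup that tolerates an exact one-hour offset (such as a winter/summer time shift), assuming a simplified in-memory list of (identifier, file name, size, date) rows. The row layout, function name and matching rule are hypothetical and stand in for digiKam's actual "DownloadHistory" schema and query, which are not shown here.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

# Hypothetical rows: (identifier, file name, file size, file date).
download_history = [
    ("E:/DCIM", "IMG_0001.JPG", 2_345_678, datetime(2019, 4, 1, 10, 0)),
]

def in_download_history(identifier: str, name: str, size: int, date: datetime) -> bool:
    """Match identifier, name and size; accept a date that is equal or off by exactly one hour."""
    for h_id, h_name, h_size, h_date in download_history:
        if (h_id, h_name, h_size) != (identifier, name, size):
            continue
        if abs(date - h_date) in (timedelta(0), ONE_HOUR):
            return True
    return False

# Same file seen again after a DST change: one hour later, still matched.
print(in_download_history("E:/DCIM", "IMG_0001.JPG", 2_345_678, datetime(2019, 4, 1, 11, 0)))  # True
# A file known only from a collection scan ("Images" table) has no row here.
print(in_download_history("E:/DCIM", "IMG_0002.JPG", 1_000_000, datetime(2019, 4, 1, 10, 5)))  # False
```

Because the lookup only consults the history of previous imports, the one-hour tolerance cannot help with files that were added to the collection by scanning.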
@Maik, thank you for clarifying. Perhaps I fundamentally misunderstood how digiKam works (I am mostly used to Picasa). From what you are saying, I get the impression that there is no way for digiKam to avoid duplicates *until* it has imported the images. This may result in some duplicates, but for those we have "Similarity search", so the workflow would be:
1. Import
2. Look for duplicates and clean up

Is this understanding more or less correct? Thank you, and sorry to be a pain.
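As a stand-alone aid for step 2 (not a digiKam feature), here is a minimal sketch of the kind of script mentioned earlier in this thread: it groups byte-identical files across several folders by SHA-256 hash. It only catches exact copies; unlike the fuzzy similarity search, any metadata change made after copying alters the hash. The script and folder names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_exact_duplicates(*folders: Path) -> list[list[Path]]:
    """Return groups of two or more files with identical contents across all folders."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for folder in folders:
        for path in folder.rglob("*"):
            if path.is_file():
                by_hash[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python find_dupes.py "Pictures" "Camera roll"
    for group in find_exact_duplicates(*(Path(p) for p in sys.argv[1:])):
        print(" == ".join(str(p) for p in group))
```

Exact hashing is cheap and has no false positives, which makes it a reasonable first pass before reviewing near-duplicates with the similarity search.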
All fine; yes, that's the only way to avoid duplicate images at the moment. digiKam has grown over the years, and the import tool was only meant to prevent images already imported from the camera's memory card from being re-imported. We are aware that users want the import to check for duplicate images against the collection. Maik
The digiKam 7.0.0 stable release is now published and available as a Flatpak: https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/ We need fresh feedback on this report using this version. Thanks in advance, Gilles Caulier
This still happens to me on 7.0.0. When importing the same picture from a different medium than the one it was originally imported from (a file copy, from an HDD instead of an SD card, etc.), it is not marked as already imported.
Hi all, digiKam 8.0.0 is out. Is this entry still valid with this release? Best regards, Gilles Caulier
@Daniel, is this problem still reproducible with the new digiKam 8.2.0 pre-release Windows installer, available at the usual place? https://files.kde.org/digikam/ This new bundle is based on the latest Qt framework 5.15.11 and KDE Frameworks 5.110. Thanks in advance, Gilles Caulier