Bug 479181 - Wish list: exact-duplicate scan based on 'uniqueHash'
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Database-Similarity (other bugs)
Version First Reported In: 8.2.0
Platform: Other Other
Importance: NOR wishlist
Target Milestone: ---
Assignee: Digikam Developers
Keywords: efficiency-and-performance, usability
 
Reported: 2023-12-30 04:15 UTC by Otto Hirr
Modified: 2023-12-30 04:51 UTC (History)

Description Otto Hirr 2023-12-30 04:15:38 UTC
SUMMARY
***
The duplicate scanner should have an option to find 'exact' duplicates, which could be very fast because it needs no fingerprints.
***

The doc, https://docs.digikam.org/en/main_window/similarity_view.html#find-duplicates, states the following:

"... but it will take a long time too as it has to compare every image with any other image."

The 'Images' table has a column, 'uniqueHash', which is the hash of the bits in the file.

Files containing the same bits *should* have the same hashes.
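
As a small illustration of that premise, here is a minimal Python sketch that hashes byte strings and compares the digests. SHA-256 is only a stand-in here: which hash digiKam actually uses for 'uniqueHash', and how much of the file it covers, is not stated in this report, and the function and variable names are made up for the example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the given bytes (SHA-256 as a stand-in hash)."""
    return hashlib.sha256(data).hexdigest()


# Made-up payloads standing in for file contents.
original = b"\xff\xd8\xff\xe0 fake image payload"
copy_elsewhere = bytes(original)   # same bits, as if stored under another path
edited = original + b"\x00"        # one extra byte

# Identical bits give identical digests; a changed file gives a different one.
same = content_hash(original) == content_hash(copy_elsewhere)
different = content_hash(original) != content_hash(edited)
```

The reverse direction is only probabilistic (two different files could in principle collide), so a cautious implementation might confirm a match with a file-size or byte-level comparison.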

A simple query to find duplicate values in this column would yield a list of uniqueHash values that occur more than once, which can then be used with the other table(s) to retrieve the containing directory and file names.
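
A rough sketch of such a query, run from Python against a toy in-memory SQLite database. Only the 'Images' table and its 'uniqueHash' column come from this report; the 'Albums' table, the 'album', 'name' and 'relativePath' columns, the sample rows and the helper name are assumptions made for illustration and may not match the real digiKam schema.

```python
import sqlite3

# Toy schema: only Images.uniqueHash is taken from the report;
# every other table and column name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Albums (id INTEGER PRIMARY KEY, relativePath TEXT);
CREATE TABLE Images (id INTEGER PRIMARY KEY, album INTEGER,
                     name TEXT, uniqueHash TEXT);
INSERT INTO Albums VALUES (1, '/2019/trip'), (2, '/backup/old');
INSERT INTO Images VALUES
  (1, 1, 'a.jpg',      'h1'),
  (2, 2, 'a_copy.jpg', 'h1'),
  (3, 1, 'b.jpg',      'h2');
""")

# Inner query: hashes that occur more than once.
# Outer query: directory and file name for every image with such a hash.
DUPLICATES_SQL = """
SELECT i.uniqueHash, a.relativePath, i.name
FROM Images AS i
JOIN Albums AS a ON a.id = i.album
WHERE i.uniqueHash IN (
    SELECT uniqueHash FROM Images
    WHERE uniqueHash IS NOT NULL AND uniqueHash <> ''
    GROUP BY uniqueHash
    HAVING COUNT(*) > 1
)
ORDER BY i.uniqueHash, a.relativePath, i.name
"""


def find_exact_duplicates(connection):
    """Group file paths by hash, keeping only hashes shared by 2+ images."""
    groups = {}
    for unique_hash, path, name in connection.execute(DUPLICATES_SQL):
        groups.setdefault(unique_hash, []).append(f"{path}/{name}")
    return groups


duplicates = find_exact_duplicates(conn)
```

With the sample rows above, only the hash 'h1' is reported, with both of its paths. On a large collection, an index on the hash column would presumably keep the grouping step cheap.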

This query would be fast compared to the pairwise similarity comparison of images.

Selecting this option would then disable the other similarity options on the page.

I suspect that a large number of users simply want to find exact duplicates after importing files that were stored in various places, perhaps under earlier schemes the user had for managing their files.

The exact duplicates could be presented in the same manner as in the existing duplicate finder.

It's simply a modification to how those similar/exact-duplicates are found.

Best regards,

.. Otto
Otto Hirr