Bug 490793 - Find similar returns the source image in the search
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Searches-Similarity
Version: 8.5.0
Platform: Microsoft Windows
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-24 20:06 UTC by Roland
Modified: 2024-07-25 06:02 UTC

See Also:
Latest Commit:
Version Fixed In: 8.5.0
Sentry Crash Report:


Attachments

Description Roland 2024-07-24 20:06:13 UTC

SUMMARY
It seems that if I select an image in a folder (say, a generic collection I'm trying to sort/categorize) and use the 'find similar' option via right-click, the system includes the source file in the result. This could lead to problems: someone may take the result for a duplicate and delete it.

STEPS TO REPRODUCE
1. Select any album where there is an image that is unique
2. Right-click the image (on Windows, at least) and select 'find similar'

OBSERVED RESULT
The system moves to the Similarity tab and loads the source image in the Image tab. After some time, depending on the number of images in play, it returns the same image that was used as the source in the main search result window.

EXPECTED RESULT
System should not return the source image as a similar image.

SOFTWARE/OS VERSIONS
Windows: 10

ADDITIONAL INFORMATION
Comment 1 Maik Qualmann 2024-07-24 20:15:12 UTC
The search for similar images does not differentiate whether it comes from the internal collection or externally via drag & drop. So you search for a specific image that looks "like this" and it is displayed. You can also drag & drop external images into the image preview of the similarity search. I would actually leave this behavior as it is.

Maik
Comment 2 Maik Qualmann 2024-07-24 21:25:48 UTC
Git commit ef921f043b60cc6b0e0023b3c9a8fb47e6c10deb by Maik Qualmann.
Committed on 24/07/2024 at 21:24.
Pushed by mqualmann into branch 'master'.

draw the "reference image" logo over the thumbnail,
when searching for similar images of collection items
FIXED-IN: 8.5.0

M  +1    -1    NEWS
M  +1    -1    core/libs/database/item/containers/iteminfo.cpp
M  +3    -2    core/libs/database/item/lister/itemlister.h
M  +38   -34   core/libs/database/item/lister/itemlister_salbum.cpp
M  +8    -8    core/libs/database/item/lister/itemlisterrecord.h

https://invent.kde.org/graphics/digikam/-/commit/ef921f043b60cc6b0e0023b3c9a8fb47e6c10deb
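
For illustration only, the idea of marking the search origin instead of hiding it can be sketched in a few lines of Python. This is not the code from the commit above, which is C++ inside digiKam; the `SimilarHit` record, its `is_reference` field, and the `mark_reference` helper are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SimilarHit:
    # Hypothetical result record; not digiKam's actual data structure.
    image_id: int
    similarity: float
    is_reference: bool = False


def mark_reference(hits, reference_id):
    """Return new hits with the search origin flagged rather than removed."""
    return [
        SimilarHit(h.image_id, h.similarity, h.image_id == reference_id)
        for h in hits
    ]


# Image 7 started the search; image 12 is a genuinely different match.
hits = [SimilarHit(7, 1.0), SimilarHit(12, 0.93)]
marked = mark_reference(hits, reference_id=7)
```

A view could then draw an overlay on every hit whose `is_reference` flag is set, so the source image stays visible for comparison but cannot be mistaken for a second copy.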
Comment 3 Roland 2024-07-24 22:05:05 UTC
Hi Maik. I'm not sure I understand the change here. Is the idea that if n=1, it will have 'reference image' superimposed?

If so, I would suggest this is probably going to be confusing. That is, a query that returns more than 0 rows means there is a match to the query. And that is how we think about the concept of 'similar' in the real world. If I walk into a clothing store and ask if there are any pants in the store that look similar to mine, the clerk is never going to point at my pants as the response. If we get a positive response, the implication is that there are other matches, not including the reference, in the return.

And the concern with this in the context of digiKam, especially with a multi-user instance or with aggregations of content, is that a hit may mean to the user that the image is already cataloged somewhere OTHER than the source file. They then delete the similar item instead of categorizing it, and now the system has 0 instances of the file.

Wouldn't it make more sense to simply exclude a result if the search engine returns the same file name AND the same path? It seems far more useful and consistent for a find similar call to return 0 results unless the number of items is > 1. It's the same as deduplication; we don't want to see a hit on items that have no matches within the defined degree of similarity.
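
As a rough illustration of the exclusion proposed above, here is a minimal sketch in Python. It is not digiKam code; the function name `exclude_source` and the sample paths are hypothetical, and it assumes the search results are available as plain path strings.

```python
from pathlib import PureWindowsPath


def exclude_source(results, source):
    """Drop the result whose full path (directory and file name) is the source."""
    src = PureWindowsPath(source)
    # Comparing whole paths covers both conditions: same folder AND same name.
    return [r for r in results if PureWindowsPath(r) != src]


# Hypothetical result list: the source itself, a same-named copy elsewhere,
# and a different file in the same folder.
results = [
    "C:/photos/cat.jpg",
    "C:/photos/backup/cat.jpg",
    "C:/photos/cat2.jpg",
]
others = exclude_source(results, "C:/photos/cat.jpg")
only_source = exclude_source(["C:/photos/cat.jpg"], "C:/photos/cat.jpg")
```

With this filter, a same-named copy in another folder is still reported, while a search that finds nothing but the source yields an empty list.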
Comment 4 Maik Qualmann 2024-07-25 06:02:48 UTC
What's the problem if the image from which you started the similarity search is marked as a "reference image" in the view? The similarity search is not primarily used to delete images. In principle, this search is far too imprecise for that. We have the duplicate search for this use case. Perhaps other users also want to see the image they are looking for in the view in order to compare the labels, tags, etc. of the similar images found. The workflow can be completely different for other users.

Maik