Bug 372217

Summary:	Duplicates search results are currently sorted by image id and sortable by name and by count of similar images, but they should also be sortable by the similarity of the duplicates. [patch]
Product:	[Applications] digikam	Reporter:	Mario Frank <mario.frank>
Component:	Searches-Similarity	Assignee:	Digikam Developers <digikam-bugs-null>
Status:	RESOLVED FIXED
Severity:	wishlist	CC:	caulier.gilles, mario.frank
Priority:	NOR
Version:	5.3.0
Target Milestone:	---
Platform:	Compiled Sources
OS:	Linux
Latest Commit:	http://commits.kde.org/digikam/04c4024d4c7d3f03d91ed892286c2de78abeeb37	Version Fixed In:	5.4.0
Sentry Crash Report:
Attachments:	The first patch. The second patch.

Description Mario Frank 2016-11-08 14:23:46 UTC

Created attachment 102120 [details]
The first patch.

When searching for duplicates, the result set is a table with the thumbnail and the count of similar pictures (including the original one). It is possible to sort the rows by either the reference picture (either name or id, I'm not sure here) or the count of entries in this virtual album. Sadly, it is not possible to sort the result by similarity. This patch introduces this functionality. For each reference image, the average similarity (in percent) is calculated for the potential duplicates, excluding the reference image. This way, it is possible to sort the virtual albums by the average similarities of the duplicates in both ascending and descending order. There is still one glitch in the sorting. 

Since the items are sorted in lexicographic order, the ordering was not correct when the average similarity strings differ in length. This problem is fixed by a second patch that explicitly introduces arithmetic ordering for this column. The second patch will be submitted as a comment.

The complete commit messages:
"
[PATCH] Extended the duplicates search list view. Now, the average
 similarity of the found duplicates (excluding the original image) is shown as
 table column. Sorting the result set by the average similarity is thus
 possible. To implement this feature, the haariface had to be modified. It
 returns a map of average similarities to a map of image ids to the set of
 similar images instead of the map of image ids to the set of similar images.
 Communicating the average similarity to the search list view was not possible
 via slots and signals and this would have led to sending a map of image ids
 to average similarities and then distributing the appropriate average
 similarity to the correct FindDuplicateAlbumItem. Instead, the average
 similarity is communicated via the SearchXml-query as a field of the group.
 This way, the correct item gets the correct similarity automatically. The
 evaluation of the new field by an SQL query is suppressed by the introduction
 of noEffect fields which need to have a prefix "noeffect_". So, the log is
 not polluted by unnecessary debug information.
"
and
"
[PATCH] The items in the FindDuplicatesAlbum were sorted by
 lexicographic order which does not make sense for the average similarity
 column (e.g. 100.00 is not correctly sorted). Thus, the less-than operator
 was adapted so that for the average similarity column, arithmetic order is
 used. To make the code more stable against regressions due to reordering the
 columns, an enum was introduced.
"
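The ordering problem from the second commit message can be sketched as follows. This is only an illustration under assumptions: the `Column` enum values, the `Row` struct and the `lessThan` function are hypothetical names and do not reproduce the patched FindDuplicatesAlbumItem code. The sketch uses plain standard C++ rather than Qt.

```cpp
#include <string>
#include <vector>

// Hypothetical column indices (illustrative names). Naming the columns means
// the comparison does not silently break when columns are reordered.
enum Column
{
    REFERENCE_IMAGE = 0,
    RESULT_COUNT    = 1,
    AVG_SIMILARITY  = 2
};

// Hypothetical stand-in for one row of the result list: one string per column.
struct Row
{
    std::vector<std::string> cells;
};

// Compares two rows on the given column. The average similarity column is
// compared as numbers; all other columns keep lexicographic string order.
// std::stod throws if a cell does not start with a number.
bool lessThan(const Row& a, const Row& b, Column column)
{
    const std::string& lhs = a.cells.at(column);
    const std::string& rhs = b.cells.at(column);

    if (column == AVG_SIMILARITY)
    {
        return std::stod(lhs) < std::stod(rhs);
    }

    return lhs < rhs;
}
```

With plain string comparison, "100.00" sorts before "95.50" because the character '1' precedes '9'. The numeric branch orders the two values as 95.50 < 100.00.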

Comment 1 Mario Frank 2016-11-08 14:24:41 UTC

Created attachment 102121 [details]
The second patch.

Comment 2 caulier.gilles 2016-11-10 05:17:42 UTC

Mario,

After applying your patch from bug #369051, I cannot apply this patch file to the source code:

[gilles@localhost core]$ patch -p1 < DIGIKAM_DuplicatesSearch_ResultSet_ArithmeticOrder.patch
patching file utilities/fuzzysearch/findduplicatesalbumitem.cpp
Hunk #1 FAILED at 64.
Hunk #2 FAILED at 79.
Hunk #3 succeeded at 102 (offset -10 lines).
Hunk #4 succeeded at 120 (offset -10 lines).
2 out of 4 hunks FAILED -- saving rejects to file utilities/fuzzysearch/findduplicatesalbumitem.cpp.rej
patching file utilities/fuzzysearch/findduplicatesalbumitem.h
                                                                                                                                                                                                                                      
Gilles Caulier

Comment 3 caulier.gilles 2016-11-10 05:21:19 UTC

OK, forget my previous comment; I forgot to apply patch 1 before patch 2.

[gilles@localhost core]$ git reset --hard
HEAD is now at a503172 update

[gilles@localhost core]$ patch -p1 < DIGIKAM_DuplicatesSearch_ResultSet_AverageSimilarity.patch
patching file libs/database/haar/haariface.cpp
patching file libs/database/haar/haariface.h
patching file libs/database/item/imagelister.cpp
patching file libs/database/item/imagequerybuilder.cpp
patching file utilities/fuzzysearch/findduplicatesalbum.cpp
patching file utilities/fuzzysearch/findduplicatesalbumitem.cpp

[gilles@localhost core]$ patch -p1 < DIGIKAM_DuplicatesSearch_ResultSet_ArithmeticOrder.patch
patching file utilities/fuzzysearch/findduplicatesalbumitem.cpp
patching file utilities/fuzzysearch/findduplicatesalbumitem.h
                                                                                                                                                                                                                                      
[gilles@localhost core]$

Comment 4 caulier.gilles 2016-11-10 05:37:29 UTC

Git commit 04c4024d4c7d3f03d91ed892286c2de78abeeb37 by Gilles Caulier.
Committed on 10/11/2016 at 05:33.
Pushed by cgilles into branch 'master'.

Apply patches #102120 and #102121 from Mario Frank

102120: Extended the duplicates search list view. Now, the average
similarity of the found duplicates (excluding the original image) is shown as
table column. Sorting the result set by the average similarity is thus
possible. To implement this feature, the haariface had to be modified. It
returns a map of average similarities to a map of image ids to the set of
similar images instead of the map of image ids to the set of similar images.
Communicating the average similarity to the search list view was not possible
via slots and signals and this would have led to sending a map of image ids
to average similarities and then distributing the appropriate average
similarity to the correct FindDuplicateAlbumItem. Instead, the average
similarity is communicated via the SearchXml-query as a field of the group.
This way, the correct item gets the correct similarity automatically. The
evaluation of the new field by an SQL query is suppressed by the introduction
of noEffect fields which need to have a prefix "noeffect_". So, the log is
not polluted by unnecessary debug information.

102121: The items in the FindDuplicatesAlbum were sorted by
lexicographic order which does not make sense for the average similarity
column (e.g. 100.00 is not correctly sorted). Thus, the less-than operator
was adapted so that for the average similarity column, arithmetic order is
used. To make the code more stable against regressions due to reordering the
columns, an enum was introduced.
FIXED-IN: 5.4.0
CCMAIL: frank@uni-potsdam.de

M  +60   -30   libs/database/haar/haariface.cpp
M  +5    -5    libs/database/haar/haariface.h
M  +1    -1    libs/database/item/imagelister.cpp
M  +5    -1    libs/database/item/imagequerybuilder.cpp
M  +3    -2    utilities/fuzzysearch/findduplicatesalbum.cpp
M  +27   -4    utilities/fuzzysearch/findduplicatesalbumitem.cpp
M  +9    -0    utilities/fuzzysearch/findduplicatesalbumitem.h
M  +4    -4    utilities/fuzzysearch/findduplicatesview.cpp
M  +9    -6    utilities/fuzzysearch/fuzzysearchview.cpp

http://commits.kde.org/digikam/04c4024d4c7d3f03d91ed892286c2de78abeeb37