Bug 302923

Summary: Sort Icon-view items found by Fuzzy Searches [patch]
Product: [Applications] digikam Reporter: julien.t43+kde
Component: Searches-Similarity Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: wishlist CC: caulier.gilles, laurakittyinka, mario.frank
Priority: NOR    
Version: 2.5.0   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In: 5.4.0
Attachments: Patch for sorting by similarity and dropping images not yet included in the DK database.

Description julien.t43+kde 2012-07-02 20:57:15 UTC
While sorting and classifying a large number of pictures, I wanted to use a similarity check to avoid selecting pictures that are too similar.
Problem: it seems that for now, you can only use similarity from a reference file.

It would be nice, in the album/search view (and maybe other views where applicable), to be able to order or group by similarity.

While looking on the net, I found these links:
	http://forums.adobe.com/message/3650978
	http://xapian.wordpress.com/2009/03/11/xappy-now-supports-image-similarity-searching/
	http://sourceforge.net/tracker/?func=detail&aid=3420654&group_id=42641&atid=433763


Reproducible: Always
Comment 1 Mario Frank 2016-11-14 08:43:51 UTC
I am not sure to what extent sorting should be possible.
In fact, this feature request is not detailed enough. All (virtual) albums can be sorted by name, path, ..., size. Introducing an ordering by similarity for a specific album is not that complex. But if the ordering is then set to "by size" in the main menu, the ordering by similarity is lost and cannot be reconstructed easily. One could add a sorting option by relevance, but it should then be located in the main menu, too. In that case, all pictures would have to be sortable by similarity to some other picture, which only makes sense if the database is extended with a relation table originalPicture | similarPicture | similarity.
This table would have to be updated every time an image is moved to the trash, restored from the trash, imported, etc. The DB would also become much bigger than it is now, since every picture has some similarity to every other picture, which is quadratic in the number of pictures. All these updates would have a visible impact on performance. Moreover, the usual sorting mechanisms work on the properties of the individual image. Thus, using the currently available sorting infrastructure looks more like a hack to me.
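
To make the quadratic growth concrete: a full relation table over n images needs one row per unordered pair of images, i.e. n(n-1)/2 rows if each symmetric pair is stored once. A minimal sketch of that arithmetic (the function name pairCount is illustrative and not part of digiKam):

```cpp
#include <cstdint>

// Number of unordered image pairs in a collection of n images.
// A full "originalPicture | similarPicture | similarity" table that stores
// each symmetric pair once would need this many rows.
std::uint64_t pairCount(std::uint64_t n)
{
    return (n < 2) ? 0 : n * (n - 1) / 2;
}
```

For 10,000 images this gives 49,995,000 rows, and for 100,000 images 4,999,950,000 rows.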

For fuzzy search on an image or sketch in particular, grouping by album is counterproductive since the ordering by similarity would be partially lost.

To conclude:
I am not sure which way to go here. As I see it, sorting by similarity is best located in duplicates/fuzzy/sketch search where the impact on the database is smaller. Also, there should be an option to sort by similarity in the main menu but this option should only be visible when the focus is set to the duplicates/fuzzy/sketch searches view.

Any comments/ideas/hints from the devs?
Comment 2 Barbara Scheffner 2016-11-14 09:54:55 UTC
If I may say something on this issue although I'm only ;-) a documenter: locating sorting by similarity under Sketch and not under Duplicates or Image would not really help to make the GUI more intuitive.
And from Mario's comment I feel that this would be quite a lot of work. I understand Julien: if you have to work with a big number of pics this could be a good help. But - sorry to say that - I feel that there are much more important issues to solve which probably would also help Julien.
Comment 3 Mario Frank 2016-11-14 10:59:26 UTC
Hey Wolfgang (if I may),

I fear that my comment gave too fatalistic an impression, so I want to be more precise. Depending on the implementation, the effort may be smaller or larger. Communicating the similarities to the album view is not that hard and would not take much time. But before doing any work on that, a design decision must be made. As noted, introducing the ordering is not hard.
I just want to know whether the devs would accept an ordering mechanism which does not use the database but, for example, the search query. This would not have such an impact on the database size.
Comment 4 caulier.gilles 2016-11-14 11:11:58 UTC
> Communicating the similarities to the album view is not that hard and would not take much time. But before doing some work in that point, some design decision must be found.

Yes, I think it's simple...

> I just want to know whether the devs accept introducing an ordering mechanism which does not use the database but the search query for example. This would not have such impact to the database size.

Well, I think the icon-view model must be updated to carry extra values, in this case the similarity percentage. This value becomes a property of each icon that is easy to compare with that of other icons in order to sort the items, as is already done with other properties such as date or file name.

Note that the icon view has a menu setting to change the current sort order. For a duplicates virtual search album, we must have a way to switch between the usual sort options and the new one based on similarity.

Of course, this similarity sort option must be disabled when we are not in a Duplicates search album.
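
The idea above can be sketched roughly as follows. This is plain C++ rather than the actual digiKam model code: Item, SortRole and sortItems are hypothetical stand-ins, and falling back to sorting by name outside a similarity search is an assumption made for the example (the comment only asks for the option to be disabled there).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an icon-view item carrying the extra value.
struct Item
{
    std::string name;
    double      similarity; // percent similarity to the reference image
};

enum class SortRole { ByName, BySimilarity };

// Sort items by the requested role. The similarity role is only honoured
// when the current album is a similarity search result; otherwise this
// sketch falls back to sorting by name (an assumption, see lead-in).
void sortItems(std::vector<Item>& items, SortRole role, bool inSimilaritySearch)
{
    if (role == SortRole::BySimilarity && inSimilaritySearch)
    {
        // Most similar first; ties are broken by name.
        std::sort(items.begin(), items.end(),
                  [](const Item& a, const Item& b)
                  {
                      if (a.similarity != b.similarity)
                          return a.similarity > b.similarity;
                      return a.name < b.name;
                  });
    }
    else
    {
        std::sort(items.begin(), items.end(),
                  [](const Item& a, const Item& b) { return a.name < b.name; });
    }
}
```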

Gilles Caulier
Comment 5 Barbara Scheffner 2016-11-14 18:04:19 UTC
Hey Mario,
Sure you may. I cannot really judge the amount of work since I'm not a coder. And my second point is possibly obsolete because I now see that I probably misunderstood something. So forget it, at least for now!
Comment 6 Mario Frank 2016-12-20 15:48:16 UTC
After a longish phase of crawling through the code and testing,
I have a working solution.

Sorting by similarity is quite easy. The only missing thing for this feature was the similarity property, i.e. the similarity of a found picture to the original one. This similarity was printed during duplicates/fuzzy search but there is neither a field in the ImageInfo nor in the properties. Thus I extended the ImageProperties.

Since my current staged changes also concern Bug https://bugs.kde.org/show_bug.cgi?id=320666 , I will describe my solution in detail for both bugs here.

It will be quite technical since I want to make clear why I did what.
I can also upload a patch for review if wished.

So here is the description:

In short:
For sorting by similarity, a new option "Sort by similarity" with default order descending was introduced. This option is only enabled while the fuzzy search sidebar is active, i.e. in fuzzy/duplicates/sketch search.
To compare two pictures, it is necessary to get their similarity to the original picture. Since this information was only printed to the console but not saved in the image info, the SAlbum query or the properties, I store it myself. For every similar picture, a property similarityTo_X, with X being the id of the original image, is stored as an image property together with the similarity value. When the image with id X is deleted, the property is removed to keep the DB small. The property is also removed from images when a fuzzy/duplicates search is run for image X, so that old similarity values are discarded. Though the similarity of image X to image Y is symmetric, I do not want to store both directions, as this would bloat the DB. Only the detected similarities are explicitly stored.
This way, sorting works in fuzzy, duplicates and sketch search.
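
The storage scheme described above can be sketched with a small in-memory stand-in for the image properties table. PropertyStore and its layout are assumptions made for illustration. Only the property name pattern similarityTo_X and the set/get/remove-by-name operations come from the description. Returning 0.0 for a missing entry is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using ImageId = std::int64_t;

// Hypothetical in-memory stand-in for the image properties table:
// (image id, property name) -> similarity value.
class PropertyStore
{
public:
    // Property name used for similarities to the given reference image.
    static std::string similarityKey(ImageId reference)
    {
        return "similarityTo_" + std::to_string(reference);
    }

    void setSimilarityTo(ImageId image, ImageId reference, double similarity)
    {
        m_props[{image, similarityKey(reference)}] = similarity;
    }

    // Returns 0.0 when no similarity to the reference was stored.
    double similarityTo(ImageId image, ImageId reference) const
    {
        const auto it = m_props.find({image, similarityKey(reference)});
        return (it == m_props.end()) ? 0.0 : it->second;
    }

    // Remove the named property from every image, e.g. when the reference
    // image is deleted or before a new search for it is run.
    void removePropertyByName(const std::string& name)
    {
        for (auto it = m_props.begin(); it != m_props.end();)
        {
            if (it->first.second == name)
                it = m_props.erase(it);
            else
                ++it;
        }
    }

    std::size_t size() const { return m_props.size(); }

private:
    std::map<std::pair<ImageId, std::string>, double> m_props;
};
```

Because only the direction "found image to reference image" is stored, deleting reference image X requires a single removePropertyByName(similarityKey(X)) call.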


- The DigikamApp is extended with the new QAction sort by similarity which is disabled on start and only is enabled if the fuzzy search sidebar is active.

- The DigikamView is signalled by the FuzzySearchSidebarWidget with the signal signalActive(bool) and the DigikamView triggers the slot slotFuzzySidebarActive(bool).
  This signal is forwarded to the DigikamApp where setEnabled(bool) of the QAction (sort by similarity) is called.
  Also, the DigikamView is signalled by the FuzzySearchSidebarWidget with the signal signalImageChanged() and the DigikamView triggers the slot slotUpdateFuzzyReferenceImage().
  Here, the selected reference image is loaded from the application settings and set in the ImageFilterModel.
  The slot slotSortImages of the DigikamView is extended in a way that the reference image for sorting is set in the ImageFilterModel if the sort role is SortBySimilarity.

- The SearchModificationHelper is extended with a method createFuzzySearchFromDropped and a slot slotCreateFuzzySearchFromDropped which gets the path of the image file as parameter.
  The method generates a new SAlbum query with the new type image and sketchtype scanned and sets the file path as value of the query.

- The LeftSidebarWidget is extended with a slot slotImageChanged(), a signal signalImageChanged() and a signal signalActive(bool). The signal signalActive is emitted every time setActive(bool) is called.
- The slot slotImageChanged is triggered by a signal signalReferenceImageSelected() from the FuzzySearchView. This slot forwards the signal by emitting signalImageChanged().

- The slot slotImagesDeleted of the AlbumManager is extended. Here, the property similarityTo_<imageid> is deleted for every deleted image, i.e. all similarity connections to the deleted image
  are removed from the database.

- The CoreDb is extended with a method removeImagePropertyByName which does exactly that. All ImageProperties that have the given name are deleted.

- The ImageExtendedProperties are extended with a method similarityTo which returns the similarity (double value) of the image to the image given by the parameter.
  Also, a setter setSimilarityTo and a deletion method removeSimilarityTo were implemented.

- The ImageInfo is extended with a function similarityTo that gets the similarity from the ImageExtendedProperties.

- HaarIface is extended with a method bestMatchesForImageWithThreshold which gets the file path together with the similarity thresholds, generates a QImage with the existing method 
  loadQImage, generates the signature of the image and starts a fuzzy search for this temporary image with the temporary image id -1.
  The method bestMatchesWithThreshold now stores the similarity to the original image for every found image as property in the database.

- The method listHaarSearch from the ImageLister is extended and triggers bestMatchesForImageWithThreshold if the search query has the type image.

- The ImageFilterModel is extended with a method setReferenceImageId which sets, in the sorter, the id of the image for which the fuzzy search is done.

- The ImageSortSettings are extended with the new SortRole, a field for the reference image id and a setter for this field.
  Moreover, the default sort order for sort by similarity is set to descending in the method defaultSortOrderForCategorizationMode.
  Also, the compare method is extended with a case for sort by similarity where the similarity to the reference image given by the id is used for comparison.
  The watch flags for sort by similarity are set to DatabaseFields::Name. I am not sure whether there is a better solution. There is nothing to watch for, is there?

- The ApplicationSettings are extended with a setter and getter for the reference image id for fuzzy search.

- The slot slotDuplicatesAlbumActived of the FindDuplicatesView sets the reference image id which is the name of the SAlbum. This way, sorting by similarity is possible for every album
  in the duplicates view. But if multiple SAlbums are selected in the FindDuplicatesView, the reference image id is the one of the first selected SAlbum. 
  I do not see a better solution. Does anyone else?

- The FuzzySearchView is extended in the following way. For dropping external images, a new field (a QUrl) was introduced which is set during the drop action.
  An external image can only be dragged into the image label if the mime type that Qt gets is a URL which leads to a local file that can be read by Qt as an image.
  If these conditions are met, dropping the image is allowed. Then, the URL is set in the FuzzySearchView, and a QImage and a temporary ImageInfo are generated.
  By setting the URL, it is possible to refresh the similar pictures when the thresholds are modified (via slotTimerImageDone).
  A temporary thumbnail is generated and the file name and file path are set in the view.
  In order to be able to sort by similarity, the temporary image id -1 is used, which cannot exist in the database.
  Also, setCurrentImage is extended to set the reference image id in the ApplicationSettings and emit the signal that the reference image was changed.
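
The enable/disable chain from the list above (sidebar emits signalActive(bool), the view forwards it, the application toggles the action) can be illustrated without Qt. In this sketch, std::function callbacks stand in for Qt signal/slot connections, and the structs SortAction, FuzzySidebar and View as well as connectAll are hypothetical simplifications of the QAction, FuzzySearchSidebarWidget and DigikamView mentioned above.

```cpp
#include <functional>

// Stand-in for the "sort by similarity" QAction.
struct SortAction
{
    bool enabled = false;                    // disabled on start
    void setEnabled(bool on) { enabled = on; }
};

// Stand-in for the fuzzy search sidebar widget.
struct FuzzySidebar
{
    std::function<void(bool)> signalActive;  // the connected slot, if any

    void setActive(bool on)
    {
        if (signalActive)
            signalActive(on);                // emitted on every call
    }
};

// Stand-in for the view that forwards the state to the application.
struct View
{
    std::function<void(bool)> forwardToApp;

    void slotFuzzySidebarActive(bool on)
    {
        if (forwardToApp)
            forwardToApp(on);
    }
};

// Wire sidebar -> view -> action. All three objects must outlive the
// connections because the lambdas capture them by reference.
void connectAll(FuzzySidebar& sidebar, View& view, SortAction& action)
{
    sidebar.signalActive = [&view](bool on) { view.slotFuzzySidebarActive(on); };
    view.forwardToApp    = [&action](bool on) { action.setEnabled(on); };
}
```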
Comment 7 caulier.gilles 2016-12-20 16:58:20 UTC
The solution is technically well explained. This can work, but the changes are significant and touch many places, so they need to be introduced and tested between the 5.4.0 and 5.5.0 releases, for example, if the code is ready for review.

I propose :

1/ Wait for the 5.4.0 release before introducing changes on this topic. 5.4.0 is planned for next week.

2/ Create a git branch to host digiKam core with all your changes.

3/ Post all tests to perform in user space to review the new features, with expected results. This must be done here first.

4/ Synchronize your branch with git master step by step.

5/ Depending on feedback, when your code is stable enough and ready for production, merge your branch back to git master.

6/ Post a message to the users mailing list to get final feedback before the release.

7/ Release the new digiKam with your changes.

What do you think?

Gilles Caulier
Comment 8 Mario Frank 2016-12-20 17:49:00 UTC
(In reply to caulier.gilles from comment #7)
> The solution is technically well explained. This can work, but changes are
> very important every where and this need to be introduced and tested while
> release stage 5.4.0 and 5.5.0 for exemple, if code is ready to review.
> 
> I propose :
> 
> 1/ Wait 5.4.0 release before to introduce changes about this topic. 5.4.0 is
> planed for next week.
> 
> 2/ Create a git branch to host digiKam core with all you changes.
> 
> 3/ Post all tests to perform in user space to review the new features with
> expected results. This must be done here in first stage.
> 
> 4/ Synchronize step by step your branch with git master.
> 
> 5/ Depending of feedback, when your code is enough stable and ready for
> production, merge back your branch to git master.
> 
> 6/ Post a message in users mailing list to have a final feedback before the
> release.
> 
> 7/ Release new digiKam with your changes.
> 
> What do you think about ?
> 
> Gilles Caulier

Hi Gilles,

The patch has been finished for about two weeks. I had hoped to get the functionality into 5.4.
But due to many changes, it took me some time to merge my changes into my working copy of the master branch and polish the code.
I did not create explicit test cases. I still have to dig through the testing facility of DK.

I tested the functionality only by hand, i.e.
- reviewing the DB changes introduced by the new property
- reviewing ApplicationSettings changes 
- confirming the sort order in DK in correlation to the DB property entries

When was the feature freeze or better, when is the feature freeze usually?

Perhaps someone else has some time to test my changes, too. Thus, I'll just upload the (git diff) patch.

Cheers,
Mario
Comment 9 Mario Frank 2016-12-20 17:49:53 UTC
Created attachment 102904 [details]
Patch for sorting by similarity and dropping images not yet included in the DK database.
Comment 10 caulier.gilles 2016-12-20 18:00:38 UTC
The 5.4.0 release is planned for 27 December 2016.
5.5.0 is planned for February 2017. We will have one month to review your new code.
Before the 5.4.0 release date, we need to let the translation teams work for a while without too many changes in the source code, at least on the GUI side.
The code must also be checked under Windows and macOS. This takes time. So it's better to patch git/master after the 5.4.0 release.

I will test your patch soon.

Gilles
Comment 11 Mario Frank 2016-12-23 21:06:37 UTC
Git commit e12dd0980f4371f1a7511a817fc7e17d7dccd2e6 by Mario Frank.
Committed on 23/12/2016 at 20:45.
Pushed by mfrank into branch 'master'.

This patch aims to improve the usability of fuzzy/duplicates/sketch search.
A sorting option (by similarity) is introduced which can be helpful in fuzzy/sketch search where similar images are searched for a given one. Sorting the resulting images by similarity to the given image is an obviously helpful feature.
It is now possible to drop local files that are images and not present in the Digikam database into the fuzzy search image label and trigger a fuzzy search for this image.

Details:

Sort by similarity:
For sorting by similarity, a new option Sort by similarity with default order descending was introduced. This option is only active if the fuzzy search sidebar is active, i.e. fuzzy/duplicates/sketch search.
To compare two pictures, it is necessary to get their similarity to the original picture. Since this information was only printed to console but not saved in the image info, SAlbum query or the properties,
I do that myself. For every similar picture, a property similarityTo_X with X being the id of the original image and the similarity are stored as image property. When the image with id X is deleted, the property
is removed to keep the DB small. Also, the property is removed from images if the fuzzy/duplicates search is done for image X such that old similarity values are removed. Though the similarity of image X to image Y
is symmetric, I do not want to store both. This would bloat the DB. Only the detected similarities are explicitly stored.

Drag and drop images not yet present in DB for fuzzy searches:
I allow dragging objects over the image label iff the mime type of the event data is a URL list. In this case, it is no DIMG. If the first URL (there is only one in the list) is a local file and can be loaded as QImage, dropping is allowed. The function
loadImage in HaarIface is used since it was already present and has optimisations for specific image formats. In order to not import the image but to be able to find similar pictures, and sort by the similarity, a temporary image id is needed.
I use -1 for this case since this image id cannot be present in DB. Also, every time a new external image is dropped for fuzzy search, the similarities for the previous dropped image are removed from database to keep the entries up to date.
Moreover, I save the image url of the dropped image in the fuzzy search view. This way, refreshing the similar images due to modification of thresholds does work, too.
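
The drop rule described above (a URL list whose first entry is a local file that loads as an image) can be sketched in plain C++. The Url struct, acceptDrop and the injected canLoadAsImage predicate are hypothetical. In the real code the check goes through Qt mime data and the HaarIface image loader, which are not reproduced here.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified URL: just a scheme and a path.
struct Url
{
    std::string scheme; // e.g. "file" or "https"
    std::string path;
};

// Temporary id for a dropped image that is not in the database;
// -1 cannot collide with a real image id.
constexpr std::int64_t kTemporaryImageId = -1;

// Decide whether a drop is accepted. canLoadAsImage is an injected
// predicate standing in for the real image loader.
bool acceptDrop(const std::vector<Url>& urls,
                const std::function<bool(const std::string&)>& canLoadAsImage)
{
    if (urls.empty())
        return false;                 // event data is not a URL list

    const Url& first = urls.front();  // only the first URL is considered

    if (first.scheme != "file")
        return false;                 // must be a local file

    return canLoadAsImage(first.path);
}
```

An accepted drop would then run the fuzzy search with kTemporaryImageId as the reference id, so that the stored similarityTo_-1 values can be used for sorting and replaced on the next drop.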

Made small tests with valgrind - did not find memory leaks, yet.
Made functionality tests for fuzzy drop. Trying to drag local files that are no images (e.g. pdf, odt, plain text) is not successful as wished.
Made functionality tests for similarity search. Did not find defects.
More tests will be applied in the next days.
Related: bug 320666
FIXED-IN: 5.4.0

M  +3    -1    NEWS
M  +7    -0    app/main/digikamapp.cpp
M  +46   -0    app/utils/searchmodificationhelper.cpp
M  +29   -0    app/utils/searchmodificationhelper.h
M  +27   -0    app/views/digikamview.cpp
M  +4    -0    app/views/digikamview.h
M  +9    -0    app/views/leftsidebarwidgets.cpp
M  +8    -0    app/views/leftsidebarwidgets.h
M  +6    -0    libs/album/albummanager.cpp
M  +6    -0    libs/database/coredb/coredb.cpp
M  +1    -0    libs/database/coredb/coredb.h
M  +28   -1    libs/database/haar/haariface.cpp
M  +14   -2    libs/database/haar/haariface.h
M  +25   -0    libs/database/item/imageextendedproperties.cpp
M  +7    -0    libs/database/item/imageextendedproperties.h
M  +5    -0    libs/database/item/imageinfo.cpp
M  +2    -0    libs/database/item/imageinfo.h
M  +13   -0    libs/database/item/imagelister.cpp
M  +6    -0    libs/models/imagefiltermodel.cpp
M  +1    -0    libs/models/imagefiltermodel.h
M  +26   -1    libs/models/imagesortsettings.cpp
M  +4    -1    libs/models/imagesortsettings.h
M  +2    -0    libs/settings/applicationsettings.cpp
M  +3    -0    libs/settings/applicationsettings.h
M  +9    -0    libs/settings/applicationsettings_miscs.cpp
M  +4    -0    libs/settings/applicationsettings_p.cpp
M  +2    -0    libs/settings/applicationsettings_p.h
M  +11   -0    utilities/fuzzysearch/findduplicatesview.cpp
M  +73   -0    utilities/fuzzysearch/fuzzysearchview.cpp
M  +3    -0    utilities/fuzzysearch/fuzzysearchview.h

https://commits.kde.org/digikam/e12dd0980f4371f1a7511a817fc7e17d7dccd2e6
Comment 12 Mario Frank 2017-01-06 18:38:28 UTC
Git commit 429fa5fd8e7f53b74c82eb19dffb2e6cf4b4325c by Mario Frank.
Committed on 06/01/2017 at 18:36.
Pushed by mfrank into branch 'master'.

This patch is joint work with Simon. The patch introduces the similarity of images to a specific image
as column in table view. Also, it fixes the sorting by similarity in sketch search.
Moreover, the dysfunctional context menu item "Find duplicates" in people sidebar now leads to the selection
of the people tags as regular tabs in duplicates search. Finally, selecting multiple regular/people tags for
duplicates search is possible. Not to mention some refactoring to make function names more fitting.
Related: bug 374191, bug 320666
FIXED-IN: 5.4.0

M  +1    -1    NEWS
M  +21   -30   app/views/digikamview.cpp
M  +5    -4    app/views/digikamview.h
M  +17   -15   app/views/leftsidebarwidgets.cpp
M  +6    -8    app/views/leftsidebarwidgets.h
M  +34   -2    app/views/tableview/tableview_column_item.cpp
M  +2    -1    app/views/tableview/tableview_column_item.h
M  +1    -1    libs/album/albumselectiontreeview.cpp
M  +1    -1    libs/album/albumselectiontreeview.h
M  +5    -0    libs/album/albumtreeview.cpp
M  +2    -1    libs/album/albumtreeview.h
M  +10   -1    libs/database/dbjobs/dbjob.cpp
M  +45   -30   libs/database/haar/haariface.cpp
M  +5    -5    libs/database/haar/haariface.h
M  +14   -0    libs/database/item/imageinfo.cpp
M  +5    -0    libs/database/item/imageinfo.h
M  +2    -0    libs/database/item/imageinfodata.h
M  +106  -6    libs/database/item/imagelister.cpp
M  +8    -1    libs/database/item/imagelister.h
M  +10   -6    libs/database/item/imagelisterrecord.h
M  +0    -6    libs/models/imagefiltermodel.cpp
M  +0    -1    libs/models/imagefiltermodel.h
M  +5    -10   libs/models/imagesortsettings.cpp
M  +0    -2    libs/models/imagesortsettings.h
M  +0    -2    libs/settings/applicationsettings.cpp
M  +0    -3    libs/settings/applicationsettings.h
M  +0    -10   libs/settings/applicationsettings_miscs.cpp
M  +0    -4    libs/settings/applicationsettings_p.cpp
M  +0    -2    libs/settings/applicationsettings_p.h
M  +3    -1    libs/tags/tagfolderview.cpp
M  +1    -1    libs/tags/tagfolderview.h
M  +3    -3    libs/tags/tagsmanager/tagsmanager.cpp
M  +16   -16   utilities/fuzzysearch/findduplicatesview.cpp
M  +5    -2    utilities/fuzzysearch/findduplicatesview.h
M  +22   -20   utilities/fuzzysearch/fuzzysearchview.cpp
M  +5    -4    utilities/fuzzysearch/fuzzysearchview.h

https://commits.kde.org/digikam/429fa5fd8e7f53b74c82eb19dffb2e6cf4b4325c