Bug 320666

Summary: Add search of similar images outside digiKam collections [patch]
Product: [Applications] digikam Reporter: Niels <niels.misc>
Component: Searches-Similarity Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: wishlist CC: caulier.gilles, freisim93, laurakittyinka, mario.frank, parejaobregon
Priority: NOR    
Version: 3.1.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 5.4.0
Sentry Crash Report:
Attachments: Patch for sorting by similarity and dropping images not yet included in the DK database.
Patch for triggering duplicates search for multiple tags with context menu and introducing the similarity value as column in table view.
Patch for introducing the similarity value as column in table view (part of 103203)
imagelister.cpp
Patch working both for SQLite and MySQL

Description Niels 2013-06-03 15:54:28 UTC
Currently I can only drag and drop images from my collections, i.e. from inside digiKam. It would be very convenient to be able to search for images from external folders (via Dolphin) or even online from a browser.

A common use case for me is: Do I have this image in my collection? Dropping it in Digikam to get the answer would be great!

Reproducible: Always
Comment 1 parejaobregon 2013-09-01 15:49:19 UTC
This would indeed improve fuzzy searches (by image) a lot!

Consider also the following user scenario: I want to search in my collection for a specific kind of image (for example bridges, sunsets, etc.), but I have no clue where to look. I could do a simple web search for the subject, temporarily download an image similar to the one I'm looking for, and drag and drop it into the digiKam fuzzy search. This way I have all the bridges in my collection in no time :)

Anyway, thanks for the hard work!
Comment 2 Mario Frank 2016-12-02 16:27:42 UTC
(In reply to parejaobregon from comment #1)
> This would indeed improve fuzzy searches (by image) a lot!
> 
> Consider also the following user scenario: I want to search in my collection
> for a specific kind of images (for example bridges, sunsets, etc), but I
> have no clue where to look for. I could make a simple web search for the
> subject, temporarily download a similar image to the one I'm looking for,
> and drag and drop it into digikam fuzzy search. This way I have all the
> bridges in my collection in no time :)
> 
> Anyway, thanks for the hard work!

Hey,

I think this is quite a nice feature, though there is a workaround:
1) create a "staging"-album for those pictures
2) drag the pictures into the album
3) Trigger fuzzy search via context menu.

But as pointed out, the shorter way would be nicer.

Thus, I did some research and some tests concerning this feature request. I have a working solution, but I am not satisfied with it. In DK, the fuzzy and duplicates searches are based on the images as defined by the database structure. It is possible to implement such a feature without using the internal data structure, but I assume this is not a clean solution. I think the cleanest way would be to create a temporary image data structure from the provided URI. I just did not find anything appropriate in the API. Perhaps it does not even exist.
But even if I create temporary image data structures, they should not get normal image ids, since I do not want to waste ids.

Any comments from the established devs concerning the technical details?

Cheers,
Mario
Comment 3 caulier.gilles 2016-12-02 18:56:25 UTC
The only possible way to query the database externally, KIO support, was dropped for good with the 5.0 release.

All digiKam KIO slaves have been replaced by a multi-threaded/multi-core interface. KIO is badly portable and made digiKam very difficult to stabilize.

There is no plan to restore KIO.

Another approach could be to export the database content through a standardized protocol such as UPnP. An old KIPI tool exists, but the core library that implements UPnP was never ported to Qt5.

My tip: put all research on this topic on hold for the moment, until we have found a portable, stable, standardized solution.

Gilles Caulier
Comment 4 Mario Frank 2016-12-20 15:48:06 UTC
(In reply to caulier.gilles from comment #3)
> The possible way to query the database externally was dropped definitively
> with 5.0 release : KIO support.
> 
> All digiKam KIO slaves have been replaced by a multi-threaded/core
> interface. KIO is badly portable and make digiKam very difficult to
> stabilize.
> 
> There is no plan to restore KIO.
> 
> Another approach can be to export database content with a standardized
> protocol as Upnp. An old KIPI tool exist by the core library which implement
> Upnp was never port to Qt5.
> 
> My tip : delay all searches about this topic for the moment, until we found
> a portable, stable, standardized solution.
> 
> Gilles Caulier

Hey,
I do have a working solution. In short: I allow dragging objects over the image label iff the mime type of the event data is a URL list. In this case, it is not a DIMG. If the first URL (there is only one in the list) is a local file and can be loaded as a QImage, dropping is allowed. There is a function in HaarIface for loading paths as QImages if they are images, so I could reuse it. The only thing I had to do was to introduce a temporary ImageInfo with id -1 and save the image URL in the fuzzy search view. This way, refreshing the images after the thresholds are modified works, too.
This should all be independent of KIO, shouldn't it?
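
The drop check described in this comment can be sketched roughly as follows. This is a minimal illustration in plain C++ rather than Qt, and it is not the code from the patch: `DropPayload`, `localPathOf`, `acceptExternalDrop` and the `canLoadAsImage` callback are hypothetical names standing in for the mime data, the URL handling and the HaarIface image loader mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the data carried by a drag event.
struct DropPayload
{
    bool                     hasUrlList; // true if the mime data is a URL list
    std::vector<std::string> urls;       // the dragged URLs
};

// Returns the local path of a "file://" URL, or an empty string otherwise.
std::string localPathOf(const std::string& url)
{
    const std::string scheme = "file://";

    if (url.compare(0, scheme.size(), scheme) != 0)
    {
        return std::string();
    }

    return url.substr(scheme.size());
}

// Accept the drop only if the payload is a URL list whose first entry is a
// local file that the supplied loader recognises as an image.
bool acceptExternalDrop(const DropPayload& payload,
                        const std::function<bool(const std::string&)>& canLoadAsImage)
{
    assert(canLoadAsImage && "a loader callback is required");

    if (!payload.hasUrlList || payload.urls.empty())
    {
        return false;
    }

    const std::string path = localPathOf(payload.urls.front());

    return !path.empty() && canLoadAsImage(path);
}
```

Passing the loader in as a callback keeps the check independent of any particular image library; in the real code that role is played by the existing HaarIface function.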

For details, look at https://bugs.kde.org/show_bug.cgi?id=302923 .
I was working on both bugs at the same time and the sorting of fuzzy drop search results depends on bug id 302923.

Cheers,
Mario
Comment 5 caulier.gilles 2016-12-20 16:48:39 UTC
Yes, I agree. This kind of solution can work and will be independent of the KIO mechanism.

In fact, D&D is a way to query an application from outside. KIO is another way, which permits implementing more complex solutions.

D&D is more universal and is standardized for each OS. Look at the end of this page for details:

http://doc.qt.io/qt-5/dnd.html
Comment 6 Mario Frank 2016-12-20 17:53:18 UTC
(In reply to caulier.gilles from comment #5)
> yes, i agree. This kind of solution can work and will be independent of KIO
> mechanism.
> 
> In fact, D&D is a way to query an application from outside. KIO is another
> way which permit to implement more complex solution.
> 
> D&D is more universal and is standardized for each OS. Look at end of this
> page for details :
> 
> http://doc.qt.io/qt-5/dnd.html

Okay,

I finished the Drag&Drop solution two weeks ago or so.
The patch is the same as for https://bugs.kde.org/show_bug.cgi?id=302923 .
In order to have the patch located correctly here, I will upload it here, too.

Cheers,
Mario
Comment 7 Mario Frank 2016-12-20 17:53:47 UTC
Created attachment 102905 [details]
Patch for sorting by similarity and dropping images not yet included in the DK database.
Comment 8 caulier.gilles 2016-12-22 10:35:42 UTC
Mario, 

I postponed digiKam 5.4.0 to 1 January. This will give you time to patch git/master and to test and stabilize the implementation for production during the holidays.

Please don't waste time.

Gilles
Comment 9 Mario Frank 2016-12-23 18:43:22 UTC
(In reply to caulier.gilles from comment #8)
> Mario, 
> 
> I postponed digiKam 5.4.0 to 1 January. This will let's time to you to patch
> git/master and test to stabilize implementation for production while
> holidays.
> 
> Please don't waste time.
> 
> Gilles

Hey,

Postponing the deadline was not what I was aiming for, but since I am on holiday now, there will be enough time to test.
I will patch the master branch after doing some standard tests with valgrind and some more functionality tests.

The similarity sort patch introduces some i18n-related topics. How are the maintainers informed about necessary changes for localisation and/or documentation?

Cheers,
Mario
Comment 10 caulier.gilles 2016-12-23 18:50:08 UTC
I discovered a serious problem in Slideshow and Presentation when handling some video files. So postponing is the best way.

The doc is not a problem.

For i18n, it is acceptable if the patch to the GUI is applied quickly and not touched later. Patching code that is not relevant to i18n is of course not a problem.

Gilles
Comment 11 Mario Frank 2016-12-23 21:06:37 UTC
Git commit e12dd0980f4371f1a7511a817fc7e17d7dccd2e6 by Mario Frank.
Committed on 23/12/2016 at 20:45.
Pushed by mfrank into branch 'master'.

This patch aims to improve the usability of fuzzy/duplicates/sketch search.
A sorting option (by similarity) is introduced which can be helpful in fuzzy/sketch search where similar images are searched for a given one. Sorting the resulting images by similarity to the given image is an obviously helpful feature.
It is now possible to drop local files that are images and not present in the Digikam database into the fuzzy search image label and trigger a fuzzy search for this image.

Details:

Sort by similarity:
For sorting by similarity, a new option Sort by similarity with default order descending was introduced. This option is only active if the fuzzy search sidebar is active, i.e. fuzzy/duplicates/sketch search.
To compare two pictures, it is necessary to get their similarity to the original picture. Since this information was only printed to console but not saved in the image info, SAlbum query or the properties,
I do that myself. For every similar picture, a property similarityTo_X with X being the id of the original image and the similarity are stored as image property. When the image with id X is deleted, the property
is removed to keep the DB small. Also, the property is removed from images if the fuzzy/duplicates search is done for image X such that old similarity values are removed. Though the similarity of image X to image Y
is symmetric, I do not want to store both. This would bloat the DB. Only the detected similarities are explicitly stored.
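
As a rough illustration of the scheme described above (a per-image property named similarityTo_X, used as the sort key in descending order), a minimal sketch in plain C++ might look like this. The `Image` struct, the `Properties` map holding values as doubles and the helper names are assumptions made for the example, not the data structures used in digiKam.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

typedef long long ImageId;

// Hypothetical per-image property bag: property name -> stored value.
typedef std::map<std::string, double> Properties;

struct Image
{
    ImageId    id;
    Properties properties;
};

// Name of the property that holds the similarity to reference image X.
std::string similarityProperty(ImageId referenceId)
{
    return "similarityTo_" + std::to_string(referenceId);
}

// Similarity of an image to the reference, or 0.0 if none was stored.
double similarityTo(const Image& image, ImageId referenceId)
{
    Properties::const_iterator it =
        image.properties.find(similarityProperty(referenceId));

    return (it != image.properties.end()) ? it->second : 0.0;
}

// Sort descending by similarity to the reference image.
// stable_sort keeps the previous order among images with equal similarity.
void sortBySimilarity(std::vector<Image>& images, ImageId referenceId)
{
    std::stable_sort(images.begin(), images.end(),
                     [referenceId](const Image& a, const Image& b)
                     {
                         return similarityTo(a, referenceId) >
                                similarityTo(b, referenceId);
                     });
}
```

Because the key contains the reference id, removing the property when image X is deleted or searched again only needs the property name, not a scan of all stored values.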

Drag and drop images not yet present in DB for fuzzy searches:
I allow dragging objects over the image label iff the mime type of the event data is a URL list. In this case, it is no DIMG. If the first URL (there is only one in the list) is a local file and can be loaded as QImage, dropping is allowed. The function
loadImage in HaarIface is used since it was already present and has optimisations for specific image formats. In order to not import the image but to be able to find similar pictures, and sort by the similarity, a temporary image id is needed.
I use -1 for this case since this image id cannot be present in the DB. Also, every time a new external image is dropped for fuzzy search, the similarities for the previously dropped image are removed from the database to keep the entries up to date.
Moreover, I save the image url of the dropped image in the fuzzy search view. This way, refreshing the similar images due to modification of thresholds does work, too.
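
The bookkeeping for the temporary id can be pictured with the following minimal sketch. It assumes an in-memory store keyed by reference image id; `SimilarityStore`, `ExternalImageId` and the method names are hypothetical and only mirror the behaviour described above (id -1 for the dropped image, old results cleared before each new search).

```cpp
#include <cstddef>
#include <map>

typedef long long ImageId;

// Sentinel for an image that is not in the database. The commit message
// states that -1 can never be a real image id.
const ImageId ExternalImageId = -1;

// Hypothetical in-memory stand-in for the stored similarity values.
class SimilarityStore
{
public:

    // Start a new search: drop whatever was recorded for this reference
    // image before, so that stale similarities never survive.
    void beginSearch(ImageId reference)
    {
        m_scores.erase(reference);
    }

    // Remember how similar `image` is to `reference`.
    void record(ImageId reference, ImageId image, double similarity)
    {
        m_scores[reference][image] = similarity;
    }

    // Number of similarities currently stored for `reference`.
    std::size_t count(ImageId reference) const
    {
        std::map<ImageId, std::map<ImageId, double> >::const_iterator it =
            m_scores.find(reference);

        return (it != m_scores.end()) ? it->second.size() : 0;
    }

private:

    std::map<ImageId, std::map<ImageId, double> > m_scores;
};
```

Since every dropped external image shares the same sentinel id, calling `beginSearch(ExternalImageId)` before each drop search is enough to replace the results of the previous drop.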

Made small tests with valgrind - did not find memory leaks, yet.
Made functionality tests for fuzzy drop. Trying to drag local files that are not images (e.g. pdf, odt, plain text) fails, as intended.
Made functionality tests for similarity search. Did not find defects.
More tests will be applied in the next days.
Related: bug 302923
FIXED-IN: 5.4.0

M  +3    -1    NEWS
M  +7    -0    app/main/digikamapp.cpp
M  +46   -0    app/utils/searchmodificationhelper.cpp
M  +29   -0    app/utils/searchmodificationhelper.h
M  +27   -0    app/views/digikamview.cpp
M  +4    -0    app/views/digikamview.h
M  +9    -0    app/views/leftsidebarwidgets.cpp
M  +8    -0    app/views/leftsidebarwidgets.h
M  +6    -0    libs/album/albummanager.cpp
M  +6    -0    libs/database/coredb/coredb.cpp
M  +1    -0    libs/database/coredb/coredb.h
M  +28   -1    libs/database/haar/haariface.cpp
M  +14   -2    libs/database/haar/haariface.h
M  +25   -0    libs/database/item/imageextendedproperties.cpp
M  +7    -0    libs/database/item/imageextendedproperties.h
M  +5    -0    libs/database/item/imageinfo.cpp
M  +2    -0    libs/database/item/imageinfo.h
M  +13   -0    libs/database/item/imagelister.cpp
M  +6    -0    libs/models/imagefiltermodel.cpp
M  +1    -0    libs/models/imagefiltermodel.h
M  +26   -1    libs/models/imagesortsettings.cpp
M  +4    -1    libs/models/imagesortsettings.h
M  +2    -0    libs/settings/applicationsettings.cpp
M  +3    -0    libs/settings/applicationsettings.h
M  +9    -0    libs/settings/applicationsettings_miscs.cpp
M  +4    -0    libs/settings/applicationsettings_p.cpp
M  +2    -0    libs/settings/applicationsettings_p.h
M  +11   -0    utilities/fuzzysearch/findduplicatesview.cpp
M  +73   -0    utilities/fuzzysearch/fuzzysearchview.cpp
M  +3    -0    utilities/fuzzysearch/fuzzysearchview.h

https://commits.kde.org/digikam/e12dd0980f4371f1a7511a817fc7e17d7dccd2e6
Comment 12 Mario Frank 2016-12-23 21:30:03 UTC
(In reply to caulier.gilles from comment #10)
> I discovered a serious problem to Slideshow and Presentation with some video
> file to handle. So postponed is the best way.
> 
> The doc is not a problem.
> 
> For i18n, if patch over GUI is applied quickly and not touched later, it's
> acceptable. Patching code not relevant of i18n is not a problem of course.
> 
> Gilles

Hey,
after making some more tests, I patched the master branch. There is only one change which has to be localised, i.e. the sort role "By Similarity".
I will make more tests in the next days and see if I can generate automated tests for the test suite.
Comment 13 caulier.gilles 2016-12-23 23:08:35 UTC
Great, it will be perfect.

Gilles
Comment 14 Barbara Scheffner 2016-12-31 11:06:14 UTC
Hey Mario,
I'm just checking if and how I can update the handbook regarding this patch. I found that there is a column "Avg. similarity" in the left sidebar if you are in the "Duplicates" tab. The average is nice for an overview of the results, but I imagine that it could be useful to have individual similarity values in the Image Area. The way to do that would probably be to add "Show similarity" in the Views/Icons settings. An alternative would be to show the values in Table mode of the Image Area, where View/Sort Images/By Similarity doesn't work right now because the sorting criterion there is determined by the title bar of the table.
The same applies to the Sketch tab, since there we don't even have the averages.

Cheers
Wolfgang
Comment 15 Mario Frank 2017-01-01 22:39:35 UTC
(In reply to Wolfgang Scheffner from comment #14)
> Hey Mario,
> I'm just checking out if and how I can update the handbook regarding this
> patch. I found that there is a column "Avg. similarity" in the left sidebar
> if you are in the "Duplicates" tab. Average is nice for an overview of the
> results but I imagine that it could be useful to have individual similarity
> values in the Image Area. The way would be probably to add "Show similarity"
> in the Views/Icons settings. An alternative would be to show the values in
> Table mode of the Image Area where View/Sort Images/By Similarity doesn't
> work right now because the sorting criterion here is determined by the title
> bar of the table.
> To the Sketch tab applies the same since here we don't even have the
> averages.
> 
> Cheers
> Wolfgang

Hey Wolfgang,

I will sleep on that. In fact, the similarity could be relevant information in all related views (image/duplicates/sketch search). I will think about the best way to do this. Then I will open a new bug report.

Cheers,
Mario
Comment 16 Mario Frank 2017-01-02 19:01:08 UTC
Hey Wolfgang,

I think the best spot to give the similarity is in the table view. This way, the similarity can be shown in fuzzy, duplicates and also sketch search. I have a working implementation, where the similarity can be shown and thus sorting by similarity is possible. I will test and harden the implementation tomorrow.
Comment 17 Mario Frank 2017-01-04 23:17:27 UTC
Hey Guys and Ladies,

I implemented the possibility to show the similarity of the pictures in table view when the fuzzy search sidebar is active. But I had to refactor my similarity sort patch. It was not optimal, and I would have had to communicate the reference image id to which the specific image is similar everywhere via signal/slot communication or application settings. I already had that and learned that switching between the tabs in the fuzzy search sidebar leads to unstable similarity results.

Thus I took another approach. The current similarity value is stored in the ImageInfo. Getting the correct similarity into the image info was not really complex. The image search, sketch search and external image drop search are all HAAR searches. Thus, the albums are rebuilt automatically when I switch to the specific tab, and the similarities can be extracted from the search results.
This way, I only have to store the similarities for duplicates search since one image can be present in different duplicates albums with different similarities.

The new solution is much more robust. Also, sorting by similarity is now also possible in table view and also in the sketch search which was not possible before.
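
The approach described in this comment, keeping the current similarity as a transient field on the image info and filling it from each fresh search result, can be sketched as follows. `ImageInfoSketch` and `applySearchResults` are hypothetical names for this illustration and are not the digiKam classes; the only assumption taken from the thread is that the value lives in memory and falls back to 0.0 when there is no reference image.

```cpp
#include <map>
#include <vector>

typedef long long ImageId;

// Hypothetical, stripped-down image info: the similarity is a transient
// member and is never written to the database.
struct ImageInfoSketch
{
    explicit ImageInfoSketch(ImageId imageId)
        : id(imageId),
          currentSimilarity(0.0)
    {
    }

    ImageId id;
    double  currentSimilarity;
};

// Forward the similarities of the latest search into the image infos.
// Images that are not part of the result are reset to 0.0.
void applySearchResults(std::vector<ImageInfoSketch>& infos,
                        const std::map<ImageId, double>& results)
{
    for (ImageInfoSketch& info : infos)
    {
        std::map<ImageId, double>::const_iterator it = results.find(info.id);

        info.currentSimilarity = (it != results.end()) ? it->second : 0.0;
    }
}
```

With this shape, a table column or a sort role only has to read `currentSimilarity`; no reference image id needs to be passed around, which is the robustness gain the comment describes.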

But currently, when the fuzzy search sidebar is not active, the similarity column in the table view is still shown. In that case the similarity of the images is 0.0, since no reference image is present.

Also, I polished the code a lot and polished the patch for bug https://bugs.kde.org/show_bug.cgi?id=374191 .

So, concerning the handbook, there would be the following changes:
In Table view, the context menu for activating columns contains the option Similarity in the item section.
The find duplicates option in the context menu in faces sidebar uses the selected face tags as ordinary tags in duplicates search.

I will upload the patch for review.

Cheers,
Mario
Comment 18 Mario Frank 2017-01-04 23:20:01 UTC
Created attachment 103203 [details]
Patch for triggering duplicates search for multiple tags with context menu and introducing the similarity value as column in table view.

Already made some tests. But more testers are better.
Comment 19 Barbara Scheffner 2017-01-06 04:59:08 UTC
(In reply to Mario Frank from comment #18)
> Created attachment 103203 [details]
> Patch for triggering duplicates search for multiple tags with context menu
> and introducing the similarity value as column in table view..
> 
> Already made some tests. But more testers are better.

I'd volunteer to test but I'm working with the AppImage version right now. I guess there is no easy way to apply the patch to that?
Comment 20 Mario Frank 2017-01-06 09:20:44 UTC
(In reply to Wolfgang Scheffner from comment #19)
> (In reply to Mario Frank from comment #18)
> > Created attachment 103203 [details]
> > Patch for triggering duplicates search for multiple tags with context menu
> > and introducing the similarity value as column in table view..
> > 
> > Already made some tests. But more testers are better.
> 
> I'd volunteer to test but I'm working with the AppImage version right now. I
> guess there is no easy way to apply the patch to that?

Hey Wolfgang,

No, there is no easy way to do that.
I looked at the AppImage build information. I would have to create an AppImage and for this, I would have to create a CentOS VM. This would take some time.

Can you compile digikam on your system or do you just use binary versions?
Comment 21 caulier.gilles 2017-01-06 09:24:29 UTC
Mario,

All VM are ready at home. I can do it this week end including patch, not before.

Note : remember that 5.4.0 will be release this Sunday evening.

Gilles
Comment 22 Mario Frank 2017-01-06 09:41:18 UTC
(In reply to caulier.gilles from comment #21)
> Mario,
> 
> All VM are ready at home. I can do it this week end including patch, not
> before.
> 
> Note : remember that 5.4.0 will be release this Sunday evening.
> 
> Gilles

Hey Gilles,
I tested the patch again yesterday evening. The database handling works as expected and there were no errors. The patch reduces DB usage, since the similarity is only stored for duplicates. Fuzzy search, drop search and sketch search do not use similarities from the DB any more. Since these HAAR searches are done on thumbnail load anyway, which also includes searching for the similar pictures, I just forward the similarities. The current similarity is stored in the ImageInfo but not persisted in the database. Thus, there are no DB changes with this approach. I no longer store the current fuzzy reference image in the application settings, since I do not need it any more. Also, I use less SIGNAL/SLOT communication, which reduces the probability of bugs a lot. I do not have to react to context switches anymore.

The only things that cannot be solved easily:
1) The column Similarity is shown for the table view everywhere if it was selected. But only when there is a similarity, the column entry is non-zero.
So, in every table view that is not in the context of the fuzzy search sidebar,
the images have a similarity of 0.0. I do not see a solution to hide that column dynamically. But since many properties (e.g. geo location) are not set for every picture, I would not see it as a glitch.
2) In sketch search, the similarity values are in fact the similarity scores normalised to a positive value between 0 and 100. It is not a percentage.
Nevertheless, this is not a malfunction but intended for now.

Since the patch also fixes bugs like the context menu problems https://bugs.kde.org/show_bug.cgi?id=374191 , reduces DB usage overhead and polishes my former SIGNAL/SLOT communication, I would like to merge it as soon as possible.

What do you think?
Comment 23 Barbara Scheffner 2017-01-06 10:24:36 UTC
(In reply to Mario Frank from comment #20)
> (In reply to Wolfgang Scheffner from comment #19)
> > (In reply to Mario Frank from comment #18)
> > 
> > I'd volunteer to test but I'm working with the AppImage version right now. I
> > guess there is no easy way to apply the patch to that?
> 
> Hey Wolfgang,
> 
> No, there is no easy way to do that.
> I looked at the AppImage build information. I would have to create an
> AppImage and for this, I would have to create a CentOS VM. This would take
> some time.
> 
> Can you compile digikam on your system or do you just use binary versions?

I can compile digiKam on my system but I don't know how to include your patch. If you can give me instructions for that I can try this afternoon/evening.
Comment 24 Simon 2017-01-06 10:30:28 UTC
I am currently disentangling the changes regarding
https://bugs.kde.org/show_bug.cgi?id=374191 and this bug, because there
is something I would like to suggest on the patch for the former. I will
post the separated patches as soon as I have them - I hope that is ok.
Comment 25 Mario Frank 2017-01-06 10:37:26 UTC
(In reply to Simon from comment #24)
> I am currently disentangling the changes regarding
> https://bugs.kde.org/show_bug.cgi?id=374191 and this bug, because there
> is something I would like to suggest on the patch for the former. I will
> post the separated patches as soon as I have them - I hope that is ok.

Simon, better take the current patch in this file for disentangling. In this patch, I have the overloaded functions which are not present in the patch in https://bugs.kde.org/show_bug.cgi?id=374191 .
Comment 26 Mario Frank 2017-01-06 10:43:33 UTC
(In reply to Wolfgang Scheffner from comment #23)
> (In reply to Mario Frank from comment #20)
> > (In reply to Wolfgang Scheffner from comment #19)
> > > (In reply to Mario Frank from comment #18)
> > > 
> > > I'd volunteer to test but I'm working with the AppImage version right now. I
> > > guess there is no easy way to apply the patch to that?
> > 
> > Hey Wolfgang,
> > 
> > No, there is no easy way to do that.
> > I looked at the AppImage build information. I would have to create an
> > AppImage and for this, I would have to create a CentOS VM. This would take
> > some time.
> > 
> > Can you compile digikam on your system or do you just use binary versions?
> 
> I can compile digiKam on my system but I don't know how to include your
> patch. If you can give me instructions for that I can try this
> afternoon/evening.

Hi Wolfgang,

to apply the patch, you have to go into the core directory. There you first fetch the current state of the master branch with git pull.
Then you test the patch with
git apply --check PatchFilePath
If there are no warnings or errors, you type
git apply --apply PatchFilePath

Then you can go in the build folder and compile the system with make install.

I use aliases to start digikam, which I added to my bashrc:
alias digikam-dev="KDESYCOCA=/home/eladrion/local/opt/digikam/var/tmp/kde-eladrion/ksycoca5 XDG_DATA_DIRS=/home/eladrion/local/opt/digikam/share:/usr/share:/usr/share:/usr/local/share QT_PLUGIN_PATH=/home/eladrion/local/opt/digikam/lib64/plugins:/home/eladrion/local/opt/digikam/lib/plugins: /home/eladrion/local/opt/digikam/bin/digikam"
alias digikam-dev-valgrind="KDESYCOCA=/home/eladrion/local/opt/digikam/var/tmp/kde-eladrion/ksycoca5 XDG_DATA_DIRS=/home/eladrion/local/opt/digikam/share:/usr/share:/usr/share:/usr/local/share QT_PLUGIN_PATH=/home/eladrion/local/opt/digikam/lib64/plugins:/home/eladrion/local/opt/digikam/lib/plugins: valgrind /home/eladrion/local/opt/digikam/bin/digikam"

You will have to adapt the paths to your environment. But these aliases make it easy to test. The second alias runs digikam under valgrind.

HTH
Comment 27 Simon 2017-01-06 10:46:42 UTC
On 06/01/17 11:37, Mario Frank wrote:
> https://bugs.kde.org/show_bug.cgi?id=320666
>
> --- Comment #25 from Mario Frank <mario.frank@uni-potsdam.de> ---
> (In reply to Simon from comment #24)
>> I am currently disentangling the changes regarding
>> https://bugs.kde.org/show_bug.cgi?id=374191 and this bug, because there
>> is something I would like to suggest on the patch for the former. I will
>> post the separated patches as soon as I have them - I hope that is ok.
> Simon, better take the current patch in this file for disentangling. In this
> patch, I have the overloaded functions which are not present in the patch in
> https://bugs.kde.org/show_bug.cgi?id=374191 .
That is what I am doing.
Comment 28 Simon 2017-01-06 13:10:31 UTC
Created attachment 103229 [details]
Patch for introducing the similarity value as column in table view (part of 103203)

I attached Mario's patch without the sections that relate to https://bugs.kde.org/show_bug.cgi?id=374191. I checked that it still compiles, but I have not tested the changes.
Comment 29 Simon 2017-01-06 14:20:34 UTC
I played around with the patch. However I did not test/use this feature
previously, so I don't know whether the following was introduced with
this patch or already there before:

In "Duplicates" the similarities in table view seem mostly reasonable. But
when there are identical pictures in a group of similar images, there
sometimes is one picture with 100, the other with 101, sometimes just
one picture with 101 and sometimes all values are <100. I don't see any
pattern when this happens. What does the similarity mean anyway in a
group? Average similarity is intuitively understandable, individual
similarity is unclear to me.

In "Images" I get a binary: It is either 100 or 1.
Comment 30 Mario Frank 2017-01-06 14:48:16 UTC
(In reply to Simon from comment #29)
> I played around with the patch. However I did not test/use this feature
> previously, so I don't know whether the following was introduced with
> this patch or already there before:
> 
> In "Duplicates" the similarity in table view seem reasonable mostly. But
> when there are identical pictures in a group of similar images, there
> sometimes is one picture with 100, the other with 101, sometimes just
> one picture with 101 and sometimes all values are <100. I don't see any
> pattern when this happens. What does the similarity mean anyway in a
> group? Average similarity is intuitively understandable, individual
> similarity is unclear to me.
> 
> In "Images" I get a binary: It is either 100 or 1.

This is odd. First of all:
In "Duplicates", the original picture should be on top when it is sorted by similarity. The original image is the name of the duplicates album. Thus, it must have the highest value. That's why the similarity of one picture is always 101. Since a similarity range is given, it is possible, that one picture has similarity 101 and all others below 100. That's the expected behaviour. What cannot happen is that some duplicates album has only one entry. The HaarIface returns an empty list of duplicates if only the original image is present. Also, it should not be possible that some duplicates album only has images with similarity less than 100 since the original image is always contained in the album.

While the average similarity in the duplicates album list represents exactly this - the average of all similar pictures (without the original), the similarity in the image view/table view is the similarity of the specific image to the original one.
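
The distinction between the two values can be made concrete with a small sketch. It assumes a duplicates album is a list of (image id, similarity to the original) entries in which the original itself is present; `AlbumEntry` and `averageSimilarity` are hypothetical names, and the code is only an illustration of the averaging rule stated above, not the HaarIface implementation.

```cpp
#include <cstddef>
#include <vector>

typedef long long ImageId;

// Hypothetical entry of a duplicates album: an image and its similarity
// to the album's original image.
struct AlbumEntry
{
    ImageId id;
    double  similarity;
};

// Average similarity of all duplicates, leaving out the original image
// itself. Returns 0.0 if the album holds no duplicates.
double averageSimilarity(const std::vector<AlbumEntry>& album, ImageId originalId)
{
    double      sum   = 0.0;
    std::size_t count = 0;

    for (const AlbumEntry& entry : album)
    {
        if (entry.id == originalId)
        {
            continue; // the original does not count towards the average
        }

        sum += entry.similarity;
        ++count;
    }

    return (count == 0) ? 0.0 : sum / static_cast<double>(count);
}
```

The per-image value shown in the image or table view is simply `entry.similarity`, while the sidebar column shows the result of `averageSimilarity` for the whole album.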

Can you attach some screenshots showing the binary "Images" and the case where all images have a similarity below 100?
Comment 31 Simon 2017-01-06 15:09:41 UTC
What I described in Duplicates was wrong. The main image was a grouped image, so it was hidden (http://i.imgur.com/JKOkJx7.png), but it was actually there (http://i.imgur.com/2d3OFpM.png). If this is an issue at all, it is definitely a separate one.
I think using a value of 101 is confusing. If one does not know what it signifies, it looks like a bug. I would just assign a similarity of 100. That way the main image is still at least in the top group, and any image sorted above it also has a similarity of 100, so it is identical anyway.

The "binary" problem in the image view looks like this:
http://i.imgur.com/r1dhZLq.png
It really is binary: if I use a range below 50%, all images have similarity 0, and for a range above 50% they all have similarity 100.
Comment 32 Mario Frank 2017-01-06 15:20:37 UTC
(In reply to Simon from comment #31)
> What I described in Duplicates was wrong. The main image was a grouped
> image, so it was hidden (http://i.imgur.com/JKOkJx7.png), but it was
> actually there (http://i.imgur.com/2d3OFpM.png). If this is an issue at all,
> it is definitely a separate one.
> I think using a value of 101 is confusing. If one does not know what it
> signifies it looks like a bug. I would just assign a similarity of 100. That
> way the main image is still at least in the top group, and when the one
> potentially above has a similarity of 100, so is anyway identical.
> 
> The "binary" problem in the image view looks like this:
> http://i.imgur.com/r1dhZLq.png
> It really is binary, if I use a range below 50% all images have similarity 0
> and a similarity 100 for range above 50%.

Hey, the grouped image is something that would be some other issue, I think.
Could be something in the imagelister.
I changed the similarity of the original image to 100.00. I agree that this could be confusing.

The "binary" problem is really strange. I cannot reproduce it. 
Can you run digikam from console? When you search for similar images, the similarities are logged
with the digikam.database: prefix.

e.g.:
digikam.database: 181148 "42.4814%"

Are the values logged also binary?
Comment 33 Simon 2017-01-06 15:30:12 UTC
The percentages logged to stdout are meaningful (i.e. not binary).
Comment 34 Mario Frank 2017-01-06 15:45:44 UTC
(In reply to Simon from comment #33)
> The percentages logged to stdout are meaningful (i.e. not binary).

Okay, then it is no HAAR search problem and must be a problem in the listFromHaarSearch function. 

Are the images with similarity 100 identical to the image searched for?

Do you use a SQLite DB or MySQL?

Can you test the following:
in libs/database/item/imagelister.cpp, function listFromHaarSearch,
after 

record.currentSimilarity = (*it).toDouble();

insert:

qCDebug(DIGIKAM_DATABASE_LOG) << "Similarity of image " << record.imageID << " in imageLister: " << record.currentSimilarity;

When you compile this and search again, is the current similarity 0.0?
Comment 35 Simon 2017-01-06 16:07:01 UTC
I use internal mysql and the images are not (exclusively) identical ones
(range 30-100% gives all kinds of "similar" pictures as is to be expected).

I added your snippet and it produces 0/1 outputs like this:
    digikam.database: Similarity of image  77839 in imageLister:  0
    digikam.database: Similarity of image  77845 in imageLister:  1
Comment 36 Mario Frank 2017-01-06 16:24:54 UTC
(In reply to Simon from comment #35)
> I use internal mysql and the images are not (exclusively) identical ones
> (range 30-100% gives all kinds of "similar" pictures as is to be expected).
> 
> I added your snippet and it produces 0/1 outputs like this:
>     digikam.database: Similarity of image  77839 in imageLister:  0
>     digikam.database: Similarity of image  77845 in imageLister:  1

Okay, I have found the culprit. On SQLite, my solution works: I return the similarity as a constant in the SELECT statement. On MySQL, however, the query I created is not reliable. It seems that the double was cast to an integer in the database, so the results are wrong.

Can you replace the imagelister.cpp with my new one? The new solution still works under SQLite.
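
For readers following along, here is a minimal sketch of the symptom and of one possible way around it. It is not the code from the attached imagelister.cpp. The Record struct and both function names are hypothetical, and the sketch only assumes that similarities are fractions in [0, 1]. The first function reproduces the 0/1 values from comment 35, which are what you would get if such a fraction were rounded to the nearest integer. The second keeps the similarity in memory and attaches it to the listed image ids after the query, so the value never passes through a database type conversion.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record type with only the two fields mentioned in this thread.
struct Record {
    std::int64_t imageID;
    double       currentSimilarity;
};

// Illustrates the symptom: a similarity fraction rounded to an integer
// collapses to 0 or 1 (shown in the UI as 0% or 100%).
double roundedLikeIntegerColumn(double similarity) {
    assert(similarity >= 0.0 && similarity <= 1.0);
    return static_cast<double>(std::llround(similarity));
}

// Assumed alternative: look the similarity up in an in-memory map keyed by
// image id instead of selecting it as a constant in the SQL statement.
std::vector<Record> attachSimilarities(
        const std::vector<std::int64_t>& listedIds,
        const std::map<std::int64_t, double>& similarityById) {
    std::vector<Record> records;
    records.reserve(listedIds.size());
    for (std::int64_t id : listedIds) {
        auto it = similarityById.find(id);
        if (it == similarityById.end()) {
            continue; // no score known for this id, so it is not listed
        }
        records.push_back(Record{id, it->second});
    }
    return records;
}
```

With this approach the double reaches the record unchanged, whichever database backend is in use.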
Comment 37 Mario Frank 2017-01-06 16:26:01 UTC
Created attachment 103234 [details]
imagelister.cpp

Here is the new file
Comment 38 Simon 2017-01-06 16:37:36 UTC
Works perfectly fine now!
Comment 39 Mario Frank 2017-01-06 16:40:45 UTC
(In reply to Simon from comment #38)
> Works perfectly fine now!

Great,

that was good teamwork. I will upload our current patch, which includes your changes, just for archiving. If nobody objects, I will merge the patch into the master branch today as our joint work.

Can https://bugs.kde.org/show_bug.cgi?id=374191 be closed? Or is there something left?
Comment 40 Mario Frank 2017-01-06 16:42:36 UTC
Created attachment 103235 [details]
Patch working both for SQLite and MySQL
Comment 41 Simon 2017-01-06 16:48:30 UTC
Indeed :D
Yes, that bug can be closed with committing the patch there.

One minor thing: git apply complains that your new patch introduces
trailing whitespace. I can't seem to find it after applying, though, so
maybe git removed it automatically.
Comment 42 Mario Frank 2017-01-06 16:50:46 UTC
(In reply to Simon from comment #41)
> Indeed :D
> Yes, that bug can be closed with committing the patch there.
> 
> One minor thing: git apply complains that your new patch introduces
> trailing whitespace. I can't seem to find it after applying, though, so
> maybe git removed it automatically.

Okay. Yes, git cleans that up. But I will clean this up myself before committing.
But first I'll wait for some of the other devs to acknowledge the commit.
Comment 43 Simon 2017-01-06 17:00:55 UTC
In case you are bored while waiting :P
Would you care to look at https://bugs.kde.org/show_bug.cgi?id=372159
and comment on https://bugs.kde.org/show_bug.cgi?id=374470 ?
Comment 44 Mario Frank 2017-01-06 17:12:41 UTC
(In reply to Simon from comment #43)
> In case you are bored while waiting :P
> Would you care to look at https://bugs.kde.org/show_bug.cgi?id=372159
> and comment on https://bugs.kde.org/show_bug.cgi?id=374470 ?

I'm currently on my way home but I will try to take a look today.
Comment 45 caulier.gilles 2017-01-06 17:40:34 UTC
No objection from me to patching git/master before the 5.4.0 release planned for Sunday evening. Only one string was added, and it is already in the catalog.

Note: when you use a pointer, make it const if possible (read-only). It's safer.

Gilles Caulier
Comment 46 Mario Frank 2017-01-06 18:32:35 UTC
(In reply to caulier.gilles from comment #45)
> No objection from me to patching git/master before the 5.4.0 release planned
> for Sunday evening. Only one string was added, and it is already in the catalog.
> 
> Note: when you use a pointer, make it const if possible (read-only). It's
> safer.
> 
> Gilles Caulier

Okay, I made the pointers in slotSetSelectedAlbums const. I think those are the ones you meant.
One question: in the duplicates search, the duplicates albums contain the original image. I am not sure this is right, since the original is not semantically a duplicate. Should I exclude the original images from the duplicates albums?
Comment 47 Mario Frank 2017-01-06 18:35:33 UTC
(In reply to Mario Frank from comment #46)
> (In reply to caulier.gilles from comment #45)
> > No objection from me to patching git/master before the 5.4.0 release planned
> > for Sunday evening. Only one string was added, and it is already in the catalog.
> > 
> > Note: when you use a pointer, make it const if possible (read-only). It's
> > safer.
> > 
> > Gilles Caulier
> 
> Okay, I made the pointers in slotSetSelectedAlbums const. I think those are
> the ones you meant.
> One question: in the duplicates search, the duplicates albums contain the
> original image. I am not sure this is right, since the original is not
> semantically a duplicate. Should I exclude the original images from the
> duplicates albums?

Ignore the latter for now. We can discuss this after the release.
Comment 48 Mario Frank 2017-01-06 18:38:28 UTC
Git commit 429fa5fd8e7f53b74c82eb19dffb2e6cf4b4325c by Mario Frank.
Committed on 06/01/2017 at 18:36.
Pushed by mfrank into branch 'master'.

This patch is joint work with Simon. The patch introduces the similarity of images to a specific image
as column in table view. Also, it fixes the sorting by similarity in sketch search.
Moreover, the dysfunctional context menu item "Find duplicates" in people sidebar now leads to the selection
of the people tags as regular tabs in duplicates search. Finally, selecting multiple regular/people tags for
duplicates search is possible. Not to mention some refactoring to make function names more fitting.
Related: bug 374191, bug 302923
FIXED-IN: 5.4.0

M  +1    -1    NEWS
M  +21   -30   app/views/digikamview.cpp
M  +5    -4    app/views/digikamview.h
M  +17   -15   app/views/leftsidebarwidgets.cpp
M  +6    -8    app/views/leftsidebarwidgets.h
M  +34   -2    app/views/tableview/tableview_column_item.cpp
M  +2    -1    app/views/tableview/tableview_column_item.h
M  +1    -1    libs/album/albumselectiontreeview.cpp
M  +1    -1    libs/album/albumselectiontreeview.h
M  +5    -0    libs/album/albumtreeview.cpp
M  +2    -1    libs/album/albumtreeview.h
M  +10   -1    libs/database/dbjobs/dbjob.cpp
M  +45   -30   libs/database/haar/haariface.cpp
M  +5    -5    libs/database/haar/haariface.h
M  +14   -0    libs/database/item/imageinfo.cpp
M  +5    -0    libs/database/item/imageinfo.h
M  +2    -0    libs/database/item/imageinfodata.h
M  +106  -6    libs/database/item/imagelister.cpp
M  +8    -1    libs/database/item/imagelister.h
M  +10   -6    libs/database/item/imagelisterrecord.h
M  +0    -6    libs/models/imagefiltermodel.cpp
M  +0    -1    libs/models/imagefiltermodel.h
M  +5    -10   libs/models/imagesortsettings.cpp
M  +0    -2    libs/models/imagesortsettings.h
M  +0    -2    libs/settings/applicationsettings.cpp
M  +0    -3    libs/settings/applicationsettings.h
M  +0    -10   libs/settings/applicationsettings_miscs.cpp
M  +0    -4    libs/settings/applicationsettings_p.cpp
M  +0    -2    libs/settings/applicationsettings_p.h
M  +3    -1    libs/tags/tagfolderview.cpp
M  +1    -1    libs/tags/tagfolderview.h
M  +3    -3    libs/tags/tagsmanager/tagsmanager.cpp
M  +16   -16   utilities/fuzzysearch/findduplicatesview.cpp
M  +5    -2    utilities/fuzzysearch/findduplicatesview.h
M  +22   -20   utilities/fuzzysearch/fuzzysearchview.cpp
M  +5    -4    utilities/fuzzysearch/fuzzysearchview.h

https://commits.kde.org/digikam/429fa5fd8e7f53b74c82eb19dffb2e6cf4b4325c
Comment 49 Niels 2017-01-25 19:26:56 UTC
Great stuff guys! Thank you!

I can't DnD from a browser, but using a staging album as suggested is doable.
Comment 50 Mario Frank 2017-01-25 19:43:32 UTC
(In reply to Niels from comment #49)
> Great stuff guys! Thank you!
> 
> I can't DnD from a browser, but using a staging album as suggested is doable.

Hey Niels,

great to hear you like it.
I restricted the DnD on purpose.
Allowing DnD for remote files may be a security risk. Thus, only local files are usable.

Best,
Mario
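
To illustrate the restriction described above, here is a minimal sketch that accepts only dropped URIs with the file scheme. It is not taken from the committed patch, and the helper names isLocalFileUri and keepLocalUris are hypothetical. In Qt code the equivalent check would normally be QUrl::isLocalFile(), which likewise treats any URL with the "file" scheme as local. Plain standard C++ is used here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the URI starts with "file://".
// URI schemes are case-insensitive, so "FILE://" is accepted as well.
bool isLocalFileUri(const std::string& uri) {
    const std::string prefix = "file://";
    if (uri.size() < prefix.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(uri[i]);
        if (std::tolower(c) != prefix[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical drop filter: keeps local files, discards remote URLs.
std::vector<std::string> keepLocalUris(const std::vector<std::string>& dropped) {
    std::vector<std::string> kept;
    for (const std::string& uri : dropped) {
        if (isLocalFileUri(uri)) {
            kept.push_back(uri);
        }
    }
    assert(kept.size() <= dropped.size());
    return kept;
}
```

A drop from a browser usually carries an http or https URL, which such a filter rejects. A file dragged from Dolphin arrives as a file:// URI and passes.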