When you have many pictures, including variants of one picture with different quality (e.g. due to resizing, conversion or collage creation), the lower-quality variants may only be found with a low similarity threshold (e.g. 45 %). But the result set will then contain all pictures with a similarity between 45 % and 100 %, which can make the search for low-quality variants frustrating. Being able to specify a maximum similarity may solve the problem.

Reproducible: Always

Steps to Reproduce:
1. Have many series pictures you want to keep and some lower-quality variants you want to get rid of.
2. Start a duplicate search with, let's say, 40 %.

Actual Results:
You will get all pictures with a similarity above 40 %.

Expected Results:
This is how it is designed to work. But an option to specify a maximum similarity would be more convenient.

I implemented and tested that, and I can provide a patch file against the master branch. Here is the local commit message describing the implementation:
"Extended the findduplicatesview and fuzzysearchview with an additional QSpinBox which denotes the maximum similarity. The new QSpinBox has a minimum value that is the current value of the minimal similarity threshold. When the minimum threshold is altered, the range of the new QSpinBox is updated. If the minimum threshold is increased beyond the current value of the new QSpinBox, the value of the new QSpinBox is increased automatically. In the fuzzysearchview, altering the maximum similarity also triggers the rebuild of the similar images album. The extension can be highly valuable if you knowingly want to ignore almost identical images but want to find images that have a similarity of, let's say, 50-60 % due to resizing, cropping or something similar, without bloating your image pane."
Created attachment 101176: The patch for introducing a similarity interval
Mario,

The patch is very interesting and well implemented. I plan to integrate your code after 5.3.0.

Q: currently, the icon view of fuzzy search results is not sorted by average similarity; all found items are mixed. It would be a good idea to sort the items in this view, as this would improve usability. What is your viewpoint?

Best
Gilles Caulier
Hey Gilles,

that is good news. I agree with you that ordering the list of results in the left pane (as I understand it, the one where the reference image and the count of similar images are shown) would improve usability. But introducing an order there means changing the signatures of the functions in haariface. Since QMap is automatically sorted by its keys, we could use this to order the result set. One quite easy way would be to wrap the QMap<qlonglong,QList<qlonglong>> as the value of an average-similarity map. This would surely increase memory consumption during the search, but the automatic ordering by similarity would avoid a significant increase in runtime.

After a quick look at the source code with grep, I found no possible conflicts with other files concerning the definition of the result set, so changing the return value types in haariface should most likely be safe. Should I propose another patch for this issue?
Yes, please submit another patch in a separate report. Thanks in advance

Gilles
Git commit afe577f0b297a343ab412ce95c1f75303edfb18b by Gilles Caulier.
Committed on 10/11/2016 at 04:48.
Pushed by cgilles into branch 'master'.

Apply big patch #101176 from Mario Frank
This one extended the findduplicatesview and fuzzysearchview with an additional QSpinBox which denotes the maximum similarity. The new QSpinBox has a minimum value that is the current value of the minimal similarity threshold. When the minimum threshold is altered, the range of the new QSpinBox is updated. If the minimum threshold is increased beyond the current value of the new QSpinBox, the value of the new QSpinBox is increased automatically. In the fuzzysearchview, altering the maximum similarity also triggers the rebuild of the similar images album. The extension can be highly valuable if you knowingly want to ignore almost identical images but want to find images that have a similarity of, let's say, 50-60 % due to resizing, cropping or something similar, without bloating your image pane.
FIXED-IN: 5.4.0
CCMAIL: frank@uni-potsdam.de

M  +2    -0    app/utils/searchmodificationhelper.cpp
M  +1    -0    app/utils/searchmodificationhelper.h
M  +4    -3    libs/database/dbjobs/dbjob.cpp
M  +16   -5    libs/database/dbjobs/dbjobinfo.cpp
M  +7    -3    libs/database/dbjobs/dbjobinfo.h
M  +27   -16   libs/database/haar/haariface.cpp
M  +9    -8    libs/database/haar/haariface.h
M  +9    -2    libs/database/item/imagelister.cpp
M  +53   -25   utilities/fuzzysearch/findduplicatesview.cpp
M  +1    -0    utilities/fuzzysearch/findduplicatesview.h
M  +58   -11   utilities/fuzzysearch/fuzzysearchview.cpp
M  +2    -1    utilities/fuzzysearch/fuzzysearchview.h
M  +16   -10   utilities/maintenance/duplicatesfinder.cpp
M  +2    -2    utilities/maintenance/duplicatesfinder.h

http://commits.kde.org/digikam/afe577f0b297a343ab412ce95c1f75303edfb18b
Mario,

Your patch is now applied to the current implementation and will be available in the next 5.4.0 release. The next step for me is to review your new patch from bug #372217. Note that your next patch must certainly close bug #302923 (please confirm).

In parallel, can you check what can be done to further improve the duplicate search tool with:
- bug #261417: the counter of the search albums is not updated.
- bug #353331: this one can certainly be closed, as we can limit the search to a specific physical or virtual album. Please just review to confirm.
- bug #207188: as I remember, the algorithm that computes image fingerprints takes the color content into account (otherwise it would make no sense), so I'm not sure this report is valid.
- bug #274360: I cannot figure out why some kinds of image types would be ignored. All image formats supported by digiKam are processed during fingerprint computation and searches.

Again, thanks for your contributions. I appreciate the quality of your patches, which are a pleasure to review.
> Next step for me is to review your new patch from bug #372217. Note that your
> next patch must certainly close bug #302923 (please confirm).

To answer my own question: your patch from bug #372217 cannot solve bug #302923, because it is dedicated to sorting the search albums in the left sidebar, not the icon view in the center. I would appreciate a patch for the icon-view model/view that makes it possible to sort by similarity level.

Thanks in advance
Gilles Caulier
Hey Gilles,

Many thanks for your assessment of the quality of my patches. I will try to fix what I can. Some of the "bugs" do not seem hard to fix; others could be more complex.
By the way: the CCMAIL is incorrect. The correct one is mario.frank@uni-potsdam.de. If the dot is a problem, just use mafrank@uni-potsdam.de.
Before I update the documentation accordingly: shouldn't the label now be changed to "Similarity range", or at least "Thresholds"?
I agree, Wolfgang. "Similarity range" is a better description here. Moreover, I just realised that it is not possible to set a range in the maintenance dialog. I will open a new report for both parts and submit a patch.