Bug 375700

Summary: Digikam freezes for a long time (I/O wait) if similarity range is set too low
Product: [Applications] digikam Reporter: Jens <jens-bugs.kde.org>
Component: Searches-Similarity Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: crash CC: mario.frank
Priority: NOR    
Version: 5.5.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 5.5.0
Sentry Crash Report:

Description Jens 2017-01-29 13:02:44 UTC
I have roughly 160GB of photos on my SSD disk on an Intel Haswell i5-4570 with 8G RAM (so enough horsepower). However, when I select an image and search for similar images using similarity search, and then set the similarity starting at 20% (up to 100%), the application freezes for almost a minute.

While this might be technically correct behaviour, the user experience here is really bad. Please add a progress bar (if possible) and a "Cancel search" button. Maybe it is possible to perform this search in the background altogether (so you can return to the results later on)?
Also, a warning that searches with a similarity below 30% might take a long time would be helpful.

Thank you!
Comment 1 Jens 2017-01-29 13:07:24 UTC
This is even worse - if I kill Digikam because the search takes too long, it will (apparently) restart the search upon next start, giving the impression that the application does not even start any more.

Maybe a maximum search time (timeout) for all fuzzy search operations, say 20 seconds, would also make sense? At least it would give the user some feedback.
Comment 2 Jens 2017-01-29 13:44:32 UTC
Update: This is one example SQL query that freezes upon restart.
But if I remove this query from digikam4.db / Searches table, then another one freezes, so I suspect a deeper issue.

In any case, Digikam should not freeze or refuse to start up because of a saved search. IMHO. :)

digikam.database: Search query:
 "SELECT DISTINCT Images.id, Images.name, Images.album,        Albums.albumRoot,        ImageInformation.rating, Images.category,        ImageInformation.format, ImageInformation.creationDate,        Images.modificationDate, Images.fileSize,        ImageInformation.width, ImageInformation.height,        ImagePositions.latitudeNumber, ImagePositions.longitudeNumber  FROM Images        LEFT JOIN ImageInformation ON Images.id=ImageInformation.imageid        LEFT  JOIN ImageMetadata    ON Images.id=ImageMetadata.imageid        LEFT  JOIN VideoMetadata    ON Images.id=VideoMetadata.imageid        LEFT  JOIN ImagePositions   ON Images.id=ImagePositions.imageid        INNER JOIN Albums           ON Albums.id=Images.album WHERE Images.status=1 AND (  ( (  (Albums.relativePath LIKE ?) OR (Images.name LIKE ?) OR (Images.id IN    (SELECT imageid FROM ImageTags     WHERE tagid IN    (SELECT id FROM Tags WHERE name LIKE ?))) OR (Albums.caption LIKE ?) OR (Albums.collection LIKE ?) OR (Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=? AND comment LIKE ?)) OR (Images.id IN  (SELECT imageid FROM ImageComments   WHERE type=? AND comment LIKE ?))  ) )  );" 
 (QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(int, 1), QVariant(QString, "%teppich%"), QVariant(int, 3), QVariant(QString, "%teppich%"))
digikam.geoiface: "ROADMAP"
digikam.geoiface: "setting backend marble"
digikam.geoiface: "ROADMAP"
digikam.geoiface: ----
digikam.general: Cancel Main Thread
digikam.geoiface: ----


VACUUM resulted in

The database was compacted using the VACUUM statement.
Before compacting:
	Page count = 66474
	Database size = 68069376 bytes
After compacting:
	Page count = 970
	Database size = 993280 bytes
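
For reference, the figures above can be reproduced with SQLite's `page_count` and `page_size` pragmas; both reported sizes are consistent with 1024-byte pages (66474 × 1024 = 68069376 and 970 × 1024 = 993280). The following is only a minimal Python sketch, not part of digiKam: the `blobs` table and the helper names are hypothetical, and it works on a throwaway database. Anything like this should only be run against a copy of digikam4.db while digiKam is closed.

```python
import os
import sqlite3
import tempfile


def db_stats(conn):
    """Return (page_count, size_in_bytes) for an open SQLite connection."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count, page_count * page_size


def vacuum_report(path):
    """Run VACUUM on the database at `path`; return stats before and after."""
    conn = sqlite3.connect(path)
    try:
        before = db_stats(conn)
        conn.execute("VACUUM")  # must run outside a transaction
        after = db_stats(conn)
    finally:
        conn.close()
    return before, after


def make_bloated_db(path, rows=2000):
    """Create a demo database whose pages are mostly free (hypothetical data)."""
    conn = sqlite3.connect(path)
    # Without auto-vacuum, deleted rows leave free pages inside the file.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.executemany("INSERT INTO blobs (data) VALUES (?)", [(b"x" * 512,)] * rows)
    conn.commit()
    conn.execute("DELETE FROM blobs")
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_bloated_db(path)
        before, after = vacuum_report(path)
        print("Before compacting: pages = %d, size = %d bytes" % before)
        print("After compacting:  pages = %d, size = %d bytes" % after)
```

A large drop in page count, as in the report, indicates that most of the file consisted of free pages left behind by deleted rows.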
Comment 3 Mario Frank 2017-02-03 09:58:22 UTC
Hi Jens,

This is a known problem. If the similarity range is too large or the minimum similarity is too low, the search takes a long time. The application becomes unresponsive because the image signatures are currently stored in the core DB.
Thus, everything that needs to access the core DB is slowed down severely.

We will have to migrate everything concerning fuzzy search and duplicates search into a dedicated database. This would also make it easier to run searches in the background.

The search result you see is just a virtual album which is loaded every time you visit it. Thus, we currently cannot run it in the background.

Limiting the search by time is not a sound approach, I think. If we limit the time, we can only scan a limited number of images. As a result, images that would have been scanned later and have a higher similarity would not be shown, purely because of the time constraint.
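
The objection to a time limit can be illustrated with a small Python sketch. It is not digiKam code: `scan_with_budget`, the file names, and the similarity values are hypothetical, and a fixed number of comparisons stands in for a wall-clock timeout so that the result is deterministic.

```python
def scan_with_budget(similarities, threshold, budget=None):
    """Return (name, similarity) pairs at or above `threshold`, best first.

    `similarities` is a list of (name, similarity) pairs in scan order.
    `budget` caps how many images are compared, standing in for a timeout;
    None means the whole collection is scanned.
    """
    scanned = similarities if budget is None else similarities[:budget]
    matches = [(name, sim) for name, sim in scanned if sim >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical collection: the best match happens to be scanned last.
collection = [
    ("a.jpg", 0.45),
    ("b.jpg", 0.30),
    ("c.jpg", 0.62),
    ("d.jpg", 0.97),
]

full = scan_with_budget(collection, threshold=0.40)
limited = scan_with_budget(collection, threshold=0.40, budget=3)

print(full)     # includes d.jpg, the closest match
print(limited)  # d.jpg is missing because the scan stopped before reaching it
```

Which images are dropped depends only on scan order, not on how similar they are, which is why a cutoff of this kind gives misleading results.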

I would now reintroduce a lower similarity bound for fuzzy and similarity searches. This bound used to be 40%. Frankly, I would not use fuzzy search with similarities lower than 40% myself, since the probability that images have such a low similarity is quite high.
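
A lower bound of this kind amounts to clamping the requested minimum similarity before the search runs. A minimal sketch in Python, assuming similarities are given as integer percentages; `clamp_min_similarity` and its parameter names are hypothetical and do not reflect digiKam's actual C++ implementation:

```python
def clamp_min_similarity(requested, lower_bound=40, upper_bound=100):
    """Clamp a requested minimum similarity (percent) to [lower_bound, upper_bound]."""
    if not 0 <= lower_bound <= upper_bound <= 100:
        raise ValueError("bounds must satisfy 0 <= lower_bound <= upper_bound <= 100")
    return max(lower_bound, min(requested, upper_bound))


print(clamp_min_similarity(20))                  # 40
print(clamp_min_similarity(75))                  # 75
print(clamp_min_similarity(20, lower_bound=30))  # 30
```

With the default bound of 40, the 20% setting from the original report would be raised to 40% before any images are compared.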
Comment 4 Mario Frank 2017-02-03 13:45:24 UTC
Git commit 574a76623105b08dc4f1f62ccb8cc6bd822bae0d by Mario Frank.
Committed on 03/02/2017 at 13:44.
Pushed by mfrank into branch 'master'.

Introduced a configurable lower bound for the minimum similarity in fuzzy and duplicates search.
This reduces the probability of unresponsiveness due to too many images being scanned for similarity.
The bound can be configured in setup->misc and takes effect immediately in fuzzy search view and
duplicates view. Thus, no restart of digiKam is necessary.
FIXED-IN: 5.5.0

M  +3    -1    NEWS
M  +2    -0    libs/settings/applicationsettings.cpp
M  +3    -0    libs/settings/applicationsettings.h
M  +10   -0    libs/settings/applicationsettings_miscs.cpp
M  +3    -0    libs/settings/applicationsettings_p.cpp
M  +2    -0    libs/settings/applicationsettings_p.h
M  +21   -1    utilities/fuzzysearch/findduplicatesview.cpp
M  +1    -0    utilities/fuzzysearch/findduplicatesview.h
M  +22   -2    utilities/fuzzysearch/fuzzysearchview.cpp
M  +2    -0    utilities/fuzzysearch/fuzzysearchview.h
M  +11   -1    utilities/maintenance/maintenancedlg.cpp
M  +24   -1    utilities/setup/setupmisc.cpp

https://commits.kde.org/digikam/574a76623105b08dc4f1f62ccb8cc6bd822bae0d