I have roughly 160 GB of photos on an SSD, in a machine with an Intel Haswell i5-4570 and 8 GB of RAM (so enough horsepower). However, when I select an image, run a similarity search for it, and set the similarity range to 20% to 100%, the application freezes for almost a minute. While this might be technically correct behaviour, the user experience is really bad. Please add a progress bar (if possible) and a "Cancel search" button. Would it be possible to run this search entirely in the background, so that you can return to the results later? A warning that searches with a similarity below 30% might take a long time would also be helpful. Thank you!
It gets even worse: if I kill digiKam because the search takes too long, it (apparently) restarts the search on the next start, which gives the impression that the application does not start any more. Would a maximum search time (timeout) for all fuzzy search operations, say 20 seconds, make sense? At the very least it would give the user some feedback.
Update: This is one example SQL query that freezes on restart. If I remove this query from the Searches table in digikam4.db, another one freezes instead, so I suspect a deeper issue. In any case, digiKam should not freeze or refuse to start because of a saved search, IMHO. :)

digikam.database: Search query: "SELECT DISTINCT Images.id, Images.name, Images.album, Albums.albumRoot, ImageInformation.rating, Images.category, ImageInformation.format, ImageInformation.creationDate, Images.modificationDate, Images.fileSize, ImageInformation.width, ImageInformation.height, ImagePositions.latitudeNumber, ImagePositions.longitudeNumber FROM Images LEFT JOIN ImageInformation ON Images.id=ImageInformation.imageid LEFT JOIN ImageMetadata ON Images.id=ImageMetadata.imageid LEFT JOIN VideoMetadata ON Images.id=VideoMetadata.imageid LEFT JOIN ImagePositions ON Images.id=ImagePositions.imageid INNER JOIN Albums ON Albums.id=Images.album WHERE Images.status=1 AND ( ( ( (Albums.relativePath LIKE ?) OR (Images.name LIKE ?) OR (Images.id IN (SELECT imageid FROM ImageTags WHERE tagid IN (SELECT id FROM Tags WHERE name LIKE ?))) OR (Albums.caption LIKE ?) OR (Albums.collection LIKE ?) OR (Images.id IN (SELECT imageid FROM ImageComments WHERE type=? AND comment LIKE ?)) OR (Images.id IN (SELECT imageid FROM ImageComments WHERE type=? AND comment LIKE ?)) ) ) );" (QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(QString, "%teppich%"), QVariant(int, 1), QVariant(QString, "%teppich%"), QVariant(int, 3), QVariant(QString, "%teppich%"))
digikam.geoiface: "ROADMAP"
digikam.geoiface: "setting backend marble"
digikam.geoiface: "ROADMAP"
digikam.geoiface: ----
digikam.general: Cancel Main Thread
digikam.geoiface: ----

VACUUM resulted in: The database was compacted using the VACUUM statement.
Before compacting: page count = 66474, database size = 68069376 bytes
After compacting: page count = 970, database size = 993280 bytes
Hi Jens,

This is a known problem. If the similarity range is too wide or the minimum similarity is too low, the search takes a long time. The application becomes unresponsive because the image signatures are currently stored in the core DB, so everything else that needs to access the core DB is slowed down massively. We will have to migrate everything related to fuzzy search and duplicates search into a dedicated database. That would also make it easier to run searches in the background. The search result you see is just a virtual album that is reloaded every time you visit it, which is why we cannot run the search in the background at the moment.

Limiting the search by time is not a sound approach, I think. With a time limit we can only scan a limited number of images. Images that would have been scanned later, and that may have a higher similarity, would not be shown, purely because of the time constraint.

Instead, I will re-introduce a lower bound on the minimum similarity for fuzzy and similarity searches. This bound used to be 40%. Frankly, I would not use fuzzy search with similarities below 40% myself: the probability that two images reach such a low similarity is quite high.
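To illustrate the point about time limits, here is a minimal sketch and not digiKam code. `Candidate` and `scanWithBudget` are hypothetical names, and the scan budget is counted in images as a stand-in for a wall-clock timeout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one image and its similarity (in percent)
// to the reference image.
struct Candidate {
    int id;
    int similarity;
};

// Scan at most `budget` candidates (an assumption standing in for a time
// limit) and return those at or above `minSimilarity`.
std::vector<Candidate> scanWithBudget(const std::vector<Candidate>& all,
                                      int minSimilarity,
                                      std::size_t budget)
{
    assert(minSimilarity >= 0 && minSimilarity <= 100);

    std::vector<Candidate> hits;
    const std::size_t limit = budget < all.size() ? budget : all.size();

    for (std::size_t i = 0; i < limit; ++i) {
        if (all[i].similarity >= minSimilarity) {
            hits.push_back(all[i]);
        }
    }

    return hits;
}
```

If the best match happens to sit late in the scan order, a truncated scan never reaches it, whereas a higher `minSimilarity` shrinks the result set without skipping any image.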
Git commit 574a76623105b08dc4f1f62ccb8cc6bd822bae0d by Mario Frank.
Committed on 03/02/2017 at 13:44.
Pushed by mfrank into branch 'master'.

Introduced a configurable lower bound for the minimum similarity in fuzzy and duplicates search. This reduces the probability of unresponsiveness due to too many images being scanned for similarity. The bound can be configured in setup->misc and takes effect immediately in fuzzy search view and duplicates view. Thus, no restart of digiKam is necessary.
FIXED-IN: 5.5.0

M +3 -1 NEWS
M +2 -0 libs/settings/applicationsettings.cpp
M +3 -0 libs/settings/applicationsettings.h
M +10 -0 libs/settings/applicationsettings_miscs.cpp
M +3 -0 libs/settings/applicationsettings_p.cpp
M +2 -0 libs/settings/applicationsettings_p.h
M +21 -1 utilities/fuzzysearch/findduplicatesview.cpp
M +1 -0 utilities/fuzzysearch/findduplicatesview.h
M +22 -2 utilities/fuzzysearch/fuzzysearchview.cpp
M +2 -0 utilities/fuzzysearch/fuzzysearchview.h
M +11 -1 utilities/maintenance/maintenancedlg.cpp
M +24 -1 utilities/setup/setupmisc.cpp

https://commits.kde.org/digikam/574a76623105b08dc4f1f62ccb8cc6bd822bae0d
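For readers who want the idea without reading the diff, a lower bound like this can be applied roughly as follows. This is only a sketch under assumptions: `effectiveMinSimilarity` is a hypothetical helper, not a function from the commit, and the actual change may enforce the bound differently (for example by restricting the range of the input widgets).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: raise a requested minimum similarity (in percent)
// to the configured lower bound, and cap it at 100.
int effectiveMinSimilarity(int requested, int lowerBound)
{
    assert(lowerBound >= 0 && lowerBound <= 100);

    return std::min(100, std::max(requested, lowerBound));
}
```

With a configured bound of 40, a request for 20% would be searched at 40%, while a request for 75% is left unchanged.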