Bug 493294 - Extremely slow duplicate search. It takes literally days.
Summary: Extremely slow duplicate search. It takes literally days.
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Maintenance-Similarities
Version: 8.5.0
Platform: Microsoft Windows
Importance: NOR crash
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-17 21:09 UTC by Phrayzur
Modified: 2025-03-15 15:32 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Shared Libraries and Components Information report (25.26 KB, text/plain)
2024-09-17 21:09 UTC, Phrayzur
attachment-102224-0.html (1.13 KB, text/html)
2024-12-02 07:49 UTC, Phrayzur

Description Phrayzur 2024-09-17 21:09:53 UTC
Created attachment 173797 [details]
Shared Libraries and Components Information report

SUMMARY
I’m new to digiKam. I’m using version 8.5 on Windows 11. I’m using the MySQL Internal database.

My machine is a Ryzen 9 7900X3D with 64 GB RAM and an Nvidia 4090, and I’m using Western Digital Black SN850X 2 TB and 4 TB NVMe drives. The image collection is on its own physical drive.

I have 286,583 images that I need to organize. Some images have many, many copies. I’m attempting to search for duplicates between 90 and 100 similarity.

The problem I’m having is that it took 20 hours for the duplicate search to get to 4% complete. A duplicate search at 100 to 100 similarity only reached 1% after an hour.

In the Windows Task Manager, memory use is under 2 GB, CPU is under 10% and disk is under 0.5% while I’m searching for duplicates.

I’ve tried a different computer, different drives, reinstalling, changing the database type, deleting the database and starting fresh (a few times) and updating the fingerprints repeatedly.

The odd thing is, when I right click on an image and click “Find Similar” with the search set from 50 to 100, it only takes a few seconds.

Can you give me some insight about what I'm doing wrong?



STEPS TO REPRODUCE
1.  Open maintenance
2.  Find duplicates at 90 to 100
3.  Start search

OBSERVED RESULT
Extremely slow results. 20 hours to reach 4%

EXPECTED RESULT
Faster results

SOFTWARE/OS VERSIONS
Windows: 11
KDE Frameworks Version:  6.5.0
Qt Version: Version 6.7.2 (built against 6.7.2)

ADDITIONAL INFORMATION
I posted this in the discuss group but I haven't received any replies and only 1 other person has looked at it.

I'm not sure what else to attach, but I've attached the Shared Libraries and Components Information report.
Comment 1 Phrayzur 2024-09-18 18:49:25 UTC
Duplicate search for 9.5 hrs at 100 to 100 similarity has reached 26%

*UPDATE:
The entire program crashed at some point and I’ve had to restart the duplicate search
Comment 2 Phrayzur 2024-09-20 14:40:20 UTC
After restarting the duplicate search on 2024-09-18 18:49:25, it has been running continuously and it's only at 11%. What can I do about this?

I notice that the importance of this report was downgraded to normal. If I do anything with the software while it's searching for duplicates, it crashes. I can't do anything until this duplicate search completes. I would classify this higher than normal importance. But I might be interpreting your bug scale incorrectly.
Comment 3 Maik Qualmann 2024-09-20 14:58:00 UTC
Try to create at least a DebugView log for the crash, or better yet a backtrace, as described here:

https://www.digikam.org/contribute/

I have already tested the function; it also seems slower to me than it did some time ago, but I don't have a final answer yet.
Remember that the number of images you are trying to compare is quite large: each of the 280,000 images must be compared against every other one.
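
For a sense of scale, a naive all-pairs comparison of n images needs n(n-1)/2 comparisons. The sketch below only illustrates that arithmetic. It assumes the search is purely pairwise, which is an assumption for illustration; digiKam's actual implementation may prune or batch comparisons.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# 286,583 is the image count from the original report;
# 280,000 is the rounded figure used in this comment.
for n in (286_583, 280_000):
    print(f"{n:,} images -> {pair_count(n):,} pairs")
```

Under that assumption, both figures come out at roughly 40 billion pairs, so the per-pair cost matters a great deal.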

Maik
Comment 4 Phrayzur 2024-09-20 18:02:30 UTC
(In reply to Maik Qualmann from comment #3)
> Try to create at least a DebugView log for the crash, or better yet a
> backtrace, as described here:
> 
> https://www.digikam.org/contribute/
> 
> I have already tested the function; it also seems slower to me than it did
> some time ago, but I don't have a final answer yet.
> Remember that the number of images you are trying to compare is quite large:
> each of the 280,000 images must be compared against every other one.
> 
> Maik

I can do both, but sometimes it doesn't crash. Is there any other info I can provide or record to help? 

I'm assuming I should restart digiKam and the duplicate search. If so, would you prefer I start it from maintenance or the similarity tab?

I've noticed some huge variations in the speed of duplicate detection. Currently, the duplicate search is the only thing using the mariadbd.exe process. It's only using 6.3% of the CPU and only 175 MB of memory, but it's accessing the drive at 109.8 MB/s, whereas Digikam.exe is using 0% CPU, 1834.1 MB of memory and 0 MB/s for the drive.
Comment 5 Phrayzur 2024-09-20 21:16:34 UTC
I haven't had any success with DebugView. I can get data from the output window in Visual Studio Community. There isn't any output in the call stack window though. Are there options or add-ons that I need? Will the data from the output window be enough?
Comment 6 Phrayzur 2024-09-21 15:03:51 UTC
Here's another odd thing. I've had the duplicate search running for about 11 hrs and 40 mins and it's at 12% complete, whereas last time it ran for over 2 days and 4 hrs and it was only at 11%.

I'm doing this search at 100% to 100% as well. And I've got Visual Studio Community running. The Debug Output Window is the only thing showing me data though
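
The figures reported so far can be turned into rough total-time estimates. This is a minimal sketch that assumes progress is linear in time, which the numbers themselves suggest is not true, so treat the results as order-of-magnitude only. The function name `projected_total_hours` is made up for this example.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total run time from elapsed time and progress."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# (elapsed hours, percent complete) as reported in this thread.
runs = {
    "first run, 90 to 100 similarity (20 h, 4%)": (20.0, 4.0),
    "previous run, 100 to 100 (about 52 h, 11%)": (52.0, 11.0),
    "current run, 100 to 100 (11 h 40 min, 12%)": (11 + 40 / 60, 12.0),
}
for label, (hours, pct) in runs.items():
    total = projected_total_hours(hours, pct)
    print(f"{label}: ~{total:.0f} h total (~{total / 24:.1f} days)")
```

The spread between the previous and current runs is the inconsistency described above: the same search settings give projections that differ by several times.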
Comment 7 caulier.gilles 2024-09-21 16:08:35 UTC
Hi,


There is a problem with your computer, I'm sure. I use it everywhere in my office (Windows 10), and the DebugView program works like a charm.

The performance you describe is abnormal and not homogeneous between stages. The crashes are abnormal too. I suspect something like an antivirus working in the background, locking file access in the database or introducing latency when the database registers fingerprint data.

Here under Linux, with my huge collection of more than 500,000 items, it is very fast: a few hours, no more. And no crash is reproducible.

Gilles Caulier
Comment 8 Phrayzur 2024-09-21 23:46:22 UTC
(In reply to caulier.gilles from comment #7)
> Hi,
> 
> 
> There is a problem with your computer, I'm sure. I use it everywhere in my
> office (Windows 10), and the DebugView program works like a charm.
> 
> The performance you describe is abnormal and not homogeneous between stages.
> The crashes are abnormal too. I suspect something like an antivirus working
> in the background, locking file access in the database or introducing
> latency when the database registers fingerprint data.
> 
> Here under Linux, with my huge collection of more than 500,000 items, it is
> very fast: a few hours, no more. And no crash is reproducible.
> 
> Gilles Caulier

I've tried with the antivirus disabled and then I uninstalled it. I'm still running the stock windows defender antivirus but I have it disabled as well. 

As for DebugView, I've enabled the internal debug logging and restarted digiKam, but there isn't anything being sent to DebugView. I've tried to manually set the new user variable, but the Windows System Information panel doesn't allow any input. It just provides information. 

Could there be options I need to set in DebugView or would the instructions be wrong about where to add a new user variable?
Comment 9 Phrayzur 2024-09-22 00:27:29 UTC
I'm getting the following types of statements in the output window of Visual Studio Community: 

18:25:52:413	digikam.database: Duplicates with id and score:
18:25:52:413	digikam.database: 112887 "100%"
18:25:52:413	digikam.database: 112888 "100%"
18:25:52:413	digikam.database: 235312 "100%"
18:25:53:412	digikam.database: Duplicates with id and score:
18:25:53:412	digikam.database: 119398 "100%"
18:25:53:412	digikam.database: 215159 "100%"
18:25:53:910	digikam.database: Duplicates with id and score:
18:25:53:910	digikam.database: 135859 "100%"
18:25:53:910	digikam.database: 216058 "100%"
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT value FROM ImageSimilarity WHERE ( imageid1=? OR imageid2=? ) AND algorithm=?;" 
18:25:54:656	Error messages: "QMYSQL: Unable to prepare statement" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList()
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT value FROM ImageSimilarity WHERE ( imageid1=10 OR imageid2=12 ) AND algorithm=1;" 
18:25:54:656	Error messages: "QMYSQL: Unable to execute query" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList(QVariant(qlonglong, 10), QVariant(qlonglong, 12), QVariant(int, 1))
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT Images.name, Albums.albumRoot, Albums.relativePath, Albums.id FROM Images  INNER JOIN Albums ON Albums.id=Images.album   WHERE Images.id=?;" 
18:25:54:656	Error messages: "QMYSQL: Unable to prepare statement" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList()
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT Images.name, Albums.albumRoot, Albums.relativePath, Albums.id FROM Images  INNER JOIN Albums ON Albums.id=Images.album   WHERE Images.id=11;" 
18:25:54:656	Error messages: "QMYSQL: Unable to execute query" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList(QVariant(qlonglong, 11))
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT status FROM Images WHERE id=?;" 
18:25:54:656	Error messages: "QMYSQL: Unable to prepare statement" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList()
18:25:54:656	digikam.dbengine: Failure executing query:
18:25:54:656	 "SELECT status FROM Images WHERE id=11;" 
18:25:54:656	Error messages: "QMYSQL: Unable to execute query" "Lost connection to MySQL server during query" "2013" 2 
18:25:54:656	Bound values:  QList(QVariant(qlonglong, 11))
18:25:55:406	digikam.database: Duplicates with id and score:
18:25:55:406	digikam.database: 148536 "100%"
18:25:55:406	digikam.database: 148537 "100%"
18:25:55:406	digikam.database: 148538 "100%"
18:25:55:406	digikam.database: 148539 "100%"
18:25:55:406	digikam.database: 149691 "100%"
18:25:55:406	digikam.database: 149692 "100%"
18:25:55:406	digikam.database: 149693 "100%"
18:25:55:406	digikam.database: 149694 "100%"
18:25:55:406	digikam.database: 149695 "100%"
18:25:55:406	digikam.database: 149696 "100%"
18:25:55:406	digikam.database: 149697 "100%"
18:25:55:406	digikam.database: 150919 "100%"
18:25:55:406	digikam.database: 150920 "100%"
18:25:55:406	digikam.database: 150921 "100%"
18:25:55:406	digikam.database: 150922 "100%"
18:25:55:406	digikam.database: 150923 "100%"
18:25:55:406	digikam.database: 150924 "100%"
18:25:55:406	digikam.database: 150925 "100%"
18:25:55:406	digikam.database: 150926 "100%"
18:25:55:406	digikam.database: 150927 "100%"
18:25:55:406	digikam.database: 150928 "100%"
18:25:55:406	digikam.database: 150929 "100%"
18:25:55:406	digikam.database: 151391 "100%"
18:25:55:406	digikam.database: 151392 "100%"
18:25:55:406	digikam.database: 151393 "100%"
18:25:55:406	digikam.database: 151394 "100%"
18:25:55:406	digikam.database: 151395 "100%"
18:25:55:406	digikam.database: 151396 "100%"
18:25:55:406	digikam.database: 152273 "100%"
18:25:55:406	digikam.database: 152274 "100%"
18:25:55:406	digikam.database: 154739 "100%"
18:25:55:406	digikam.database: 154740 "100%"
18:25:55:406	digikam.database: 154741 "100%"
18:25:55:406	digikam.database: 155032 "100%"
18:25:55:406	digikam.database: 155033 "100%"
18:25:55:406	digikam.database: 155610 "100%"
18:25:55:406	digikam.database: 155611 "100%"
18:25:56:158	digikam.database: Duplicates with id and score:
18:25:56:158	digikam.database: 108756 "100%"
18:25:56:158	digikam.database: 108757 "100%"
18:25:56:158	digikam.database: 111330 "100%"
18:25:56:158	digikam.database: 111331 "100%"
18:25:56:158	digikam.database: 112158 "100%"
18:25:56:158	digikam.database: 112159 "100%"
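
A log like the one above can be summarized to see how often the connection drops. The following is a minimal sketch, assuming the output has been saved as plain text in the same format as shown here; the helper name `summarize_errors` and the regular expression are made up for this example and are not part of digiKam. Error code 2013 is the MySQL client error for "Lost connection to MySQL server during query".

```python
import re
from collections import Counter

# Matches lines such as:
#   Error messages: "QMYSQL: Unable to execute query" "Lost connection ..." "2013" 2
ERROR_RE = re.compile(r'Error messages: "([^"]*)" "([^"]*)" "(\d+)"')


def summarize_errors(lines):
    """Count (driver message, server message, error code) triples in a log."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the log above, used here as sample input.
sample = [
    '18:25:54:656\tdigikam.dbengine: Failure executing query:',
    '18:25:54:656\tError messages: "QMYSQL: Unable to prepare statement" '
    '"Lost connection to MySQL server during query" "2013" 2 ',
    '18:25:54:656\tError messages: "QMYSQL: Unable to execute query" '
    '"Lost connection to MySQL server during query" "2013" 2 ',
    '18:25:55:406\tdigikam.database: 148536 "100%"',
]

for (driver_msg, server_msg, code), n in summarize_errors(sample).items():
    print(f"{n} x [{code}] {driver_msg} / {server_msg}")
```

To run it on a saved log, pass an open file instead of `sample`, for example `summarize_errors(open("digikam-debug.txt", encoding="utf-8"))`, where the file name is a placeholder.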
Comment 10 Phrayzur 2024-09-22 01:00:12 UTC
It's continuing to give the same type of SQL errors every so often. The selected digiKam window is Similarity Duplicates, but it's showing the Similarity Image window and it doesn't change. The program window is frozen but Visual Studio Community shows it's still running but with errors.

Should I close the main window?

Is there a duplicates table I can populate through a SQL query outside the digiKam program? Using HeidiSQL perhaps? And then restart digiKam to show the duplicate results?
Comment 11 Phrayzur 2024-11-01 02:04:22 UTC
I've still got this problem. Only it seems to be worse now. If I switch to the duplicates tab, it locks everything. I've left it for over 24 hours and it doesn't recover. I'm down to about 210,000 images and videos now. I've had to redo everything a few times because it gets to a point where it doesn't move past the loading albums screen even before it starts the main program. 

Everything is incredibly slow even when it's not crashing. Moving a file to the trash takes around 20 seconds. I'm not sure what I can do about this. I need some advice. The program is unusable for me as things are, but I've got a ton of time into it.

I can't be the only one having these problems, can I?
Comment 12 Maik Qualmann 2024-11-01 14:12:52 UTC
With such MySQL error messages you don't need to test any further; messages like these should not appear. This must be fixed first. Where is your MySQL server located? Are you using an internal or external MySQL server in digiKam?

Maik
Comment 13 caulier.gilles 2024-12-02 06:41:27 UTC
Hi,

digiKam 8.5.0 is out with many fixes and improvements.

https://www.digikam.org/news/2024-11-16-8.5.0_release_announcement/

Is this report still valid with this version?
Thanks in advance

Gilles Caulier
Comment 14 Phrayzur 2024-12-02 07:49:15 UTC
Created attachment 176282 [details]
attachment-102224-0.html

Yes. I tried with 8.5.0 as well. Same results

Comment 15 caulier.gilles 2025-03-15 15:32:26 UTC
Hi,

digiKam 8.6.0 has just been released:

https://www.digikam.org/news/2025-03-15-8.6.0_release_announcement/

Does the problem still exist with this version?

Thanks in advance

Gilles Caulier