Created attachment 131132 [details]
gdb backtrace after the crash
SUMMARY
Digikam crashes when detecting faces using multiple CPU cores.
STEPS TO REPRODUCE
Not sure if this is reproducible by all, but this is what I did.
1. Add a large collection of photos (over 27000 JPEG files) from an external hard-drive.
2. In the People tab, change the setting "Work on all processor cores" to true.
3. For workflow, use "Detect faces" and "Skip images already scanned".
4. Then start the scanning by clicking "Scan collection for faces".
OBSERVED RESULT
Digikam crashes after a while. The GUI showed nothing helpful. The window died.
EXPECTED RESULT
Digikam should not crash. If there are problems with a photo, or something else, it should probably just be skipped, so that the scan can complete.
SOFTWARE/OS VERSIONS
Windows: -
macOS: -
Linux/KDE Plasma: Linux kernel 5.8.1-3-MANJARO
(available in About System)
KDE Plasma Version: 5.19.4
KDE Frameworks Version: 5.73.0
Qt Version: 5.15.0
ADDITIONAL INFORMATION
The scan of my collection crashed digikam several times, so I restarted and tried again. Some scans could proceed further than others. E.g. the first scan crashed after going through less than 5% of the collection. The other scans covered some 20-30% each.
The crash comes from a segmentation fault. On the latest rerun, I used GDB to get a backtrace. See the backtrace attached.
The crash did not seem to be due to memory pressure. I have 16 GB of memory and during the scan, the whole system only used around 4 GB.
I did not test the scanning without the "Work on all processor cores" yet.
The backtrace is missing. After the crash, type "bt" in the gdb terminal and press the Return key. Maik
@Maik, thanks for pointing this out. I tried to follow the instructions here: https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports#Retrieving_a_backtrace_with_GDB and only ran "thread apply all backtrace" and posted the output of that into the attachment. I will rerun the detection another day and provide a better backtrace.
Created attachment 131155 [details] gdb backtrace with digikam built from source I decided to try and compile digikam from source to provide the best possible backtrace. So I built digikam from the GitLab sources' master branch (commit fa696a690fccf81a3b1ff70f1271e1cb0b19339e) with the bootstrap.local script. I made the following changes to the bootstrap.linux options, to avoid a build problem with a test, which I encountered: -DBUILD_TESTING=OFF -DENABLE_MYSQLSUPPORT=OFF -DENABLE_INTERNALMYSQL=OFF If you need any extra info, please let me know :) The backtrace is from a crash where I used the "Scan again and merge results" and "Work on all processor cores" options on the same collection that I scanned previously. The crash appeared at 6% completion.
Created attachment 131156 [details] gdb 'thread apply all backtrace' with digikam built from source
Comment on attachment 131156 [details] gdb 'thread apply all backtrace' with digikam built from source This is the full print from when the face detection crashed and I ran 'thread apply all backtrace' in gdb.
Created attachment 131158 [details] single-threaded gdb backtrace with digikam built from source I tested without the "Work on all processor cores" option too, and still got the crash, so it might not even be the multi-threading which is the cause? Or might it be my build which is broken?
Git commit b3baed64731a75097d2a50f53691c8c954eac26a by Maik Qualmann. Committed on 24/08/2020 at 20:38. Pushed by mqualmann into branch 'master'. try without loading notification M +4 -0 core/libs/threadimageio/fileio/loadingcache.cpp https://invent.kde.org/graphics/digikam/commit/b3baed64731a75097d2a50f53691c8c954eac26a
These backtraces are very well known to us. What is not clear is the cause, and why we cannot reproduce the crash. Please try the commit to see if the problem can still be reproduced. Maik
Created attachment 131176 [details] gdb backtrace with patch b3baed6 Thanks for your patch suggestion! I applied the patch (b3baed64731a75097d2a50f53691c8c954eac26a), rebuilt, and reran the face detection. This time the face detection crashed at 8% instead of yesterday's 6%, so something was maybe improved? Assuming, of course, that the face detection happens in roughly the same order each time :) I attached a new backtrace from the crash.
Created attachment 131178 [details] gdb backtrace with latest master 00883ba I went ahead and built the latest master branch (up until commit 00883ba716b80cbd0c419278a04ba5612c367e38), and that also crashes. This time at 6% again.
Simon,
I am currently working with the student who is improving the faces engine for the detection and recognition workflow. For detection, the student, who develops under Ubuntu, has introduced plenty of mutexes to protect all OpenCV calls in the algorithm, as the relevant OpenCV API is not re-entrant.
The student works on a dedicated branch. The branch is stable and works like a charm for detection. I parsed my huge collection without any crash, and it detected more than 30,000 faces to identify. So perhaps your crash problem is fixed in this development branch, named "gsoc20-facesengine-recognition". It will be interesting to see whether this code still crashes on your computer.
To switch from git master to the student branch, proceed like this:
git checkout -b gsoc20-facesengine-recognition remotes/origin/gsoc20-facesengine-recognition
Git will check out the branch. Reconfigure, recompile and reinstall. Test and report.
To switch back to git master, just run "git checkout master". To go back to the development branch, run "git checkout gsoc20-facesengine-recognition".
Thanks in advance
Gilles Caulier
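For readers following along, the idea of guarding a non-re-entrant call with a mutex can be sketched as below. This is only an illustration under stated assumptions, not the code from the branch: `detectFacesUnsafe`, `detectFacesSerialized`, `scanInParallel` and the shared `g_scratch` variable are hypothetical stand-ins, and no OpenCV call is involved.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library call that is not re-entrant:
// it keeps its working state in a variable shared by all callers.
static int g_scratch = 0;

int detectFacesUnsafe(int imageId)
{
    g_scratch = imageId;      // shared state that another thread could overwrite
    return g_scratch * 2;     // "result" computed from the shared state
}

// A single mutex guards every call into the non-re-entrant code.
static std::mutex g_detectorMutex;

int detectFacesSerialized(int imageId)
{
    std::lock_guard<std::mutex> lock(g_detectorMutex);
    return detectFacesUnsafe(imageId);
}

// Call the serialized wrapper from several worker threads at once.
std::vector<int> scanInParallel(int imageCount)
{
    std::vector<int> results(imageCount, 0);
    std::vector<std::thread> workers;

    for (int i = 0; i < imageCount; ++i)
    {
        // Each thread writes only to its own slot of the result vector.
        workers.emplace_back([i, &results] { results[i] = detectFacesSerialized(i); });
    }

    for (std::thread& worker : workers)
    {
        worker.join();
    }

    return results;
}
```

Calling `scanInParallel(8)` runs eight threads, but only one of them is inside `detectFacesUnsafe` at any time. The trade-off is that the guarded section no longer runs in parallel, so the lock is best kept around the non-re-entrant call only.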
Created attachment 131182 [details] gdb backtrace with gsoc20-facesengine-recognition 477fd19 Thanks for the suggestion Gilles! I did what you suggested and built the gsoc20-facesengine-recognition branch. Sadly, this branch also crashes for me. However, the backtrace looks more informative this time, so hopefully it will help you and the GSOC student :) The crash occurred around 7% completion this time.
Git commit b61d547aac23ba68ee355d24960c1f148ece23d2 by Maik Qualmann. Committed on 25/08/2020 at 20:02. Pushed by mqualmann into branch 'master'. move static functions from private shared data to a separate class M +7 -7 core/libs/dimg/dimg_fileio.cpp M +27 -20 core/libs/dimg/dimg_p.h M +2 -2 core/libs/dimg/dimg_props.cpp M +0 -4 core/libs/threadimageio/fileio/loadingcache.cpp https://invent.kde.org/graphics/digikam/commit/b61d547aac23ba68ee355d24960c1f148ece23d2
Created attachment 131183 [details] gdb backtrace with gsoc20 branch + b61d547a patch I applied the patch which Maik commented, and now I got the crash with the pure virtual method call again. I could try and disable the notifications with the previously mentioned patch and try again.
Created attachment 131186 [details] gdb backtrace with gsoc20 branch + b61d547a and b3baed64 patches So I tried to apply Maik's earlier patch to get around the pure virtual method call, but now it seems like I got a deadlock situation, so that's no good. Basically, digikam did nothing for 3-4 minutes when it reached around 9%. CPU usage was down to "nothing". Anyway, this was just my experiment, so the result can be disregarded if you like :) I'll attach my backtrace with the interrupted execution.
Git commit 0308e3639955c0a07a9ec7100cdc6527841b6a12 by Maik Qualmann. Committed on 26/08/2020 at 17:14. Pushed by mqualmann into branch 'master'. check if there is a loading thread M +1 -2 core/libs/dimg/dimg_p.h M +5 -2 core/libs/threadimageio/fileio/loadsavetask.cpp https://invent.kde.org/graphics/digikam/commit/0308e3639955c0a07a9ec7100cdc6527841b6a12
Created attachment 131350 [details] gdb backtrace with gsoc20 branch + latest master 2fb8679 I tried this again with the gsoc20-facesengine-recognition branch + merged the latest master (2fb8679) on top of it. I still get the same pure virtual function call, see the attached backtrace. If I'm testing this the wrong way, please let me know!
Git commit dd40e0f1912277dfef3967b56f32d44c01c05874 by Maik Qualmann. Committed on 01/09/2020 at 20:07. Pushed by mqualmann into branch 'master'. protect loading QMap with a recursive mutex M +11 -1 core/libs/threadimageio/fileio/loadingcache.cpp https://invent.kde.org/graphics/digikam/commit/dd40e0f1912277dfef3967b56f32d44c01c05874
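As a rough illustration of what "protect a map with a recursive mutex" can look like, here is a minimal sketch using only the C++ standard library. It is not the code from the commit: the class name `LoadingRegistry` and its methods are hypothetical, and `std::map` with `std::recursive_mutex` stand in for the Qt types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry of files that are currently being loaded.
class LoadingRegistry
{
public:
    void add(const std::string& path, int taskId)
    {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        m_loading[path] = taskId;
    }

    bool contains(const std::string& path) const
    {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        return m_loading.count(path) != 0;
    }

    void remove(const std::string& path)
    {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        m_loading.erase(path);
    }

    std::size_t size() const
    {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        return m_loading.size();
    }

    // Takes the lock, then calls contains() and add(), which lock again
    // on the same thread. A recursive mutex allows this re-locking;
    // with a plain std::mutex it would be undefined behaviour.
    bool addIfMissing(const std::string& path, int taskId)
    {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);

        if (contains(path))
        {
            return false;
        }

        add(path, taskId);
        return true;
    }

private:
    mutable std::recursive_mutex m_mutex;   // mutable so const readers can lock
    std::map<std::string, int>   m_loading;
};
```

The recursive variant is convenient when locked methods call each other, as `addIfMissing` does here. Note that it protects the map itself, not the lifetime of whatever the stored values refer to.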
Created attachment 131477 [details] gdb backtrace with latest master 5102e32 I retested the master branch. This time my collection was processed until 26% and then crashed, so some improvements can definitely be seen compared to the earlier crashes :) I'll also test the latest gsoc20-facesengine-recognition branch.
In the gsoc20-facesengine-recognition branch there is no more development, everything has been added to the master branch. Maik
Git commit 7619d77c98e3b3f955274805ba6720cd88fd714d by Maik Qualmann. Committed on 07/09/2020 at 20:06. Pushed by mqualmann into branch 'master'. disable for a test ImageMagick M +4 -0 core/dplugins/dimg/imagemagick/dimgimagemagickplugin.cpp https://invent.kde.org/graphics/digikam/commit/7619d77c98e3b3f955274805ba6720cd88fd714d
Created attachment 131483 [details] gdb backtrace with latest gsoc20 branch c1861ba Thank you for the information Maik! I read your comment too late, so I tested it anyway. The backtrace was quite different from the other ones. Apparently there was a malloc() error this time. The crash came at around 58% completion of my collection.
Hi Simon, Please also read the story from this bug: https://bugs.kde.org/show_bug.cgi?id=426175 This touches code from git/master, for the next 7.2.0 release. Best Gilles Caulier
Right, we call a pure virtual function in this function, the "LoadingProcess() = 0". This LoadingProcess was overridden by a SharedLoadingTask. So why are we in the pure virtual function with this crash? Sorry, something is not right with the vtables on Ubuntu. By the way, this function is constantly called during the image loading process and only sometimes there is a crash.
0x00007ffff40e2b65 in __cxxabiv1::__cxa_pure_virtual() () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/pure.cc:50
#6 0x00007ffff648391f in Digikam::LoadingCache::notifyNewLoadingProcess(Digikam::LoadingProcess*, Digikam::LoadingDescription const&) (this=0x555555dbf6c0, process=0x5555e8c8d3a0, description=...) at /home/simon/Development/kde/digikam/core/libs/threadimageio/fileio/loadingcache.cpp:264
Maik
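As general C++ background, and not a diagnosis of this crash: one situation in which an overridden virtual function is not reached is when the call happens while the object is being destroyed, because inside the base class destructor the call resolves to the base class version. The sketch below shows this with a non-pure virtual so that it stays well defined. The types `Process` and `SharedTask` are hypothetical and only loosely modelled on the names above.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class; the log pointer records which name() ran.
struct Process
{
    explicit Process(std::string* log) : m_log(log) {}

    virtual ~Process()
    {
        // The derived part is already destroyed here, so this call
        // resolves to Process::name(), not to the override.
        *m_log += name();
    }

    virtual std::string name() const { return "base"; }

protected:
    std::string* m_log;
};

// Hypothetical derived class that overrides name().
struct SharedTask : Process
{
    explicit SharedTask(std::string* log) : Process(log) {}

    ~SharedTask() override
    {
        // The object is still a complete SharedTask here,
        // so the override is used.
        *m_log += name();
        *m_log += ",";
    }

    std::string name() const override { return "derived"; }
};

// Creates and destroys a SharedTask, returning what the destructors logged.
std::string destroyAndLog()
{
    std::string log;
    {
        SharedTask task(&log);
    }
    return log;
}
```

Here `destroyAndLog()` returns "derived,base". If `name()` were declared pure virtual in `Process`, the call in `~Process()` would be undefined behaviour, and with GCC it typically ends in `__cxa_pure_virtual`, the same frame as in the backtrace above. A call through a pointer to an object that another thread is destroying can end up in the same place.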
Git commit cde3403938b82e0cfcccb1b557b6c2319ac8557e by Maik Qualmann. Committed on 08/09/2020 at 10:37. Pushed by mqualmann into branch 'master'. add base classes initialization explicit in the constructor Related: bug 423632, bug 426175 M +2 -0 core/libs/threadimageio/fileio/loadsavetask.cpp M +2 -0 core/libs/threadimageio/fileio/loadsavetask.h https://invent.kde.org/graphics/digikam/commit/cde3403938b82e0cfcccb1b557b6c2319ac8557e
(In reply to caulier.gilles from comment #23)
> Hi Simon,
>
> Please also read the story from this bug:
>
> https://bugs.kde.org/show_bug.cgi?id=426175
>
> This touches code from git/master, for the next 7.2.0 release.
>
> Best
>
> Gilles Caulier
Hi Gilles, I read through the bug report you referred to. I have not encountered any noticeable memory leaks on my system. It's possible that there has been something minor that I just haven't noticed, since I have 16 GB of memory available. Note that my system is Manjaro KDE (and not Ubuntu). I received a bunch of updates today, so I can't say what versions I had before, but now my system opencv is at 4.4.0-1. If you would like me to test something more specific than the face detection, which I've been doing so far, do let me know! :)
Created attachment 131491 [details] gdb backtrace with latest master fb76325 I tested with the latest master, which includes the latest fixes which Maik referred to. This time the execution crashed within 30 seconds, so there seems to have been some regression.
Git commit a06b3c5dcc32a2f95c5fbf0ac2fa898931524cea by Maik Qualmann. Committed on 08/09/2020 at 20:05. Pushed by mqualmann into branch 'master'. add static cast for loading notifikation Related: bug 423632, bug 426175 M +2 -1 core/libs/threadimageio/fileio/loadingcache.cpp https://invent.kde.org/graphics/digikam/commit/a06b3c5dcc32a2f95c5fbf0ac2fa898931524cea
Created attachment 131500 [details] gdb backtrace with latest master a06b3c5 With the a06b3c5 commit from yesterday the face detection was able to proceed for a longer time again. I don't know an exact percentage, since I left digikam running on its own, but it was at least over 30%. I noticed the "warning: Source file is more recent than executable." in the backtrace is probably because I did some branch switching while gdb was running. I can guarantee that the file digikam/core/libs/threadimageio/fileio/loadingcache.cpp where the backtrace points, was not modified while gdb was running.
Ok, my guess is that the pointer we get from the QMap is no longer valid and the task has already been deleted... Maik
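To make the suspected failure mode concrete, here is one generic way to keep a map from handing out pointers to deleted tasks. It swaps in a different technique from what digikam does (the map stores `std::weak_ptr` entries instead of raw pointers), and all names (`Task`, `TaskMap`, `notify`) are hypothetical, so treat it as a sketch of the idea only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical task that counts how often it was notified.
struct Task
{
    int notifications = 0;

    void notify() { ++notifications; }
};

// Hypothetical map from file path to task. It stores weak_ptr entries,
// so it never keeps a task alive and never dereferences a deleted one.
class TaskMap
{
public:
    void insert(const std::string& path, const std::shared_ptr<Task>& task)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks[path] = task;   // converts to weak_ptr
    }

    // Notifies the task for this path if it still exists.
    // Returns false for unknown paths and for tasks that were deleted;
    // stale entries are erased on the way.
    bool notify(const std::string& path)
    {
        std::shared_ptr<Task> alive;

        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto it = m_tasks.find(path);

            if (it == m_tasks.end())
            {
                return false;
            }

            alive = it->second.lock();   // empty if the task is gone

            if (!alive)
            {
                m_tasks.erase(it);
                return false;
            }
        }

        // The shared_ptr keeps the task alive for the duration of the call.
        alive->notify();
        return true;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_tasks.size();
    }

private:
    mutable std::mutex                         m_mutex;
    std::map<std::string, std::weak_ptr<Task>> m_tasks;
};
```

With raw pointers, the same protection needs the task to remove itself from the map, under the same lock, before its destruction starts. In this sketch `Task::notify()` itself is not synchronized, so a real task would still need its own protection if several threads notify it.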
Git commit 901227fa96db807e02b71a84c933d34b97ce3ec3 by Maik Qualmann. Committed on 09/09/2020 at 19:56. Pushed by mqualmann into branch 'master'. changes to setStatus() function Related: bug 423632, bug 426175 M +1 -1 core/libs/threadimageio/fileio/loadingcache.cpp M +19 -12 core/libs/threadimageio/fileio/loadsavetask.cpp M +1 -2 core/libs/threadimageio/thumb/thumbnailtask.cpp https://invent.kde.org/graphics/digikam/commit/901227fa96db807e02b71a84c933d34b97ce3ec3
Fixed with bug 426175. Maik