This looks very similar to bug 262596, but that bug is from 2011 and is marked as fixed, and I am getting segfaults with a build compiled from today's git sources. I was trying out digiKam's face detection on git master from a couple of days back, and noticed that the application is crash-happy when detecting/tagging faces. So I pulled from git today and retried the operation. To test, I just opened digiKam, pointed it at a directory of photos and let the face detection do its thing.
Reproducible: Always
Steps to Reproduce:
1. Open digikam
2. Select "faces" icon
3. Select scan database for face.
4. Select subset of photo database
5. Set it to detect faces only, not try to recognize them.
6. Let it run...
Actual Results: The program segfaults after running for a few minutes.
Expected Results: All the faces to be detected without a segfault.
I am using three MariaDB databases for digiKam, rather than the internal SQLite.
Created attachment 99925 [details] Backtrace of the fault
Created attachment 99926 [details] A different segfault on updating the faces database I have a feeling that the faces detect, and updating the faces database may be related. With this log I was just sorting through the unknown faces, with no face detect running.
Which OpenCV version do you use? Look in the Components Info dialog for details. Did you use the multicore CPU option to detect faces? Gilles Caulier
The backtrace does not include debug symbols. Please recompile the whole digiKam source code with debug, using the cmake option "-DCMAKE_BUILD_TYPE=debug" Gilles Caulier
Rebuilding takes a while. In the meantime, I found the option for using multiple CPUs, and on my machine it is turned off. I'll go recompile with debug options.
Rebuilding the debug version is unfortunately taking longer than expected. It ate up all the space on my 4 GB /tmp... and it took two tries to figure that out. It's building now, on a much bigger drive. However, I can reliably crash digikam. It "feels" like a race condition. Paradoxically, when my system is really busy and slow to respond, it's much more difficult to crash digikam. I'll have a proper report in the morning. ;)
Created attachment 99938 [details] Segfault with full debugging symbols Finally managed to get this full debug build. I'll add a few more crash logs...
Created attachment 99939 [details] full debug, only faces detect
Created attachment 99940 [details] segfault just updating the faces database
So you use MySQL as the database, right? And which OpenCV library version do you use, 2.x or 3.x? Gilles Caulier
The database is MariaDB, which is the same as MySQL, I believe. I use OpenCV 3.x
If the crash is a race condition between the FaceManagement code and the database interface, perhaps valgrind can help to identify when memory is corrupted. Typically it's in the face histogram computation thread. Note: face detection does not crash using the Sqlite database here. I process 20K images in 10 minutes on my main linux computer. Gilles Caulier
Created attachment 99946 [details] Valgrind log of a segfault Finally, a Valgrind log of the error
Created attachment 99947 [details] Konsole output around the error. Noticed this error about halfway down the attached file. For some reason MySQL was "not available"
Sound like the source of the problem :
==16753== Conditional jump or move depends on uninitialised value(s)
==16753==    at 0x973DFAA: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x9744026: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0xC22C56C: cv::parallel_for_(cv::Range const&, cv::ParallelLoopBody const&, double) (in /usr/lib/libopencv_core.so.3.1.0)
==16753==    by 0x9745AE0: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x9750104: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x973AE2D: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x974EF74: cv::CascadeClassifier::detectMultiScale(cv::_InputArray const&, std::vector<cv::Rect_<int>, std::allocator<cv::Rect_<int> > >&, double, int, int, cv::Size_<int>, cv::Size_<int>) (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x5000F2D: FacesEngine::OpenCVFaceDetector::cascadeResult(cv::Mat const&, FacesEngine::Cascade&, FacesEngine::DetectObjectParameters const&) const (opencvfacedetector.cpp:469)
==16753==    by 0x5001F5A: FacesEngine::OpenCVFaceDetector::verifyFace(cv::Mat const&, QRect const&) const (opencvfacedetector.cpp:536)
==16753==    by 0x500360C: FacesEngine::OpenCVFaceDetector::detectFaces(cv::Mat const&, cv::Size_<int> const&) (opencvfacedetector.cpp:767)
==16753==    by 0x5019A31: FacesEngine::FaceDetector::detectFaces(QImage const&, QSize const&) (facedetector.cpp:160)
==16753==    by 0x5324A5A: Digikam::DetectionWorker::process(QExplicitlySharedDataPointer<Digikam::FacePipelineExtendedPackage>) (facepipeline.cpp:483)
==16753== Uninitialised value was created by a heap allocation
==16753==    at 0x4C2ABD0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16753==    by 0xC079B21: cv::fastMalloc(unsigned long) (in /usr/lib/libopencv_core.so.3.1.0)
==16753==    by 0xC1CD3D0: cv::Mat::create(int, int const*, int) (in /usr/lib/libopencv_core.so.3.1.0)
==16753==    by 0x974BB2E: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x9745609: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x9750104: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x973AE2D: ??? (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x974EF74: cv::CascadeClassifier::detectMultiScale(cv::_InputArray const&, std::vector<cv::Rect_<int>, std::allocator<cv::Rect_<int> > >&, double, int, int, cv::Size_<int>, cv::Size_<int>) (in /usr/lib/libopencv_objdetect.so.3.1.0)
==16753==    by 0x5000F2D: FacesEngine::OpenCVFaceDetector::cascadeResult(cv::Mat const&, FacesEngine::Cascade&, FacesEngine::DetectObjectParameters const&) const (opencvfacedetector.cpp:469)
==16753==    by 0x5001F5A: FacesEngine::OpenCVFaceDetector::verifyFace(cv::Mat const&, QRect const&) const (opencvfacedetector.cpp:536)
==16753==    by 0x500360C: FacesEngine::OpenCVFaceDetector::detectFaces(cv::Mat const&, cv::Size_<int> const&) (opencvfacedetector.cpp:767)
==16753==    by 0x5019A31: FacesEngine::FaceDetector::detectFaces(QImage const&, QSize const&) (facedetector.cpp:160)
I'm sorry, I can't read that log as clearly as you do. I know how to produce them, and then hopefully someone smarter than me can pick at it and hopefully update the git with a fix. Can you put it in layman's terms, please? What happens next?
Created attachment 99970 [details] add mutex to prevent non re-entrency in OpenCV API Try my little patch to see if it fixes the problem. Gilles Caulier
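For readers following along, the idea behind this patch can be sketched as below. This is a minimal illustration using only the C++ standard library, not the actual patch: `NonReentrantDetector`, `detect()` and `runWorkers()` are hypothetical names, and the counter merely stands in for whatever shared state the real detector call touches. The point is that a single mutex lets only one worker thread at a time enter the call that is suspected of not being re-entrant.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a detector whose call must not run
// in several threads at the same time.
class NonReentrantDetector
{
public:
    // The read-modify-write on m_calls is the shared state that the
    // mutex protects; without the lock, concurrent calls could lose updates.
    int detect()
    {
        std::lock_guard<std::mutex> lock(m_mutex); // one caller at a time
        int seen = m_calls;
        m_calls  = seen + 1;
        return m_calls;
    }

    int calls() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_calls;
    }

private:
    mutable std::mutex m_mutex;
    int                m_calls = 0;
};

// Starts `threads` workers that each call detect() `perThread` times,
// waits for all of them, and returns the number of calls recorded.
int runWorkers(int threads, int perThread)
{
    NonReentrantDetector     detector;
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&detector, perThread]()
        {
            for (int i = 0; i < perThread; ++i)
            {
                detector.detect();
            }
        });
    }

    for (std::thread& worker : pool)
    {
        worker.join();
    }

    return detector.calls();
}
```

With the lock in place, every call is counted no matter how the threads interleave. Such a lock only helps if the race is between callers of the guarded function; it cannot serialize threads that a library starts internally.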
Hey there... Unfortunately, that patch did not stop the crashing. With just scanning for faces, it got about 6% through my collection of 30,000 photos (160 GB) before crashing again. This time it's not a segfault, but the program halts with this weird error: digikam.dbengine: Prepare failed! digikam.dbengine: Failure executing query: "SELECT orientation FROM ImagaInformation WHERE imageid=?;" Error messages: "QMYSQL3: Unable to prepare statement" "MySQL server has gone away" 2006 2 Bound values: () One thing worth noticing, though, is that I also tried to scan and recognize faces, and it zipped through that entire collection in seconds and did not find a single face. Just recognizing faces on their own seems to work fine, however. I'll start building digikam with debugging enabled, and see if I can track this one down.
Created attachment 99977 [details] Segfault, with the patch applied on detect faces only Once I rebuilt digikam with debugging enabled, I ran it through gdb with detecting faces only. It failed fairly quickly with a segfault, and I have attached the log here. Right now I am running the same through Valgrind, but it might be a while before it crashes. -Evert-
Created attachment 99978 [details] valgrind with mysql gone away error There seem to be two distinct modes of failure here. One is the "mysql gone away" error, and the other is a segfault. I just attached a valgrind log for the mysql gone away error; I'll re-run and see if I am "lucky" enough to catch the program segfaulting in valgrind.
> digikam.dbengine: Failure executing query: > "SELECT orientation FROM ImagaInformation WHERE imageid=?;" Did you copy this string, or might it contain a typo? ImagaInformation => ImageInformation The query string is correct in the digiKam source code. Maik
Re: Maik Qualmann, I did indeed make a typo. Thanks for pointing that out. -Evert-
Ah, Maik, the same error is repeated in the valgrind log that I just attached, and since I did not type that one, it definitely does not contain typos. Thanks for looking into this! -Evert-
I see in the valgrind log that QuickTime video is included. I think it's possible that Exiv2 crashes here. Can you remove these video files for a face recognition test? Maik
Created attachment 99981 [details] valgrind log OK, Thanks for looking into this. I have moved all the .mov and .mp4 from that directory. There are still some cr2, png, and tiff mixed in, but exiv should not have a problem with those. Right now digikam is running through my collection, and let's see if it crashes.
Created attachment 99984 [details] Console output of crash after removing movie files. So, I have removed all the movie files from the directory, and let the face scanner do its thing. Now I'll try again through gdb, and also remove anything that is not a .jpg, just to be sure.
Can you check if these images are broken? /data/DigiKam/Photos/Home/2015/02-28 Oval Track/Evert Pictures/IMG_7377.JPG /data/DigiKam/Photos/Home/2015/02-28 Oval Track/Evert Pictures/IMG_7378.JPG /data/DigiKam/Photos/Home/2015/02-28 Oval Track/Evert Pictures/IMG_7381.JPG /data/DigiKam/Photos/Home/2015/02-28 Oval Track/Evert Pictures/IMG_7383.JPG Maik
Maik, even if images are broken, or if video/raw files crash Exiv2, all Exiv2 API calls are wrapped in exception handlers, and the exceptions must be handled by the high-level implementation in digiKam. Typically, if a file breaks Exiv2, we must be able to take the right path to prevent a broken workflow in the face detection threads. It's known that Exiv2 < 0.25 is very sensitive to video files, for example, but it's better now with the latest 0.25 stable release. Evert, just to be sure, which Exiv2 version do you use exactly? Gilles Caulier
Maik, thanks again for looking into this. I am currently doing a valgrind log hoping to catch another segfault. It's just so terribly slow! On every try I get about 3 - 6% through the database before digikam quits. Like I mentioned before, there seem to be two distinct issues, one where the database goes away, and one where digikam segfaults. I have applied the small patch from Gilles, but it does not seem to have made a difference to either error. I have to reiterate that this is scanning through the database doing faces detect only. If I do faces detect and recognize, the process completes very quickly, and no faces are detected. The four pictures listed above open fine with Gwenview.
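The pattern described here, where a parser failure on one file is caught so that the worker can move on to the next file, might look roughly like the sketch below. It uses only the C++ standard library; `readOrientation()`, `safeReadOrientation()` and `countProcessable()` are hypothetical names invented for illustration and do not reflect the Exiv2 or digiKam APIs. The ".mov" check is just a stand-in for "a file the parser cannot handle".

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical metadata reader: throws on files it cannot parse.
// Here, any path ending in ".mov" plays the role of a problem file.
int readOrientation(const std::string& path)
{
    const std::string ext = ".mov";

    if (path.size() >= ext.size() &&
        path.compare(path.size() - ext.size(), ext.size(), ext) == 0)
    {
        throw std::runtime_error("cannot parse " + path);
    }

    return 1; // placeholder orientation value
}

// Wrapper that turns a parser exception into a plain failure result,
// so the calling thread keeps running.
bool safeReadOrientation(const std::string& path, int& orientation)
{
    try
    {
        orientation = readOrientation(path);
        return true;
    }
    catch (const std::exception&)
    {
        return false; // caller decides how to skip or report this file
    }
}

// Processes every path and returns how many could be read;
// unreadable files are skipped instead of aborting the loop.
int countProcessable(const std::vector<std::string>& paths)
{
    int ok = 0;

    for (const std::string& path : paths)
    {
        int orientation = 0;

        if (safeReadOrientation(path, orientation))
        {
            ++ok;
        }
    }

    return ok;
}
```

Note that a wrapper like this only covers failures reported as exceptions. A segfault inside a library is not an exception and would still take down the process.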
Gilles: extra/exiv2 0.25-3 and libkexiv2 from git, r782.6c196e4-1
My patch must be in the right direction, but certainly not at the right level in the source code. Look closely: we touch uninitialized data during the OpenCV API call. It sounds like a non-reentrancy somewhere. Remember that face fingerprints are computed in a separate thread during detection. It's more complex when multiple cores are used. Perhaps the patch must be applied in some top-level call in the facedetector class. Gilles Caulier
Exiv2 is fine. We don't use libkexiv2 anymore. The implementation is now in digiKam core, to reduce the puzzle. Gilles Caulier
There are some tests to do, if possible. 1/ Use OpenCV2 instead of OpenCV3. There is a flag to turn off in the digiKam cmake configuration script before compiling. 2/ Use the sqlite database instead of Mysql, to see if the crash is reproducible. 3/ In all cases, single core and multicore must both be tested to validate. Gilles Caulier
Created attachment 99986 [details] facedetector.patch Please test also this patch. Maik
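According to the commit message later in this report, this patch adds a check for a valid QImage in facedetector.cpp. A minimal sketch of that kind of guard is shown below, assuming nothing about the real patch beyond that description: `Image` is a hypothetical stand-in for QImage (which offers an `isNull()` test), and `Rect` and `detectFaces()` are illustrative names only.

```cpp
#include <vector>

// Hypothetical stand-in for QImage: an image with no pixels is "null".
struct Image
{
    int width  = 0;
    int height = 0;

    bool isNull() const { return width <= 0 || height <= 0; }
};

struct Rect
{
    int x;
    int y;
    int w;
    int h;
};

// Hypothetical detector entry point. The guard returns an empty result
// for an invalid image instead of handing it to the detection code.
std::vector<Rect> detectFaces(const Image& image)
{
    std::vector<Rect> faces;

    if (image.isNull())
    {
        return faces; // nothing to scan
    }

    // Placeholder for real detection: report the whole image as one region.
    Rect whole = { 0, 0, image.width, image.height };
    faces.push_back(whole);

    return faces;
}
```

The design choice is to fail early and quietly at the top-level entry point, so that an image that failed to load never reaches the lower-level detection code.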
I am currently running digikam through valgrind. Once that crashes, I will install Maik's patch only, and see if I can reproduce the issue. I will notify the maintainer of digikam-git that libexiv2 is no longer a requirement for digikam. I can turn off the external database for testing purposes, but having the database external is very much a desired feature for me. Unfortunately, downgrading to opencv 2 is not an option on this machine; there is too much of my other software that depends on it.
>I will notify the maintainer of digikam-git that libexiv2 is no longer a requirement for >digikam. Not libexiv2, but libkexiv2. For dependencies details, look here : https://quickgit.kde.org/?p=digikam-software-compilation.git&a=blob&f=DEPENDENCIES Gilles Caulier
Maik, Your facedetector.patch must be applied to git/master in all cases... Gilles Caulier
Created attachment 99987 [details] gdb crash log with facedetect patch loaded. Unfortunately, it seems that the facedetect patch did not stop the segfaults. I ran it through gdb, as that is quite a bit faster than valgrind. This crash was reproduced by just scanning for faces, no recognition, and not surfing around in the detected faces tags. I found that just changing the tag associated to a picture will segfault digikam, in fact, that happens quite a lot. I will now go disable the external mysql, and see if I can make it crash.
>I found that just changing the tag associated to a picture will segfault digikam Do you mean a simple tag to image or a face tag ? Both are different in the way to process data in background. Gilles Caulier
I was just updating some of the face tags that the recognizer got wrong, without scanning for new faces, and this rather quickly segfaults digikam. I am now starting up digikam with the sqlite db, and of course it has to scan through all my photos first, which takes a while.
Created attachment 99988 [details] Segfault with facedetect patch applied, running on SQlite db Looks like it's not the database backend, then, as it segfaults in exactly the same way when running internal sqlite instead of MySQL. I suppose the only thing left to do now would be to get it to segfault while running valgrind, right?
Here with sqlite, the crash is not reproducible. I compiled OpenCV3 myself and uninstalled OpenCV2 beforehand, to prevent mixing binaries of the library. Using valgrind reduces execution speed, as you have seen. The program is executed in a kind of "VM" which checks all memory allocation and use. If the crash is due to a race condition as I suspect, you will not be able to reproduce it. This is why I wanted to add a mutex in the critical section of the OpenCV call code, to be sure that the method is not called at the same time by separate threads. But as you say, the crash is also reproducible just when tagging faces, so perhaps the problem is in OpenCV as well... I don't know... This is why a check with OpenCV2 would be interesting to do. Gilles Caulier
Do you have opencv3-contrib installed? This one includes the face detection algorithm. For 5.0.0 I included this opencv-face code directly in digiKam core, as the move from OpenCV 2 to 3 is still in transition. If the code backported a few months ago has a bug, I tried to update it as well, without success. So the idea is to require opencv contrib at configuration time through the digiKam cmake script, remove the opencv-face module from digiKam core, and use the system-based module instead. Gilles Caulier
Created attachment 99990 [details] The list of files from OpenCV I am on Arch linux, we don't have contrib. The closest thing is self compile from git. :) I have attached the file list of the OpenCV I have installed on my system. It's version 3.1.0, and looks like it includes face detection. What is weird about this bug is that it does do quite a few face detections before it fails, but not always at the same point.
Created attachment 99991 [details] patch to use face module from opencv_contrib instead from digiKam core Evert, New patch to drop face module in digiKam core and use last one from opencv_contrib. Opencv3 need to be compiled with OpenCV_Contrib module of course... Maik, I have a big doubt on the right way to implement this new pure virtual method : void predict(InputArray src, Ptr<PredictCollector> collector) const = 0; ... defined in face.hpp and to code in facerec_borrowed.cpp Gilles Caulier
With my patch, detection processes my whole collection of 2271 images using the sqlite database. Yes, it's not too huge, but it's a test collection on my virtual machine. Until now, all works fine. No crash, faces are detected (15%) Gilles Caulier
Gilles, I have a patch created for the new face recognition already a few months ago. But I see a slight change in the ABI to the current code from github. OpenCV minimum would be version 3.1. I will upload the patch for testing here. Maik
Created attachment 99992 [details] face31.patch New face module from OpenCV-3.1. But not the latest version. Maik
Evert, The face scan with my patch is finished with 244 faces found and no crash.... Now i will test with a Mysql internal database. Gilles Caulier
Evert, With my patch, and Mysql remote server, face scan is started over the same collection and no crash appear until now.... Wait and see... Gilles Caulier
Screenshot of face scan with remote mysql in action : https://www.flickr.com/photos/digikam/28121515122/in/dateposted-public/ Gilles Caulier
My OpenCV packages are not built with TBB (libtbb2). Yours are. It could be related to this: http://code.opencv.org/issues/4489 Maik
Evert, The faces scan with the Mysql database server is complete without a crash. I used a single core to compute face detection. I will now try to use all CPU cores. Gilles Caulier
Gilles, I found the contrib version of OpenCV, but the network link I am behind is terribly slow, and I have been unable to download and compile it. I tried your patch anyways, but digikam absolutely requires the contrib package of opencv. Right now I am trying Maik's patch, as that allows me to compile against the version of OpenCV that is included in standard Arch. In a couple of weeks I will be home, with faster and more reliable internet, and then I will try your patch again.
The cause is libtbb2 support compiled into OpenCV. Gilles, look at the crash logs from Evert: our distributions have libtbb2 disabled in OpenCV. See my Comment 52 Maik
Now we are getting somewhere. I finally managed to compile a version of opencv that has tbb disabled. With Maik's patch installed I was able to scan through 25% of my library before I hit the bug of "MySQL server has gone away", so we definitely have one issue nailed down to opencv's use of tbb. I can try again with internal SQlite, and see if I can scan through my collection completely. Since the version of opencv I have now is the contrib version, I will try again to build with Gilles' patch. Looking back over the patches, which, if any, are recommended for me? Do I file a bug report at OpenCV for the crashes in TBB? (I did a bit of reading about it, and it appears that the memory errors valgrind reports are because TBB uses its own scheduler, which can confuse valgrind)
Created attachment 100002 [details] Compile error Gilles patch to use external face module
OK, so now DigiKam crashes with both internal and external SQL, at approximately 30% through my collection. I'll start a new bug for that. I get a compile error with Gilles' patch, and Maik's seems to head off the error with TBB if it is disabled in opencv. I'll compile digikam without any patch, and run it against my collection with both databases to confirm that Maik's patch is needed in conjunction with a TBB free opencv.
Note: with my patch, multi-core enabled, and mysql server, no crash. All works fine. I didn't see the tbb opencv dependency. I will look at whether mine has this module enabled. Gilles Caulier
Hi there.. OK, so, I recompiled digikam with no patch, and ran the face detection against my picture library. The face detect would usually fail anywhere from 1 to 4% through the library. Ever since I recompiled opencv to not use TBB, digikam scans much further into the library, regardless of whether there is a patch applied or not. So I think we are at the root of this issue. I will open a bug report with opencv, as the bug is in their handling of TBB. I will open another bug for the failures I am seeing now, just to keep the troubleshooting that was done separate and not confuse the issue. Thank you very much for your help!
Small correction, it seems my source tree was not cleaned as thoroughly as I thought it was, and so I appear to have built digikam with all the patches applied when I thought it was not. My apologies. I am now rebuilding vanilla digikam, to see if it still fails with TBB disabled in OpenCV.
Just tested with the vanilla code, and digikam does indeed crash earlier than with the patch. I am now installing just Maik's patch, and re-running it.
Update on the status. I am now compiling digikam with all the patches except for Gilles' patch to use the face module from opencv_contrib instead of the one from digiKam core. That patch causes a compile error. I think it's because the ABI version of opencv that I have is slightly different. With just Maik's face31.patch installed I was able to scan through my collection in about 4 tries. I ended up with about 4000 faces identified out of 32000 photos. I am now going to try again with all the patches installed and see if they address different issues.
Evert, My patch uses the OpenCV 3.1 tarball. OpenCV-contrib comes from github as well (current implementation). As far as I know, there is no released OpenCV-contrib tarball. This is very problematic, as the API/ABI version can be hell to follow. This is why I included the face module code in digiKam core instead of using OpenCV-contrib. Sounds like my first choice was not too bad after all. As I already said somewhere, OpenCV is another big puzzle... I can update the face module code in digiKam core, as I now know what I need to change to compile fine with OpenCV 3.1. Gilles Caulier
As I said previously, I processed the whole face scan on my VM with my collection, using the Mysql server and multicore CPU support. No crash. About TBB support in the OpenCV 3.1 that I compiled: I didn't set anything when configuring OpenCV, except passing the path to the OpenCV-Contrib modules source code to include the Face module. That's all. As far as I can see, TBB is disabled by default, so I suspect there is no TBB here. Gilles Caulier
Created attachment 100038 [details] currentFace31.patch New patch, face module updated to the latest version of OpenCV3.1. Please test it with this patch. Maik
Thanks for the updated patch. It's building now. My latest test was with all the patches installed, and it scanned all the way through my collection. There are a few smaller bugs where the program does not work as expected that could very well be due to the ABI change, so I'll test all of them again.
Maik, your patch works beautifully. With yours and Gilles patches installed I am unable to make digikam segfault. This is running with external MySQL db, and updating the unknown pictures as they come up.. Pretty solid. Unfortunately the actual face recognition is pretty poor, with only a 10% successful recognition. ie: recognizes a face that I have provided hundreds of examples for. Will this become part of the standard digikam?
Evert, Please do not close this file; we must apply the patches to git/master before marking it as resolved. Q: Is your opencv still compiled with Intel TBB support? The recognition algorithm needs to be improved, that's true. We currently have a student working on eye auto-detection and correction, which will come as an extension to the face engine in digiKam. If the project is completed, we will assign recognition improvement next summer (not before, as it will be part of GSoC). Maik, what is the synthesis of this file: which patches exactly should be applied to git/master? Gilles Caulier
The minimum version of OpenCV3 we must set to V3.1. OpenCV2 should compile, but tested only a few months ago. Maik
Maik, well apply the patches. MXE still with openCV 2.4.X Macport is already openCV 3.1.X So i can check openCV 2 when patches will be applied. Gilles
Git commit 8cdfcc52f402b44378fe8e6a9b7961585e17340e by Maik Qualmann. Committed on 13/07/2016 at 17:17. Pushed by mqualmann into branch 'master'. apply patch #99970 to add mutex to prevent non re-entrency in OpenCV API M +6 -0 libs/facesengine/detection/opencvfacedetector.cpp http://commits.kde.org/digikam/8cdfcc52f402b44378fe8e6a9b7961585e17340e
Git commit 3e31dad1c6c10bb6da6a269a007eee5cc209e412 by Maik Qualmann. Committed on 13/07/2016 at 17:22. Pushed by mqualmann into branch 'master'. apply patch #99986 to check for a valid QImage M +5 -0 libs/facesengine/facedetector.cpp http://commits.kde.org/digikam/3e31dad1c6c10bb6da6a269a007eee5cc209e412
Git commit 88123604ccac3cdda4557273f1b280d6772adc31 by Maik Qualmann. Committed on 13/07/2016 at 17:27. Pushed by mqualmann into branch 'master'. apply patch #100038 to update openCV3 face modul to the current version M +1 -0 libs/facesengine/CMakeLists.txt M +8 -24 libs/facesengine/opencv3-face/eigen_faces.cpp M +17 -5 libs/facesengine/opencv3-face/face.hpp M +0 -9 libs/facesengine/opencv3-face/face_basic.hpp M +13 -0 libs/facesengine/opencv3-face/facerec.cpp M +2 -5 libs/facesengine/opencv3-face/facerec.hpp M +8 -25 libs/facesengine/opencv3-face/fisher_faces.cpp M +8 -25 libs/facesengine/opencv3-face/lbph_faces.cpp M +5 -2 libs/facesengine/opencv3-face/precomp.hpp A +114 -0 libs/facesengine/opencv3-face/predict_collector.cpp [License: Unknown license] * A +127 -0 libs/facesengine/opencv3-face/predict_collector.hpp [License: Unknown license] * M +38 -0 libs/facesengine/recognition-opencv-lbph/facerec_borrowed.cpp M +9 -0 libs/facesengine/recognition-opencv-lbph/facerec_borrowed.h The files marked with a * at the end have a non valid license. Please read: http://techbase.kde.org/Policies/Licensing_Policy and use the headers which are listed at that page. http://commits.kde.org/digikam/88123604ccac3cdda4557273f1b280d6772adc31
Git commit 78f21055e816ae1dc219185b7497f3c1299629e3 by Maik Qualmann. Committed on 13/07/2016 at 17:37. Pushed by mqualmann into branch 'master'. set minimum openCV3 version to 3.1.0 M +3 -3 CMakeLists.txt http://commits.kde.org/digikam/78f21055e816ae1dc219185b7497f3c1299629e3
No problem compiling with OpenCV2. I close this file now. Gilles Caulier