| Summary: | In face detection, an option to suppress low quality faces as detected might improve recognition training and results | | |
|---|---|---|---|
| Product: | [Applications] digikam | Reporter: | gessel <gessel> |
| Component: | Faces-Detection | Assignee: | Digikam Developers <digikam-bugs-null> |
| Status: | RESOLVED FIXED | | |
| Severity: | wishlist | CC: | caulier.gilles, chrisc.gigamail, haselnuss87, metzpinguin, michael_miller |
| Priority: | NOR | | |
| Version First Reported In: | 8.1.0 | | |
| Target Milestone: | --- | | |
| Platform: | Ubuntu | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed/Implemented In: | 8.6.0 |
| Sentry Crash Report: | | | |
|
Description
gessel
2023-10-11 23:59:15 UTC
> An alternative would be to scan the collection first without YoloV3 using the "standard" engine. -- Maik

I'm giving that a try: I cleared all unconfirmed faces and am re-scanning for faces. I see that some face tags have metadata polluted with low-quality face data -- cases where I confirmed (generally correctly) a face tag with a low pixel count or low contrast. Using that data to identify more faces, even alongside tens or hundreds of high-quality face tags, yields a true positive rate of roughly 0% when scanning for additional tags (about 1:1000 of the suggestions are correct). While this is clearly a user error, it could assist users rebuilding the training data set to offer to clear low-quality faces, either interactively or automatically, based on similar criteria. Some confirmed faces are just a few dozen pixels square, noisy, blurry, or very low contrast.

It would seem fairly plausible to:

* Scan the entire collection for confirmed faces.
* Compute each confirmed face rectangle's total pixel count, compare it to a threshold, and offer to delete/auto-delete/exclude from training any face rectangle below the selected minimum.
* Pass the survivors to the image quality sorter algorithm, compute blur, noise, and under-/over-exposure levels, and offer to delete/auto-delete/exclude from training any face rectangle below the selected minimum.
* Flush the recognition training data.
* Rebuild the recognition training data with the good faces.
* Rescan the collection with clean, high-quality recognition data.

This is independent of the sometimes necessary human-guided task of ensuring face tags are not mixed up, which would also obviously confuse the algorithm. A useful non-destructive automation option would be to simply tag confirmed but low-quality faces as not suitable for training, but that gets back to the original request of not considering them in the first place; here I'm suggesting a new "maintenance" option for resetting/refreshing the face recognition engine.

Thumbs up.
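The filtering steps proposed above could be sketched as follows. This is a minimal illustration in Python, not digiKam's actual (C++) API; the `FaceRegion` type, field names, and default thresholds are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; digiKam's internal
# face pipeline uses different types and names.
@dataclass
class FaceRegion:
    tag: str          # confirmed person tag
    width: int        # face rectangle width in pixels
    height: int       # face rectangle height in pixels
    quality: float    # 0.0-1.0 score from an image quality sorter

def select_training_faces(faces, min_pixels=64 * 64, min_quality=0.5):
    """Keep only confirmed faces large and sharp enough to train on.

    Thresholds are illustrative assumptions, not digiKam defaults.
    """
    survivors = []
    for face in faces:
        if face.width * face.height < min_pixels:
            continue  # too few pixels to yield a useful embedding
        if face.quality < min_quality:
            continue  # too blurry, noisy, or badly exposed
        survivors.append(face)
    return survivors
```

The survivors would then feed the "flush and rebuild training data" steps; everything filtered out would be offered for deletion or simply excluded from training.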
Good write-up and suggestion. Michael, I think this setting exists now with your new implementation. Right?

Gilles

Thanks, Gilles. Yes. Adding a quality measurement such as eDifFIQA(T) to determine whether a face thumbnail is usable for recognition is an option I'm exploring to improve accuracy. My testing with SFace, however, is showing this may not be needed, as the accuracy of SFace is orders of magnitude higher than that of the existing OpenFace model. I will continue to research how best to improve face detection and face recognition accuracy.

Cheers,
Mike

References:
https://ieeexplore.ieee.org/abstract/document/10468647
https://github.com/opencv/opencv_zoo/tree/main/models/face_image_quality_assessment_ediffiqa

I'd suggest verifying that face detection and face recognition are well matched. The issue is that face detection finds "faces" or face-like features fairly aggressively, especially in low-light, low-resolution, or low-contrast conditions, while face recognition, understandably and entirely consistent with human recognition, needs some measure of clarity to reliably assign a detected face to a specific recognized, unique tag. Humans are the same: this isn't a problem with automated recognition but with our human expectations of it. Humans are great at detecting a face-like feature looking at them, even at oblique angles or in low-light, high-noise situations, but actually recognizing the face accurately is a whole other story. With this algorithm we're not so much concerned with "is there a face-like structure in the image" as with "this (tag) is in the picture." That calls for a very conservative "face detected but not conclusively recognized" modality. We may be well served by a "face detected but not recognized, or even recognizable" tool, but the current UI assumes that a detected face can be reliably recognized, so that assumption should hold.

-David

Hi gessel,

We introduced a face image quality assessment check in the face training pipelines in 8.6.0.
Blurry, noisy, and pixelated images will no longer be used for face training.

Cheers,
Mike

That's awesome! Thank you!! I'll retrain my face library and very much look forward to more consistent results.
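The SFace recognition discussed above ultimately reduces to comparing face embedding vectors. As a rough illustration of why low-quality training faces hurt, here is a plain-Python sketch of cosine-similarity matching (the FR_COSINE scoring used by OpenCV's `FaceRecognizerSF`); the 0.363 threshold is the cosine cut-off OpenCV documents for its SFace model, but treat the whole gallery structure as an assumption for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(probe, gallery, threshold=0.363):
    """Return the tag whose stored embedding best matches the probe,
    or None if no score clears the threshold.

    A noisy, blurry training face produces a poor stored embedding,
    which drags scores for that tag toward chance -- the ~0% true
    positive rate gessel observed.
    """
    best_tag, best_score = None, threshold
    for tag, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score >= best_score:
            best_tag, best_score = tag, score
    return best_tag
```

Gating the gallery on image quality, as the 8.6.0 training pipeline now does, keeps such degraded embeddings out of the comparison entirely.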