Bug 515492

Summary: Add digiKam AI Face Recognition for Video with SRT Sidecar files option.
Product: [Applications] digikam Reporter: Chris Hernandez <chris>
Component: Faces-Recognition Assignee: Digikam Developers <digikam-bugs-null>
Status: REPORTED ---    
Severity: wishlist CC: caulier.gilles
Priority: NOR    
Version First Reported In: 8.8.0   
Target Milestone: ---   
Platform: Other   
OS: Other   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Chris Hernandez 2026-02-04 10:22:51 UTC
I would like to propose extending digiKam’s "People" Face Management engine to support video files. Currently, digiKam is a leader in image metadata, but "People" tagging for video remains a manual process. This feature would leverage existing AI models (YOLO/OpenVINO) to scan video files and generate time-coded face data.

Core Functional Requirements:
Video Face Scanning: 
Use a configurable interval (default: 1s) or keyframe-based analysis to detect and recognize faces within video containers. Leveraging libraries already present in Kdenlive (for frame extraction/tracking) could potentially reduce redundant development.

Probability Grouping:
Detected faces should be grouped in the "People" sidebar based on match certainty, similar to the current image workflow, allowing for bulk confirmation or rejection.

MWG Metadata Embedding:
Once confirmed, names should be written to the video's XMP metadata (Keywords/PersonInImage) using the ExifTool backend.

SRT Face-Appearance Generation: 
A unique feature to export appearance timestamps as SRT sidecar files named [filename]_([face tag]).srt. This allows standard video players (VLC, etc.) to display "Face Subtitles" and lets users search for specific appearances.
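
To illustrate how per-sample detections could become the time-coded appearance data described above, here is a minimal Python sketch. It is not digiKam code: the function name `merge_appearances`, the `(timestamp, name)` input shape, and the rule that a face is assumed visible for one sampling interval after each detection are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_appearances(detections, sample_interval_s=1.0):
    """Collapse per-sample detections into continuous appearance spans.

    detections: iterable of (timestamp_s, name) pairs, one per recognized face.
    Detections of the same name that are at most one sampling interval apart
    are merged into a single span. Each span is extended by one interval past
    its last detection (an assumption: the face stays visible until the next
    sample would have been taken).
    Returns {name: [(start_s, end_s), ...]}.
    """
    times_by_name = defaultdict(list)
    for timestamp, name in detections:
        times_by_name[name].append(timestamp)

    spans = {}
    for name, times in times_by_name.items():
        times.sort()
        merged = []
        start = prev = times[0]
        for t in times[1:]:
            # A gap larger than one interval means the face was missed at
            # least once, so close the current span and start a new one.
            if t - prev > sample_interval_s + 1e-9:
                merged.append((start, prev + sample_interval_s))
                start = t
            prev = t
        merged.append((start, prev + sample_interval_s))
        spans[name] = merged
    return spans
```

Each resulting span would map to one numbered cue in the per-person SRT file, and the set of names would feed the list of people to confirm or reject.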


Use Case and Benefit:
This would make digiKam the first open-source DAM to offer "Face-Searchable" video. For users with large archives, this solves the problem of finding a specific person inside hours of video without having to watch the footage manually.

Technical Suggestions:
Provide a "Minimum interval between detections" setting to prevent SRT bloat.
For video where keyframes are sparse (for example, long-GOP encodes), allow a fallback to a fixed temporal interval (e.g., scan every 1 second).
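
The two suggestions above can be combined into a single sampling plan. The following Python sketch is one possible interpretation, not a proposed implementation: `plan_sample_times` and its parameters are hypothetical names, and the keyframe timestamps are assumed to come from whatever demuxer digiKam would use.

```python
def plan_sample_times(keyframes_s, duration_s, min_interval_s=1.0):
    """Choose timestamps (in seconds) at which to run face detection.

    Keyframes closer than min_interval_s to the previously chosen sample are
    skipped, which limits the number of detections and therefore SRT size.
    Gaps between keyframes are filled with fixed-interval samples, which is
    the fallback for video with sparse keyframes.
    """
    if min_interval_s <= 0:
        raise ValueError("min_interval_s must be positive")

    samples = []
    keyframes = sorted({k for k in keyframes_s if 0 <= k < duration_s})
    # The duration acts as a sentinel so the tail after the last keyframe
    # is covered by the fixed-interval fallback as well.
    for kf in keyframes + [duration_s]:
        t = samples[-1] + min_interval_s if samples else 0.0
        # Fallback: fixed-interval samples while there is room before kf.
        while t + min_interval_s <= kf:
            samples.append(t)
            t += min_interval_s
        # Use the keyframe itself only if it is far enough from the last sample.
        if kf < duration_s and (not samples or kf - samples[-1] >= min_interval_s):
            samples.append(kf)
    return samples
```

With this rule, consecutive samples are never closer than the minimum interval and never as far apart as twice that interval, regardless of how the keyframes are distributed.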


[video_file_name]_([face tag]).srt
<begin SRT file contents>
NOTE
This SRT file shows all instances of [face tag] found in [filename]
Minimum keyframe interval - [#] second(s)
Generated by [user] with digiKam Video AI

1
[HH:MM:SS,mmm] --> [HH:MM:SS,mmm]
[face tag] - [x, y, w, h]

2
[HH:MM:SS,mmm] --> [HH:MM:SS,mmm]
[face tag] - [x, y, w, h]
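
For reference, the numbered cues in the template above could be produced along these lines. This is a minimal Python sketch under stated assumptions: `format_timestamp` and `build_srt` are hypothetical helpers, bounding boxes are assumed to be pixel values, and the NOTE header is left out because NOTE blocks are a WebVTT convention and, as far as I know, SRT has no standard comment syntax, so player behavior with that header may vary.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(face_tag, appearances):
    """Build SRT text for one person.

    appearances: list of (start_s, end_s, (x, y, w, h)) tuples, one per cue.
    Cues are numbered from 1 and separated by a blank line.
    """
    blocks = []
    for index, (start, end, box) in enumerate(appearances, start=1):
        x, y, w, h = box
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{face_tag} - [{x}, {y}, {w}, {h}]\n"
        )
    return "\n".join(blocks)
```

The returned string would be written to [video_file_name]_([face tag]).srt next to the video file.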