Bug 514908

Summary: Wishlist: Integrate local LLM/Vision Model support for AI-powered image captioning and tagging
Product: [Applications] digikam Reporter: 1234destiny1234 <1234destiny1234>
Component: Tags-AutoAssignement    Assignee: Digikam Developers <digikam-bugs-null>
Status: REPORTED ---    
Severity: wishlist CC: 1234destiny1234, caulier.gilles
Priority: NOR    
Version First Reported In: 9.0.0   
Target Milestone: ---   
Platform: Other   
OS: Other   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: URL for the Github Repo

Description 1234destiny1234 2026-01-21 17:18:29 UTC
Created attachment 188757 [details]
URL for the Github Repo

Feature Goal:
Integrate automated, high-quality image captioning and keyword generation using local Vision-Language Models (VLMs), similar to the functionality of the ImageIndexer tool by jabberjabberjabber.

Specific Features to Adopt:
Local LLM Integration: Support backends such as KoboldCPP or Ollama (or a similar local inference server) so images are processed entirely on the user's machine, avoiding privacy concerns. A minimal sketch of such a backend call is shown after this list.
Automated Captioning: Use AI to generate natural language descriptions of images (e.g., "A golden retriever playing with a blue ball in a sunny park").
Advanced Tagging: Extract specific keywords from the AI-generated captions to populate the digiKam Tags hierarchy automatically.
Batch Processing: The ability to run this "indexing" over a selection of images or an entire album as a background task.
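
To make the backend integration concrete, here is a minimal, hypothetical Qt/C++ sketch (not digiKam code) of how a local Ollama instance could be asked to caption one image. It assumes Ollama is running on its default port 11434 with a vision-capable model such as "llava" already pulled; the model name and prompt text are illustrative assumptions only.

// Hypothetical sketch: ask a local Ollama server to caption one image.
// The endpoint and JSON fields follow Ollama's documented /api/generate API;
// the model name and prompt are placeholders, not digiKam code.
#include <QCoreApplication>
#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonArray>
#include <QFile>
#include <QUrl>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    if (argc < 2) {
        qWarning("usage: caption <image-file>");
        return 1;
    }

    // Read the image and base64-encode it, as expected by Ollama's "images" field.
    QFile imageFile(QString::fromLocal8Bit(argv[1]));
    if (!imageFile.open(QIODevice::ReadOnly)) {
        qWarning("cannot open image");
        return 1;
    }
    const QByteArray imageB64 = imageFile.readAll().toBase64();

    // Build the request body: non-streaming generation with one attached image.
    QJsonObject body;
    body["model"]  = "llava";   // assumed locally pulled vision model
    body["prompt"] = "Describe this photo in one sentence, then list 5 keywords.";
    body["images"] = QJsonArray{ QString::fromLatin1(imageB64) };
    body["stream"] = false;

    QNetworkAccessManager manager;
    QNetworkRequest request(QUrl("http://localhost:11434/api/generate"));
    request.setHeader(QNetworkRequest::ContentTypeHeader, "application/json");

    QNetworkReply *reply = manager.post(request, QJsonDocument(body).toJson());
    QObject::connect(reply, &QNetworkReply::finished, [&]() {
        if (reply->error() == QNetworkReply::NoError) {
            const QJsonObject obj = QJsonDocument::fromJson(reply->readAll()).object();
            // The "response" field holds the generated caption/keyword text.
            qInfo("%s", qPrintable(obj.value("response").toString()));
        } else {
            qWarning("request failed: %s", qPrintable(reply->errorString()));
        }
        reply->deleteLater();
        app.quit();
    });

    return app.exec();
}

The returned text could then be split into a description (stored in the image caption) and keywords (mapped onto the digiKam Tags hierarchy), and such requests could be queued per image to provide the batch/background processing described above.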

Why this is needed:
Current AI tagging in digiKam is often limited to basic object detection (e.g., "dog," "car"). Modern VLMs can provide context, mood, and detailed descriptions that significantly enhance the searchability of large photo collections.