Bug 514908 - Wishlist: Integrate local LLM/Vision Model support for AI-powered image captioning and tagging
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Tags-AutoAssignement
Version First Reported In: 9.0.0
Platform: Other Other
Importance: NOR wishlist
Target Milestone: ---
Assignee: Digikam Developers
Reported: 2026-01-21 17:18 UTC by 1234destiny1234
Modified: 2026-01-21 17:34 UTC

Attachments
URL for the Github Repo (77 bytes, text/plain)
2026-01-21 17:18 UTC, 1234destiny1234

Description 1234destiny1234 2026-01-21 17:18:29 UTC
Created attachment 188757
URL for the Github Repo

Feature Goal:
Integrate automated, high-quality image captioning and keyword generation using local Vision-Language Models (VLMs), similar to the functionality of the ImageIndexer tool by jabberjabberjabber.

Specific Features to Adopt:
Local LLM Integration: Support backends such as KoboldCPP, Ollama, or a similar local model server, so that images are processed on the user's own machine and privacy concerns are avoided.
Automated Captioning: Use AI to generate natural language descriptions of images (e.g., "A golden retriever playing with a blue ball in a sunny park").
Advanced Tagging: Extract specific keywords from the AI-generated captions to populate the digiKam Tags hierarchy automatically.
Batch Processing: The ability to run this "indexing" over a selection of images or an entire album as a background task.
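
To make the requested flow more concrete, here is a rough sketch of how one image could be captioned and tagged through a local Ollama server. It is only an illustration, not a proposal for how digiKam should implement it. Assumptions: Ollama is running on its default local endpoint, a vision-capable model is already pulled (the name "llava" is just a placeholder), and the prompt wording and the "Keywords:" reply convention are invented for this example. The helper names build_request, parse_reply, and describe_image are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llava"  # assumption: any vision-capable model already pulled locally

# Assumption: this prompt and the "Keywords:" convention are made up for the sketch.
PROMPT = (
    "Describe this photo in one sentence. "
    "Then, on a new line starting with 'Keywords:', list up to ten "
    "comma-separated keywords."
)


def build_request(image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate payload carrying one base64 image."""
    return {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def parse_reply(text: str) -> tuple[str, list[str]]:
    """Split a model reply into (caption, keywords) using the 'Keywords:' marker."""
    caption_lines, keywords = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("keywords:"):
            raw = stripped.split(":", 1)[1]
            for word in raw.split(","):
                word = word.strip().lower()
                # Keep first occurrence only, preserving the model's order.
                if word and word not in keywords:
                    keywords.append(word)
        elif stripped:
            caption_lines.append(stripped)
    return " ".join(caption_lines), keywords


def describe_image(path: str) -> tuple[str, list[str]]:
    """Send one image file to the local server and return (caption, keywords)."""
    with open(path, "rb") as fh:
        payload = build_request(fh.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return parse_reply(reply["response"])
```

Batch processing would then amount to calling describe_image for each file in a selection or album from a background job. Mapping the returned keywords onto the digiKam Tags hierarchy is not shown here and would need to be decided by the developers.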

Why this is needed:
Current AI tagging in digiKam is often limited to basic object detection (e.g., "dog," "car"). Modern VLMs can provide context, mood, and detailed descriptions that significantly enhance the searchability of large photo collections.