Created attachment 188757 [details]
URL for the Github Repo

Feature Goal:
Integrate automated, high-quality image captioning and keyword generation using local Vision-Language Models (VLMs), similar to the functionality in the ImageIndexer tool by jabberjabberjabber.

Specific Features to Adopt:

- Local LLM Integration: Support backends such as KoboldCPP, Ollama, or a comparable local model server, so images are processed locally without privacy concerns (see the sketch at the end of this report).
- Automated Captioning: Use a VLM to generate natural-language descriptions of images (e.g., "A golden retriever playing with a blue ball in a sunny park").
- Advanced Tagging: Extract specific keywords from the AI-generated captions to populate the digiKam Tags hierarchy automatically.
- Batch Processing: Allow this "indexing" to run over a selection of images or an entire album as a background task.

Why this is needed:
Current AI tagging in digiKam is mostly limited to basic object detection (e.g., "dog", "car"). Modern VLMs can add context, mood, and detailed descriptions that significantly improve the searchability of large photo collections.
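
To illustrate the kind of backend call involved, below is a minimal Python sketch, assuming a local Ollama instance serving a vision model such as "llava". The endpoint and JSON fields follow Ollama's /api/generate API; the model name, prompt wording, and the caption_image helper are illustrative assumptions, not digiKam code.

    # Sketch: ask a local Ollama vision model for a caption plus keywords.
    import base64
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

    def caption_image(path, model="llava"):
        """Return (caption, keywords) for one image via a local VLM."""
        with open(path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")

        payload = {
            "model": model,
            "prompt": ("Describe this photo in one sentence, then list 5-10 "
                       "comma-separated keywords on a second line."),
            "images": [image_b64],
            "stream": False,
        }
        req = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            text = json.loads(resp.read())["response"]

        # Split the reply into the caption line and the keyword line.
        lines = [l.strip() for l in text.splitlines() if l.strip()]
        caption = lines[0] if lines else ""
        keywords = [k.strip() for k in lines[1].split(",")] if len(lines) > 1 else []
        return caption, keywords

    if __name__ == "__main__":
        cap, tags = caption_image("example.jpg")
        print("Caption:", cap)
        print("Tags:", tags)

In digiKam itself this would presumably be done from C++/Qt in a background job, with the caption written to the image description field and the extracted keywords mapped onto the Tags hierarchy.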