515936 – Support for Semantic Image Search using CLIP-ViT-H-14 models

Summary: Support for Semantic Image Search using CLIP-ViT-H-14 models

Status:	REPORTED

Alias:	None

Product:	digikam
Classification:	Applications
Component:	Searches-Advanced
Version First Reported In:	9.0.0
Platform:	Other Other

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Digikam Developers

URL:
Keywords:

Depends on:
Blocks:

Reported:	2026-02-13 08:34 UTC by 1234destiny1234
Modified:	2026-02-13 09:20 UTC
CC List:	1 user

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description 1234destiny1234 2026-02-13 08:34:32 UTC

SUMMARY
I would like to request the integration of the CLIP-ViT-H-14 multimodal model into digiKam to enable advanced semantic search and automated image tagging.
RATIONALE
Currently, digiKam relies on metadata (EXIF/IPTC) and basic AI tools for face detection and quality analysis. Adding a CLIP (Contrastive Language-Image Pre-training) backbone would allow users to:
Search by Natural Language: Search for images using descriptive phrases (e.g., "sunset over mountains with a red car") without needing manual tags.
Improved Visual Similarity: Find "more images like this" by semantic content (subject, scene, mood) rather than only by the low-level visual fingerprints used by the current similarity search.
Automated Keyword Suggestion: Use the ViT-H-14 model to score candidate keywords against each image and suggest the best-matching ones for a collection.
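To illustrate the search and similarity ideas above: CLIP maps both text and images into the same embedding space, so a search reduces to ranking stored image embeddings by cosine similarity to a query embedding. The following is only a minimal sketch under stated assumptions. The embeddings are tiny made-up 3-dimensional vectors standing in for real model output, and the names `rank_images`, `image_embeddings` and the file names are hypothetical, not anything that exists in digiKam.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (path, score) pairs, best match first.

    query_embedding may come from a text prompt (natural-language search)
    or from another image ("more images like this"); the ranking is the same.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), path)
        for path, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(path, score) for score, path in scored[:top_k]]


# Toy data (assumption): real embeddings would be produced by the CLIP model.
image_embeddings = {
    "sunset_car.jpg": [0.9, 0.1, 0.0],
    "city_night.jpg": [0.1, 0.9, 0.1],
    "forest.jpg": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stands in for an embedded text query

results = rank_images(query_embedding, image_embeddings, top_k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

In a real integration the image embeddings would be computed once (for example during a maintenance run) and stored, so that each search only needs to embed the query and compare.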
TECHNICAL SUGGESTIONS
Model: CLIP-ViT-H-14-laion2B-s32B-b79K (an OpenCLIP checkpoint trained on LAION-2B) is a widely used open-source model for semantic embeddings.
Implementation: This could be integrated into the existing "Maintenance" or "Search" sidebar. Since digiKam already uses OpenCV and deep learning engines for face recognition, this model could leverage the same GPU acceleration infrastructure.
Performance: While ViT-H-14 is large, it provides a significantly better "zero-shot" understanding than the smaller ViT-B models, making it ideal for professional photography management.
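For reference, the "zero-shot" behaviour mentioned above (which keyword suggestion would also rely on) is usually done by comparing one image embedding against the embeddings of candidate text labels and applying a softmax over the scaled similarities. This is a rough sketch, not a proposed implementation. The vectors are made-up toy values, `suggest_keywords` and `min_prob` are hypothetical names, and the scale of 100 is an assumption based on the typical learned logit scale of CLIP models.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suggest_keywords(image_embedding, keyword_embeddings, scale=100.0, min_prob=0.2):
    """Return (keyword, probability) pairs above min_prob, best first.

    scale is an assumed value for the logit scale; min_prob is a
    hypothetical cut-off for what counts as a suggestion.
    """
    image = normalize(image_embedding)
    keywords = list(keyword_embeddings)
    logits = []
    for keyword in keywords:
        text = normalize(keyword_embeddings[keyword])
        logits.append(scale * sum(a * b for a, b in zip(image, text)))
    probs = softmax(logits)
    ranked = sorted(zip(keywords, probs), key=lambda kp: kp[1], reverse=True)
    return [(k, p) for k, p in ranked if p >= min_prob]


# Toy data (assumption): in practice each keyword would be embedded by the
# text encoder, e.g. from a prompt such as "a photo of a sunset".
keyword_embeddings = {
    "sunset": [1.0, 0.0, 0.0],
    "city": [0.0, 1.0, 0.0],
    "forest": [0.0, 0.0, 1.0],
}
image_embedding = [0.9, 0.1, 0.0]

suggestions = suggest_keywords(image_embedding, keyword_embeddings)
print(suggestions)
```

Note that the softmax only ranks the candidates it is given, so the keyword vocabulary (existing tags in the collection, for instance) would have to be supplied by the application.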
ADDITIONAL CONTEXT
Other open-source photo managers (such as Immich, PhotoPrism, or Photochat AI) have successfully implemented CLIP-based search. Bringing this to digiKam would help maintain its position as the premier advanced photo management suite for the KDE community.