Bug 384444

Summary: Wish: support "remote metadata services" (e.g. AI-based image tagging like Clarifai.com)
Product: [Applications] digikam
Reporter: Jens <jens-bugs.kde.org>
Component: Tags-AutoAssignement
Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED
Severity: wishlist
CC: aegoreev, caulier.gilles, geekguy22, kusi, metzpinguin, minhnghiaduong997
Priority: NOR
Version: 5.6.0
Target Milestone: ---
Platform: Appimage
OS: Linux
Latest Commit:
Version Fixed In: 8.3.0
Sentry Crash Report:

Description Jens 2017-09-06 19:19:54 UTC
See discussion on the mailing list today:

Original post: (by me)
"I want to use Clarifai (www.clarifai.com) for some automatic tagging experiments, since my tagging requirements take *way* too much time. (Actually, I think this would be a great feature for Digikam by default - automated AI-based tagging ... ;) )
Basically, I'm looking for a way to select some images and say "Send these images to Clarifai via API and save the returned image tags to the image metadata".
Is this possible at all without resorting to C++ and hacking the Digikam source itself?
I know I can create a batch operation with a custom shell script, but such a script is expected to produce a *different* image as output, while I just want to update the metadata."

Reply (Gilles):
"This is a very interesting subject, but there is no simple answer to your question.
You cannot easily connect the digiKam database to this kind of remote web service. Only C++ code can do it. ..."

Reply (Andrey Goreev):
"I second this one.
Large corporations such as Google and Microsoft have similar services built into their products (e.g. MS OneDrive and Google Photos), but none of them let you download your data, because they want you hooked on their services. If digiKam could fetch keywords from a server via an API and write them to the database/metadata/sidecars using Exiv2, that would be a great feature."

My reply:
"There are several such services that allow AI operations on images via API.
Clarifai is just the (currently) most popular and best one - here's a comparison: 
https://www.quora.com/Which-company-has-the-best-image-recognition-APIs-in-the-market-place-today-What-are-they-charging
The data that these services return varies. Some do tags, some do descriptions, some descriptions are multi-language, and some handle videos as well (frame by frame or second by second). Most services are asynchronous (i.e. you upload a bunch of images and check back later for the metadata, in a background job). We need to decide what to do with the returned metadata: for text, overwrite or append? For tags, create them in a subtree? Allow all tags or only a whitelist? Detect and reuse renamed/moved tags? Etc.
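The open questions above (overwrite or append, subtree, whitelist) could be pinned down as a small merge policy. This is only a sketch: the function name `merge_tags`, the default subtree `Auto`, and the lower-casing rule are hypothetical choices, and detecting renamed/moved tags is left out.

```python
from __future__ import annotations

from collections.abc import Iterable


def merge_tags(existing: list[str], returned: Iterable[str], *,
               subtree: str = "Auto",
               whitelist: set[str] | None = None,
               overwrite: bool = False) -> list[str]:
    """Hypothetical merge policy for tags returned by a remote service.

    - Returned tags are trimmed, lower-cased and nested under `subtree`
      as "Parent/Child" tag paths.
    - If `whitelist` is given, only tags in it (after normalization) are kept.
    - overwrite=True drops earlier tags under `subtree` (previous service
      results) but never touches manually assigned tags outside it.
    """
    prefix = subtree + "/"
    incoming = [t.strip().lower() for t in returned if t.strip()]
    if whitelist is not None:
        incoming = [t for t in incoming if t in whitelist]

    if overwrite:
        result = [t for t in existing if not t.startswith(prefix)]
    else:
        result = list(existing)

    # Append new tag paths, skipping duplicates while preserving order.
    for tag in incoming:
        path = prefix + tag
        if path not in result:
            result.append(path)
    return result
```

For example, appending `["Dog", "beach"]` to `["People/Jens", "Auto/cat"]` keeps `Auto/cat` and adds `Auto/dog` and `Auto/beach`; with `overwrite=True`, `Auto/cat` is replaced while `People/Jens` survives.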
I think a generic "upload image and then download metadata" concept in Digikam, which allows plugging in many of these services, makes sense. The interface is probably always HTTP(S)-based, so most of the code probably already exists.
We just need a way to use it in Digikam and an options dialog for each service (API key, maybe POST and GET URLs, supported file formats, returned data, etc.).
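A generic "upload image, download metadata" plugin could be described by a small contract plus per-service options. The sketch below is hypothetical throughout (class names, fields, default formats); it is not digiKam's plugin API. `FakeService` is an offline stand-in that only shows how a concrete service would plug in.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ServiceOptions:
    """Settings a per-service options dialog might expose (hypothetical)."""
    api_key: str
    post_url: str
    get_url: str = ""  # only needed by asynchronous services
    supported_formats: set[str] = field(default_factory=lambda: {"jpg", "jpeg", "png"})


class MetadataService(ABC):
    """Hypothetical plugin contract: upload an image, download metadata."""

    def __init__(self, options: ServiceOptions) -> None:
        self.options = options

    def accepts(self, filename: str) -> bool:
        """True if the file extension is one of the supported formats."""
        _, dot, ext = filename.rpartition(".")
        return bool(dot) and ext.lower() in self.options.supported_formats

    @abstractmethod
    def build_request(self, image_bytes: bytes) -> dict:
        """Return the service-specific request body for one image."""

    @abstractmethod
    def parse_response(self, payload: dict) -> list[str]:
        """Extract plain tag names from the service-specific reply."""


class FakeService(MetadataService):
    """Offline stand-in that exercises the interface without any network."""

    def build_request(self, image_bytes: bytes) -> dict:
        return {"size": len(image_bytes)}

    def parse_response(self, payload: dict) -> list[str]:
        return list(payload.get("tags", []))
```

Asynchronous services (upload now, check later) would need an extra submit/poll pair of methods, which this sketch does not show.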
Oh, and PS: Google and MS provide Vision APIs that do the same thing they do in their own photo apps. We could plug them in too. The APIs just aren't free forever; there's a quota. :-)"

--------

I'm going to put a bounty of €50 on this bug (as a donation to the Digikam project) if it gets implemented. It would be a huge time saver if I could use this.
Comment 1 Andrius 2017-09-06 21:00:01 UTC
I vote for this feature too.
Comment 2 Jens 2018-08-24 21:17:39 UTC
Has this ever been discussed for implementation in Digikam since my original report?

I think a feature like this one would make Digikam a LOT more valuable for many people.
Comment 3 caulier.gilles 2018-11-03 11:00:38 UTC
WARNING: with digiKam 6.0.0 and later, we no longer support the kipi interface. All tools are now located in digiKam core, as part of a plan to provide more powerful integration with the Batch Queue Manager and to be able to export a workflow to a web service.

All export tools are now available everywhere: album view, Image Editor, Light Table, and Showfoto. Previously, only the album view in digiKam core could use export tools, through libkipi.

All export tools are now located here:

https://cgit.kde.org/digikam.git/tree/core/utilities/assistants/webservices

All export tools now use a dedicated interface to communicate with the application:

- digiKam (database):

https://cgit.kde.org/digikam.git/tree/core/libs/database/utils/ifaces/dbinfoiface.h

- Showfoto (file metadata):

https://cgit.kde.org/digikam.git/tree/core/utilities/assistants/common/dmetainfoiface.h


There is no direct use of the digiKam database, for compatibility with Showfoto.

Later, we plan to load export tools dynamically through a plugin mechanism. This will reduce the load on the internal core libraries. A dedicated development branch has been created for that, but it is not yet complete:

https://cgit.kde.org/digikam.git/tree/?h=development/dplugins

But take care: digiKam export tools as plugins will not be shared with other external applications. The API will remain private and shared only between Showfoto and digiKam core. The experience with libkipi was bad; it was complex to maintain and extend over time.

Gilles Caulier
Comment 4 Jens 2020-02-13 19:42:54 UTC
I could imagine hacking this together using the BQM.

1. Select some images
2. Invoke BQM with a custom script that runs for each image
3. This script calls the Clarif.ai API, which returns a JSON set of tags
4. This script writes these tags to the XMP sidecar of the image

Pro: such scripts can do basically anything, and there is no need to put proprietary API code in the Digikam core. They can always be updated when an API changes or a new service comes along. Also, all XMP properties can be updated, not just those explicitly allowed by Digikam.
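Step 4 could be as simple as writing a minimal sidecar. This sketch assumes the tags from step 3 are already a plain list of strings and stores them in `dc:subject`, the standard XMP keywords property; `build_xmp_sidecar` is a hypothetical helper. Whether digiKam imports `dc:subject` as tags depends on its metadata settings, so that is an assumption to verify. It also writes a fresh packet, replacing any existing sidecar content.

```python
from xml.sax.saxutils import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_xmp_sidecar(keywords: list) -> str:
    """Return a minimal XMP document storing `keywords` in dc:subject.

    dc:subject is an unordered rdf:Bag of keyword strings. Each keyword is
    XML-escaped so tags such as "R&D" keep the document well-formed.
    """
    items = "\n".join(f"     <rdf:li>{escape(k)}</rdf:li>" for k in keywords)
    return (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'  <rdf:Description rdf:about="" xmlns:dc="{DC_NS}">\n'
        "   <dc:subject>\n"
        "    <rdf:Bag>\n"
        f"{items}\n"
        "    </rdf:Bag>\n"
        "   </dc:subject>\n"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>\n"
    )
```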

Missing pieces:
1. Do not force overwriting of the image: keep the original if $OUTPUT is not used in the script, or is set to "-", "", or something similar.
2. How do I automatically trigger Digikam to reload the sidecar?

(How) is this possible?
Comment 5 Maik Qualmann 2020-02-14 13:30:24 UTC
We have discussed this many times. If the script does not create a new file with changed image data, $INPUT must first be copied to $OUTPUT (cp, xcopy). All changes are made to $OUTPUT. The XMP file is created as $OUTPUT.xmp, and digiKam recognizes it automatically.
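Following that advice, a script could leave the image data untouched by copying `$INPUT` to `$OUTPUT` and then writing `$OUTPUT.xmp`. This Python sketch assumes the BQM shell script simply delegates to it, e.g. with a line such as `python3 tag_helper.py "$INPUT" "$OUTPUT"`; the file name `tag_helper.py`, the function `write_with_sidecar`, and the placeholder XMP are all hypothetical.

```python
import shutil
import sys
from pathlib import Path

# Placeholder packet; a real script would put the service's tags in here.
EMPTY_XMP = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</x:xmpmeta>\n"
)


def write_with_sidecar(input_path: str, output_path: str, xmp_text: str) -> Path:
    """Copy the image unchanged to output_path and write output_path + '.xmp'."""
    shutil.copy2(input_path, output_path)    # same bytes, no re-encoding
    sidecar = Path(output_path + ".xmp")     # photo.jpg -> photo.jpg.xmp
    sidecar.write_text(xmp_text, encoding="utf-8")
    return sidecar


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical call from the BQM script: python3 tag_helper.py "$INPUT" "$OUTPUT"
    write_with_sidecar(sys.argv[1], sys.argv[2], EMPTY_XMP)
```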

Maik
Comment 6 Minh Nghia Duong 2020-03-17 23:24:15 UTC
Hello Jens,

I am not familiar with the Clarifai platform. Could you help me clarify some points, please?

Firstly, as in your description, when you upload an image to Clarifai, does it return the image with the tags embedded in its own metadata, or just JSON-encoded tags?

And if so, is the representation of face tags in metadata universal across other platforms?

And do you want a tool to export images to Clarifai, or a tool to write Clarifai's output into the images?

Nghia.
Comment 7 caulier.gilles 2020-03-18 09:13:56 UTC
Nghia,

I can clarify one point here: the XMP metadata standard used to store face tag information in an image or sidecar. This may not be relevant to the Clarifai.com web service, of course, but at least this is how digiKam stores this metadata in the XMP container.

The code using the Exiv2 shared library to handle XMP face tags is here:

https://invent.kde.org/kde/digikam/-/blob/master/core/libs/metadataengine/dmetadata/dmetadata_faces.cpp

The Metadata Working Group Region Schema describes the list of tags used to identify regions of interest in images:

https://www.exiv2.org/tags-xmp-mwg-rs.html

These Exiv2 definitions are used in digiKam to handle face tags (libexiv2 is a C++ component).

ExifTool also supports MWG-RS, but its documentation is less readable:

https://exiftool.org/TagNames/MWG.html#Regions
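To make the MWG-RS structure concrete, here is a sketch that writes one face region. It converts a pixel rectangle to `stArea` values, which MWG defines as normalized (0..1) coordinates of the region's center plus normalized width and height. The function names are hypothetical, and this is only one valid RDF/XML serialization; Exiv2 (and therefore digiKam) may lay out the same properties differently.

```python
from xml.sax.saxutils import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MWG_RS = "http://www.metadataworkinggroup.com/schemas/regions/"
ST_AREA = "http://ns.adobe.com/xmp/sType/Area#"
ST_DIM = "http://ns.adobe.com/xap/1.0/sType/Dimensions#"


def pixel_box_to_mwg_area(left, top, width, height, img_w, img_h):
    """Convert a pixel rectangle to MWG stArea values.

    Values are normalized to 0..1, and stArea:x / stArea:y give the CENTER
    of the region, not its top-left corner.
    """
    return {
        "x": (left + width / 2) / img_w,
        "y": (top + height / 2) / img_h,
        "w": width / img_w,
        "h": height / img_h,
    }


def face_region_xmp(name, area, img_w, img_h):
    """Return a minimal XMP document holding a single mwg-rs face region."""
    a = {k: f"{v:.6f}" for k, v in area.items()}
    return f"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about=""
    xmlns:mwg-rs="{MWG_RS}" xmlns:stArea="{ST_AREA}" xmlns:stDim="{ST_DIM}">
   <mwg-rs:Regions rdf:parseType="Resource">
    <mwg-rs:AppliedToDimensions stDim:w="{img_w}" stDim:h="{img_h}" stDim:unit="pixel"/>
    <mwg-rs:RegionList>
     <rdf:Bag>
      <rdf:li rdf:parseType="Resource">
       <mwg-rs:Name>{escape(name)}</mwg-rs:Name>
       <mwg-rs:Type>Face</mwg-rs:Type>
       <mwg-rs:Area stArea:x="{a['x']}" stArea:y="{a['y']}"
         stArea:w="{a['w']}" stArea:h="{a['h']}" stArea:unit="normalized"/>
      </rdf:li>
     </rdf:Bag>
    </mwg-rs:RegionList>
   </mwg-rs:Regions>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""
```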

Voilà

Gilles

Note: digiKam does not use ExifTool at all, as it is a Perl script CLI.
Comment 8 Maik Qualmann 2021-03-29 07:24:16 UTC
*** Bug 435094 has been marked as a duplicate of this bug. ***
Comment 9 geekguy22 2021-03-29 13:26:44 UTC
I think Clarifai works similarly to the Google Cloud Vision API, Microsoft Computer Vision, and Amazon Rekognition: you send the URL of a photo, or a base64-encoded image, to the API endpoint, and it sends you a JSON result.

For the Google Cloud Vision API, you can see sample JSON output here: https://cloud.google.com/vision/docs/drag-and-drop

The tool would probably need to make a resized copy of the photo, base64-encode it, offer an option to choose which feature to use (object detection, logo detection, face detection), send the resized photo to the web service in question, and insert the JSON result into the original photo as tags/keywords.
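That flow can be sketched against the Google Cloud Vision REST endpoint `images:annotate`, using an API key as a query parameter. The request fields (`image.content`, `features[].type`, `maxResults`) and the response fields (`responses[].labelAnnotations[]` with `description` and `score`) follow the public Vision API; the function names and the 0.7 score threshold are hypothetical choices. Resizing is left to the caller because it needs an imaging library, and only `annotate` touches the network.

```python
from __future__ import annotations

import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes: bytes, feature: str = "LABEL_DETECTION",
                         max_results: int = 10) -> bytes:
    """JSON body for one base64-encoded image and one detection feature.

    Other feature types include OBJECT_LOCALIZATION, LOGO_DETECTION and
    FACE_DETECTION.
    """
    body = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": feature, "maxResults": max_results}],
        }]
    }
    return json.dumps(body).encode("utf-8")


def labels_from_response(payload: dict, min_score: float = 0.7) -> list[str]:
    """Keep label descriptions whose confidence score reaches min_score."""
    first = (payload.get("responses") or [{}])[0]
    return [a["description"] for a in first.get("labelAnnotations", [])
            if a.get("score", 0.0) >= min_score]


def annotate(image_bytes: bytes, api_key: str) -> list[str]:
    """Send one image and return its labels (needs network and a valid key)."""
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=build_vision_request(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return labels_from_response(json.load(resp))
```

The returned label names are plain strings, so they can be written to the original photo as keywords by whatever sidecar or metadata writer the tool uses.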

Also see my post here: https://bugs.kde.org/show_bug.cgi?id=435094

Most of these services are free up to a monthly quota, and for the Google Cloud Vision API you can get 300 USD of credit if you link a credit/debit card (300 USD can process about 200k photos if you only use one of the features).

(In reply to Minh Nghia Duong from comment #6)
Comment 10 Jens 2021-03-29 21:08:19 UTC
Nghia,

I completely missed your post, sorry about that. Last March was, well, chaos worldwide, I guess.

Clarif.ai (and other providers) work by accepting an image (as an HTTP POST, Base64-encoded, or whatever) or a URL to an image, and they return JSON containing the detected metadata, such as tags. It is up to the client app to do something with that metadata. The JSON structure is, of course, specific to Clarifai.

You will not get your image back, and you do not need to store it permanently online or remotely to make it work.
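As a sketch of that exchange, the functions below build one Clarifai input from either inline Base64 data or an image URL, and turn the returned concepts into tag names. The JSON shapes assumed here (`inputs[].data.image.base64` or `.url` in the request, `outputs[].data.concepts[]` with `name` and `value` in the reply) reflect our understanding of Clarifai's v2 API and should be checked against its current documentation; endpoint, authentication, model selection, and the 0.9 threshold are assumptions or are left out.

```python
from __future__ import annotations

import base64


def clarifai_input(image_bytes: bytes | None = None, url: str | None = None) -> dict:
    """One request input: inline Base64 image data or a URL (assumed v2 shape)."""
    if (image_bytes is None) == (url is None):
        raise ValueError("pass exactly one of image_bytes or url")
    if url is not None:
        image = {"url": url}
    else:
        image = {"base64": base64.b64encode(image_bytes).decode("ascii")}
    return {"data": {"image": image}}


def concepts_to_tags(payload: dict, min_value: float = 0.9) -> list[str]:
    """Collect unique concept names whose confidence `value` reaches min_value."""
    tags: list[str] = []
    for output in payload.get("outputs", []):
        for concept in output.get("data", {}).get("concepts", []):
            if concept.get("value", 0.0) >= min_value and concept["name"] not in tags:
                tags.append(concept["name"])
    return tags
```

Only the JSON reply is processed; the image itself is never stored or returned, which matches the description above.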
Comment 11 Maik Qualmann 2022-08-20 11:05:45 UTC
*** Bug 458094 has been marked as a duplicate of this bug. ***
Comment 12 caulier.gilles 2023-12-01 04:24:38 UTC
Hi,

With the next digiKam 8.3.0 release, the auto-tags assignment feature has been
implemented, but without using a cloud service.

In fact, digiKam will never use a cloud service to analyze local files, for
security/policy/rights/performance reasons: local photos must never be shared
to the web in the background for this kind of processing. The processing is done
in the core application with dedicated neural network models stored on the computer.

For more details about the auto-tags assignment feature, see the student's work
report:

https://community.kde.org/GSoc/2023/StatusReports/QuocHungTran#Add_Automatic_Tags_Assignment_Tools_and_Improve_Face_Recognition_Engine_for_digiKam

Best regards
Gilles Caulier