435184 – How can I integrate an object detection plugin written in Python?

Status:	RESOLVED FIXED

Alias:	None

Product:	digikam
Classification:	Applications
Component:	Plugin-Bqm-AutoTags
Version First Reported In:	7.3.0
Platform:	Other Other

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Digikam Developers

Reported:	2021-03-31 12:02 UTC by Olive Ox
Modified:	2023-12-01 12:07 UTC
CC List:	1 user

Version Fixed/Implemented In:	8.3.0

Description Olive Ox 2021-03-31 12:02:13 UTC

I'd like to contribute to DigiKam with an object detection feature, with the following particularities:
- detect objects in images
- detect objects in videos (and perhaps later associate each detection with a timestamp or frame, so that the user can skip to the points in the video where a specific object is visible; that would need some extra UI work too)
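
The timestamp idea for videos could look roughly like the sketch below. It is only an illustration under assumptions: the function name `object_intervals`, its parameters, and the input format (a mapping from frame index to detected labels) are hypothetical and not part of any digiKam API. It groups the frames in which a label was seen into time intervals that a UI could later use as skip points.

```python
from collections import defaultdict


def object_intervals(frame_labels, fps, max_gap_frames=1):
    """Turn per-frame detections into per-object time intervals.

    frame_labels: dict mapping frame index -> iterable of detected labels
                  (hypothetical output of some detector).
    fps: frames per second of the video.
    max_gap_frames: frames further apart than this start a new interval.
    Returns a dict mapping label -> list of (start_seconds, end_seconds).
    """
    # Collect, for every label, the sorted list of frames where it appears.
    frames_by_label = defaultdict(list)
    for frame in sorted(frame_labels):
        for label in set(frame_labels[frame]):
            frames_by_label[label].append(frame)

    intervals = {}
    for label, frames in frames_by_label.items():
        runs = []
        start = prev = frames[0]
        for frame in frames[1:]:
            if frame - prev > max_gap_frames:
                # Gap too large: close the current interval, start a new one.
                runs.append((start / fps, (prev + 1) / fps))
                start = frame
            prev = frame
        # Close the last open interval (it ends after the last frame seen).
        runs.append((start / fps, (prev + 1) / fps))
        intervals[label] = runs
    return intervals


if __name__ == "__main__":
    demo = {0: ["dog"], 1: ["dog"], 2: [], 5: ["dog", "cat"], 6: ["cat"]}
    for label, spans in sorted(object_intervals(demo, fps=2.0).items()):
        print(label, spans)
```

In practice the detector would probably run on sampled frames rather than on every frame, in which case `max_gap_frames` would need to match the sampling step.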

Unfortunately, I don't know C++, but I am somewhat competent with Python. Looking at the provided documentation (https://www.digikam.org/api/), I can't work out whether, or how, a Python plugin can be integrated into DigiKam.

### My ideas regarding the implementation ###

## No DigiKam integration - a separate Python process
- started separately from DigiKam
- read from digikam.db all the imported photo / video files (Images table)
- iterate through them and apply the object detection service
- aggregate all detected objects and, for each object type, insert a tag into the database Tags table. All generated tags would sit under a root tag called 'objects', for example
- based on the (image -> detected object) mapping created in the detection step, associate each Images table entry id with the matching Tags entry id in the ImageTags table
- start / restart DigiKam to refresh the DB changes

This way, the same tag-based system can be used for searching media files based on the objects they contain.
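
The steps above can be sketched roughly as follows. This is a minimal sketch under loud assumptions, not a tested recipe for a real digikam.db: only the table names (Images, Tags, ImageTags) come from the list above, while the column names (`id`, `name`, `pid`, `imageid`, `tagid`), the use of `pid = 0` for a top-level tag, and the `detect_objects` stand-in are hypothetical. The demo runs against a simplified in-memory SQLite database so that it is self-contained; the real schema would have to be checked, and the database backed up, before trying anything like this on an actual collection.

```python
import sqlite3
from collections import defaultdict


def detect_objects(file_name):
    """Hypothetical stand-in for a real object detection model."""
    fake_results = {
        "beach.jpg": ["person", "dog"],
        "street.jpg": ["car", "person"],
    }
    return fake_results.get(file_name, [])


def get_or_create_tag(conn, name, parent_id):
    """Return the id of tag `name` under `parent_id`, creating it if needed."""
    row = conn.execute(
        "SELECT id FROM Tags WHERE name = ? AND pid = ?", (name, parent_id)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO Tags (pid, name) VALUES (?, ?)", (parent_id, name)
    )
    return cur.lastrowid


def tag_images(conn, detector=detect_objects, root_name="objects"):
    """Detect objects for every image and link each image to matching tags."""
    # Build the (image -> detected objects) mapping.
    detections = defaultdict(list)
    rows = conn.execute("SELECT id, name FROM Images").fetchall()
    for image_id, file_name in rows:
        for label in detector(file_name):
            detections[image_id].append(label)

    # All generated tags live under one root tag (assumed pid 0 = top level).
    root_id = get_or_create_tag(conn, root_name, 0)
    for image_id, labels in detections.items():
        for label in set(labels):
            tag_id = get_or_create_tag(conn, label, root_id)
            conn.execute(
                "INSERT OR IGNORE INTO ImageTags (imageid, tagid) VALUES (?, ?)",
                (image_id, tag_id),
            )
    conn.commit()
    return dict(detections)


def make_demo_db():
    """Create a simplified, assumed schema in memory (not the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Images (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Tags (id INTEGER PRIMARY KEY, pid INTEGER,
                           name TEXT NOT NULL, UNIQUE (name, pid));
        CREATE TABLE ImageTags (imageid INTEGER NOT NULL,
                                tagid INTEGER NOT NULL,
                                UNIQUE (imageid, tagid));
        INSERT INTO Images (name)
        VALUES ('beach.jpg'), ('street.jpg'), ('empty.jpg');
        """
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    tag_images(db)
    for row in db.execute("SELECT id, pid, name FROM Tags ORDER BY id"):
        print(row)
```

Running the tagging step twice should leave the tables unchanged, because existing tags are looked up before being inserted and duplicate (image, tag) pairs are ignored.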

## DigiKam integrated

Similar to the 'People' section related to the Facial Recognition feature. I'm imagining an 'Objects' section with a button to:
- detect objects (based on a pre-trained model), analogous to 'Detect Faces'
   - be able to click yes / no to confirm whether objects were detected correctly, which would improve the accuracy of the model (same as in Facial Recognition)

This would imply an integrated Python service that could be started and work inside DigiKam.


Target platforms: Ubuntu 20.04 / Windows 10


Questions:
- is it possible to integrate something like this (not talking about the UI for now), fully written in Python, into the DigiKam ecosystem?
- if yes, could anyone give me some guidelines on how specifically this can be done?

A big thank you to all the contributors of this awesome project. I would gladly step in too.

Comment 1 caulier.gilles 2021-03-31 12:28:36 UTC

Hi,

The plugin interface is pure C++. There is no gateway for Python yet. In any case, following your plan, you will need to use a little bit of C++.

I wrote some demo plugins on my GitHub account:

https://github.com/cgilles/digikam-plugins-demo

There is no plan at the moment to write a Python wrapper for plugins, even though it would be possible, based on the work done in Krita for example.

Also, there is no direct database access. An interface allows plugins to work with item properties. Again, this interface is written in C++.

https://invent.kde.org/graphics/digikam/-/blob/master/core/libs/dplugins/iface/dinfointerface.h

If you look inside this interface, you will see that the people properties from the database are not yet exported to plugins.

So this kind of wish means a lot of work. My question is: why reinvent the wheel when almost everything in the digiKam core is written in C++? Just to use Python in a C++ project of more than 1.5 million lines of code? I must admit that I don't understand this plan.

As for performance, C++ is far better than Python. So again, I don't understand the plan.

Gilles Caulier

Comment 2 Olive Ox 2021-03-31 14:14:44 UTC

(In reply to caulier.gilles from comment #1)

Thank you for your technical references.

> So this kind of wish means a lot of work. My question is: why reinvent the
> wheel when almost everything in the digiKam core is written in C++? Just to
> use Python in a C++ project of more than 1.5 million lines of code? I must
> admit that I don't understand this plan.

I don't have any C++ experience. I was not suggesting rewriting any of the existing code in Python. The only objective of my post was to explore the possibilities of integrating a Python module into the DigiKam ecosystem, since I didn't find anything about it in the documentation.

> As for performance, C++ is far better than Python. So again, I don't
> understand the plan.

I agree. I am not here to argue about C++ vs Python. Again, my only point was that I don't have any experience with C++ but have some with Python, and I was exploring the possibility of using it to contribute an idea.

Thanks again.

Comment 3 caulier.gilles 2023-12-01 11:50:26 UTC

Hi,

With the next digiKam 8.3.0 release, the auto-tags assignment feature has been
implemented without using a cloud service. The processing is done in the
core application with dedicated neural network models stored on the computer.

For more details about the auto-tags assignment feature, see the student's work
report:

https://community.kde.org/GSoc/2023/StatusReports/QuocHungTran#Add_Automatic_Tags_Assignment_Tools_and_Improve_Face_Recognition_Engine_for_digiKam

Best regards
Gilles Caulier