Bug 426003

Summary: Implementing object detection
Product: [Applications] digikam
Reporter: markd <citbparpmakajjecpg>
Component: Tags-AutoAssignement
Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED
Severity: wishlist
CC: caulier.gilles, dinhthanhtrung1996, jan.waldhorn, metzpinguin, minhnghiaduong997, quochungtran1999
Priority: NOR
Version: 7.0.0
Target Milestone: ---
Platform: Other
OS: Linux
Latest Commit:
Version Fixed In: 8.3.0
Sentry Crash Report:
Attachments: attachment-25818-0.html

Description markd 2020-08-30 19:38:06 UTC
Hi, 

As of August 2020, face detection has improved a lot. The initiative KDE took is really great!

I would like to know if there is a plan to also add detection of common objects (not faces), for instance water, tree, bike, car, mug, plate, food, etc.

Here is an OpenCV tutorial on YOLO object detection:
https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html

It is available in C++.

So the idea would be that a trained model scans each photo in our library and then proposes one or more objects for it.
Comment 1 Maik Qualmann 2020-08-30 19:41:30 UTC

*** This bug has been marked as a duplicate of bug 416988 ***
Comment 2 caulier.gilles 2020-08-30 20:11:52 UTC
Nghia,

Out of curiosity, have you already taken a look at the OpenCV link given in the description of this report?

Gilles
Comment 3 Minh Nghia Duong 2020-08-30 20:25:45 UTC
(In reply to caulier.gilles from comment #2)

Yes, I did. Actually, I tried it and it works wonderfully with the existing SSD and YOLO face detection in the faces engine. All we need to do is download the corresponding model files and add a little code to differentiate the pre-defined classes of the model.
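
As a rough illustration of the "little code to differentiate the pre-defined classes", here is a minimal sketch in plain C++. It is not code from the faces engine: the `Detection` struct, the `labelsFor` function and the confidence threshold are all hypothetical names introduced for this example, and the class list would come from whatever names file ships with the chosen model.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One raw detection as a network might report it (hypothetical struct).
struct Detection {
    int   classId;     // index into the model's class list
    float confidence;  // score in [0, 1]
};

// Returns the distinct class names whose confidence reaches minConfidence.
// Class ids outside the class list are skipped rather than trusted.
std::set<std::string> labelsFor(const std::vector<Detection>& detections,
                                const std::vector<std::string>& classNames,
                                float minConfidence)
{
    std::set<std::string> labels;
    for (const Detection& d : detections) {
        if (d.confidence < minConfidence) {
            continue;
        }
        if (d.classId < 0 ||
            static_cast<std::size_t>(d.classId) >= classNames.size()) {
            continue;
        }
        labels.insert(classNames[static_cast<std::size_t>(d.classId)]);
    }
    return labels;
}
```

Returning a set means that three cars in one photo still yield a single "car" label, which is probably what a tagging workflow wants.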

If you want I can implement it after the merge of GSoC.
Comment 4 caulier.gilles 2020-08-30 20:39:33 UTC
Maik, Thanh, 

What is your viewpoint on Nghia's proposal from comment #3?

Best

Gilles
Comment 5 Minh Nghia Duong 2020-08-30 21:48:15 UTC
Implementing DNN detection is simple, but we also need to define the use-cases and workflow for object detection. 

We can just choose an image and then return the image with bounding boxes and object names, like the example in the link above, but that's not really practical for digiKam, is it?
Comment 6 markd 2020-08-30 22:43:49 UTC
Hi Nghia,

Very happy to hear that you are working on this topic. It sounds great; thanks for your hard work.

As I am both a digiKam user and an iPhone user, it would be great to have the following use case.

1. Each picture is run through the YOLO model and gets assigned zero, one, or several objects.

2. If an object is wrong, the user can delete or update it (by update I mean choosing one of the many existing objects of the YOLO model). But it would be very annoying to have to verify whether each predicted object is correct or not!


With my iPhone the use case is the following.

I take a picture of something, let's say sushi.

Then several days later I want to see all the pictures of sushi I took.
So I go to the search bar and type 'sushi', and I see all the pictures of sushi.
 
Would be great to have this feature.


In addition, it would be great to have a tag for each assigned object (like for people and manual tags), so there would be a category 'objects' with subcategories 'tree', 'sushi', etc. Then I could simply click on 'sushi' to see all the pictures of sushi.
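
To make the requested workflow concrete (assign labels, let the user remove a wrong one, search by object name), here is a small self-contained C++ sketch. It does not use digiKam's tag database: `ObjectTagIndex`, the "Objects" root category and the file names are hypothetical and only stand in for the real tag storage.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory index: tag path -> images carrying that tag.
class ObjectTagIndex {
public:
    // Builds a tag path under the assumed root category "Objects".
    static std::string tagPath(const std::string& label) {
        return "Objects/" + label;
    }

    // Assigns zero, one, or several detected labels to an image.
    void assign(const std::string& image, const std::set<std::string>& labels) {
        for (const std::string& label : labels) {
            index_[tagPath(label)].insert(image);
        }
    }

    // Lets the user drop a wrong label from an image.
    void remove(const std::string& image, const std::string& label) {
        auto it = index_.find(tagPath(label));
        if (it == index_.end()) {
            return;
        }
        it->second.erase(image);
        if (it->second.empty()) {
            index_.erase(it);
        }
    }

    // Returns all images tagged with the label, sorted by name.
    std::vector<std::string> search(const std::string& label) const {
        auto it = index_.find(tagPath(label));
        if (it == index_.end()) {
            return {};
        }
        return std::vector<std::string>(it->second.begin(), it->second.end());
    }

private:
    std::map<std::string, std::set<std::string>> index_;
};
```

With this shape, typing 'sushi' in a search bar and clicking the 'sushi' tag both reduce to the same `search("sushi")` lookup.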
Comment 7 Thanh Trung Dinh 2020-08-31 14:57:12 UTC
Created attachment 131311 [details]
attachment-25818-0.html

Hi,

As @markd said, it may be useful for users who want to search for images
relating to 'sushi' or some specific objects, but in my opinion, the scope
of this project needs to be reviewed carefully. Since YOLO is designed for
object detection in general, there will be plenty of results for some
trivial objects such as table, spoon, banana, etc. Moreover, I've seen
many cases where objects detected by YOLO are in the corners or not clearly
visible. So an image may be tagged with sushi even though the sushi is far
away in the view.
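
One possible way to limit such marginal detections is to keep only boxes that are both confident and large enough relative to the image. The sketch below is an assumption about how that could look, not something YOLO or digiKam provides: `Box`, `LabeledBox`, `keepProminent` and both thresholds are hypothetical, and boxes are assumed to be normalized to the image size.

```cpp
#include <string>
#include <vector>

// Axis-aligned box with coordinates normalized to [0, 1] (assumption).
struct Box {
    float x, y, width, height;
};

// A detection with its class name already resolved (hypothetical struct).
struct LabeledBox {
    std::string label;
    float       confidence;
    Box         box;
};

// True when the detection is confident and covers enough of the image.
// With normalized coordinates, width * height is the covered area fraction.
bool isProminent(const LabeledBox& d, float minConfidence, float minAreaFraction) {
    const float areaFraction = d.box.width * d.box.height;
    return d.confidence >= minConfidence && areaFraction >= minAreaFraction;
}

// Keeps only the detections considered prominent enough to become tags.
std::vector<LabeledBox> keepProminent(const std::vector<LabeledBox>& all,
                                      float minConfidence,
                                      float minAreaFraction) {
    std::vector<LabeledBox> kept;
    for (const LabeledBox& d : all) {
        if (isProminent(d, minConfidence, minAreaFraction)) {
            kept.push_back(d);
        }
    }
    return kept;
}
```

The right thresholds would have to be found by experiment; an area filter alone cannot tell a small but important subject from background clutter.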

Moreover, for specific objects (like sushi, plants, monuments, etc.) I
suppose we need a YOLO version trained on specific datasets for those
objects (or users may train the network themselves). Therefore, we really
need to define clearly the objects that we aim to include in digikam for
object detection.

So the project is really interesting, but I would propose running a poll
among digiKam users to get an idea of which objects we want detection to
support. Alternatively, a more extensible approach, which requires some work
from users, is to design code templates for object detection (extending
facesengine). Then users only need to train the network and provide its
weights to run the detection on their own.

Best,
Trung

Comment 8 Minh Nghia Duong 2020-09-02 06:24:57 UTC
Hi Markd,

As Trung said, for now both the pre-trained YOLO and SSD models can recognize basic objects. To apply recognition to a specific set of objects, users might need to find a suitable pre-trained model or train one that works for them.

However, we can prepare a module for general object recognition that is compatible with both YOLO and SSD. Users can then install their own pre-trained model with some basic configuration adapted to their usage.
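
To sketch what "compatible with YOLO and SSD" could mean in code: the two model families report detections in different row layouts, so a general module needs to convert both into one common record. The layouts below follow what the OpenCV DNN tutorials and samples describe for Darknet YOLO and SSD detection outputs, but they are stated here as assumptions to verify against the actual model; `UnifiedDetection`, `parseYoloRow` and `parseSsdRow` are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Common record both backends are converted into (hypothetical struct).
// Box values are normalized to the image size.
struct UnifiedDetection {
    int   classId;
    float confidence;
    float left, top, width, height;
};

// Assumed YOLO row layout:
// [centerX, centerY, width, height, objectness, score_0, ..., score_n-1].
// This sketch uses the best class score as the confidence.
bool parseYoloRow(const std::vector<float>& row, UnifiedDetection& out) {
    if (row.size() < 6) {
        return false;  // need at least one class score
    }
    const auto firstScore = row.begin() + 5;
    const auto best = std::max_element(firstScore, row.end());
    out.classId    = static_cast<int>(std::distance(firstScore, best));
    out.confidence = *best;
    out.width      = row[2];
    out.height     = row[3];
    out.left       = row[0] - row[2] / 2.0f;
    out.top        = row[1] - row[3] / 2.0f;
    return true;
}

// Assumed SSD row layout:
// [imageId, classId, confidence, left, top, right, bottom].
bool parseSsdRow(const std::vector<float>& row, UnifiedDetection& out) {
    if (row.size() != 7) {
        return false;
    }
    out.classId    = static_cast<int>(row[1]);
    out.confidence = row[2];
    out.left       = row[3];
    out.top        = row[4];
    out.width      = row[5] - row[3];
    out.height     = row[6] - row[4];
    return true;
}
```

The per-model "basic configuration" would then mostly select which parser to use and which class names file to load.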

Nghia
Comment 9 caulier.gilles 2023-10-26 05:34:57 UTC
See the progress of the student project on AI-based auto-tags (mostly completed):

https://community.kde.org/GSoc/2023/StatusReports/QuocHungTran#

Gilles Caulier
Comment 10 caulier.gilles 2023-12-01 04:31:46 UTC
Hi,

With the next digiKam 8.3.0 release, the auto-tags assignment feature has been
implemented without using a cloud service. The processing is done in the
core application, using neural network models stored on the computer.

For more details about the auto-tags assignment feature, see the student's
work report:

https://community.kde.org/GSoc/2023/StatusReports/QuocHungTran#Add_Automatic_Tags_Assignment_Tools_and_Improve_Face_Recognition_Engine_for_digiKam

Best regards
Gilles Caulier