Bug 399596 - Tag tree keeps reverting back to incorrect format after re-adding collection
Summary: Tag tree keeps reverting back to incorrect format after re-adding collection
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Tags-Engine
Version: 5.9.0
Platform: Microsoft Windows
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-10 09:32 UTC by Sebas
Modified: 2023-10-15 12:39 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
attachment-20427-0.html (5.32 KB, text/html)
2019-01-20 12:44 UTC, Steve Franks
attachment-25270-0.html (2.08 KB, text/html)
2022-04-25 11:58 UTC, Steve Franks

Description Sebas 2018-10-10 09:32:18 UTC
With https://bugs.kde.org/show_bug.cgi?id=399594 happening a lot lately, I noticed that the tag tree keeps reverting to an incorrect format that I had already corrected.

Example to make it clear:
Tree:
Cars
--Brand 1
----Model A
----Model B
--Brand 2
----Model A
--Brand 3
----Model A
----People
------Person 2
------Person 4

People
--Person 1
--Person 2
--Person 3
--Person 4
--Person 5
--Person 6

Note above how the People subtree and some of the persons have suddenly appeared (duplicated) inside a car brand. I have no idea how that happened in the first place. There were also pictures at the end points of the wrong tag tree, but not as many as in the correct tag tree. So I untagged those pictures and added the right tags to them. After that I deleted the wrong tag tree ('People' inside car 'Brand 3').

The collection gets rescanned, and voila, the tree as shown above is present again, just without pictures in the wrong parts.
Comment 1 Maik Qualmann 2018-10-10 09:52:28 UTC
I think the tag metadata will be stored in the images like this. You have to start with a clean new database, preferably with digiKam-6.0.0-beta1.

Maik
Comment 2 Sebas 2018-10-10 09:58:02 UTC
After clearing all tags of the wrong tag tree from the images and applying tags from the correct tree, the images themselves seem fine, but the tag tree format from the example keeps coming back. Empty, of course.

I did not try database maintenance by the way. Will do that and report back.
Comment 3 Sebas 2018-10-10 13:48:14 UTC
Database maintenance on the tags with the option 'sync metadata and database' -> 'from image metadata to database' actually made it worse. The wrong tag tree was restored, and probably all images with those tags are now shown both there and in the good tag tree. Digikam mostly shows the tags duplicated below the images, while Windows Explorer does not show them as duplicated in the metadata.

Might try 6.x soon.
Comment 4 Sebas 2018-10-10 21:36:57 UTC
Using 6.x now.

I opened a photo in Notepad++ and found some interesting things.
Apparently Digikam saves tag trees in images apart from the tags themselves. The latter ones can not be read by software like Windows Explorer or IrfanView. Digikam tag trees are saved in custom metadata fields.

The strange thing, though: if I have moved all those images out of the wrong tree, why is this wrong tree still present in the custom metadata fields? It should have been cleared out.
Comment 5 Maik Qualmann 2018-10-11 05:59:55 UTC
I do not quite understand: you have written the tag metadata to the images and not everything has been replaced? Yes, digiKam creates a tag tree in the XMP metadata; if it is available, it is used as the reference first. Otherwise, digiKam tries to extract the tags from other metadata. You can control this via the advanced metadata settings in the digiKam setup. During a complete scan, if the digiKam tag tree is not present, the order in which the image scanner processes the images also affects what the tag tree looks like afterwards.
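
The read order described here can be pictured as a priority list of metadata fields, where the first field that holds tags wins. The sketch below is only an illustration, not digiKam's code: the field names, their order, and the helper `pick_tag_source` are assumptions chosen for the example.

```python
# Illustrative only: the field names and their order are assumptions,
# not digiKam's actual configuration.
READ_ORDER = [
    "Xmp.digiKam.TagsList",        # digiKam's own tag tree, used first if present
    "Xmp.lr.hierarchicalSubject",  # hierarchical keywords from other software
    "Xmp.dc.subject",              # flat keywords, no hierarchy
]


def pick_tag_source(metadata, read_order=READ_ORDER):
    """Return (field, tags) for the first field in read_order that has tags."""
    for field in read_order:
        tags = metadata.get(field)
        if tags:
            return field, list(tags)
    return None, []


# An image without the digiKam tag tree: the next field in the list is used.
image = {
    "Xmp.dc.subject": ["Deal"],
    "Xmp.lr.hierarchicalSubject": ["Places|East Kent|Deal"],
}
field, tags = pick_tag_source(image)
print(field, tags)
```

Removing an entry from such a list corresponds to deselecting a source in the advanced metadata settings, so old data in that field is no longer picked up.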

Maik
Comment 6 Steve Franks 2019-01-13 13:20:20 UTC
I think that I have the same problem. When I created a new database using 6.0.0-beta3 it read metadata from my images written by Microsoft Live Photo Gallery, Adobe Photoshop Elements and various other software over the years.

This has led to multiple copies of tags. e.g. Places|East Kent|Deal (Where | denotes a level in the hierarchy) can be added to the database in any of several different ways:
Places
East Kent
Deal
Places.East Kent.Deal
Places.East Kent
East Kent.Deal
and sometimes correctly (:-
I have spent several hours removing incorrect tags, merging tags and sorting out this mess.
The problem is that if Digikam re-reads the metadata from images all my hard work is undone.
It would be nice if there were a way to tell Digikam to ignore files when rereading data. I think that it's probably too difficult (dangerous?) to expect it to remove everything but its own tags.
Comment 7 Sebas 2019-01-13 13:47:04 UTC
Seems like I forgot to answer to Maik Qualmann's last reply.

I did fix the issue manually so it is a bit hard for me to recall the whole thing, but let's try.

Digikam saves individually in image metadata (if enabled) these things:
1. Tags
2. Tagtree

I removed some tags and added new ones, or maybe renamed something. Then, after rescanning (because of some 'Collection disappearing if network share unavailable' bug) the old tagtree keeps appearing again and again.

What I found when opening these images in a text editor is that when removing tags, Digikam does not (always?) also remove that tagtree from custom metadata. It should do so when the tagtree ceases to exist. Maybe a check for this is not coded.

Steve Franks, I am not sure it is exactly the same problem. I don't know how Digikam will try to build a tagtree from tags set by other programs. Obviously those programs will not save the tagtree to custom metadata as Digikam does and if they save the tagtree to metadata at all, I have no idea if Digikam can read other tagtree metadata than its own.
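
A toy model of the behaviour described in this comment, under the assumption that a rescan rebuilds the tree from every tag path still stored in the files. The helper `build_tree` and the path strings are hypothetical; this is not digiKam's scanner.

```python
def build_tree(paths, sep="/"):
    """Build a nested dict of tag names from full tag paths."""
    tree = {}
    for path in paths:
        node = tree
        for name in path.split(sep):
            node = node.setdefault(name, {})
    return tree


# Paths still stored in one image after retagging. The first entry is the
# stale path that was never cleared from the file (assumed for this example).
stored_paths = [
    "Cars/Brand 3/People/Person 2",
    "People/Person 2",
]

tree = build_tree(stored_paths)
print(tree["Cars"])  # the unwanted subtree is back after the rescan
```

In this model the unwanted branch returns as long as a single file still carries the stale path, even if no image is assigned to it any more.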
Comment 8 Steve Franks 2019-01-13 22:24:23 UTC
I think Digikam is reading from .xmp files and JPEGs, but not stopping when it has read a set of tags.
When I examined the XMP file for one of my images I found data written by digikam, Lightroom, Adobe Photoshop Elements and a section headed Microsoft. These seem to use different delimiters to denote a hierarchy. Full stops are read by digikam verbatim, giving keywords like Places.East Kent.Deal. Backslashes are treated the same way; I have seen tags like Places\East Kent\Deal. | seemed to be OK, but I think digikam sometimes doesn't parse it; I need to confirm that.
I found software that used exiftool to remove the Microsoft data, but it didn't work. Perhaps it only removed it from either the JPEG or the XMP, not both.
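
The delimiter problem can be sketched as follows. This assumes that each metadata source is known to use exactly one delimiter; `split_tag_path` is a hypothetical helper, not part of digiKam or ExifTool.

```python
def split_tag_path(raw, delimiter):
    """Split one keyword string into its hierarchy levels."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


# The same hierarchy as written by different programs (delimiter per source).
variants = {
    "Places|East Kent|Deal": "|",
    "Places.East Kent.Deal": ".",
    "Places\\East Kent\\Deal": "\\",
}

for raw, delimiter in variants.items():
    print(split_tag_path(raw, delimiter))

# Read with the wrong delimiter, the whole string stays a single keyword.
print(split_tag_path("Places.East Kent.Deal", "|"))
```

The delimiter has to be chosen per source rather than guessed from the string, because a full stop can also be part of an ordinary keyword.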
Comment 9 Steve Franks 2019-01-20 12:44:04 UTC
Created attachment 117570
attachment-20427-0.html

Digikam is correctly reading the tagtree (hierarchy) for a lot of images, but not all.
It appears to import the tags and hierarchy from several places for an image, e.g. the .XMP sidecar, which may contain several sets of tags in different sections, and embedded data in the image. My main problem is that the formats of the XMP sections vary depending upon which software wrote the data. Over the years I have used Microsoft Photo Gallery, Google Picasa, Adobe Elements and Adobe Lightroom.
Each program has its own opinion, so the same tagtree (e.g. Places|East Kent|Deal) will appear several times, with various delimiters (i.e. | \ , .). Digikam seems to correctly parse Places|East Kent|Deal, but every time it rescans files other variants reappear.
It would be nice if merging tags meant that the tags in the picture merged, but it doesn't.
One solution would be software that would allow all non-standard tags to be deleted, leaving only those written by digikam.

Changing whether digikam reads/writes sidecar files has had an unexpected result. The number of images with no tags has increased from about 1200 to 6449.
Regards
Steve

Comment 10 caulier.gilles 2020-01-12 22:45:11 UTC
Steve,

We have updated Qt to the latest 5.14 and KF5 to 5.65 in the bundles.

Can you reproduce the problem with the digiKam 7.0.0-beta2 pre-release?

https://files.kde.org/digikam/

Thanks in advance

Gilles Caulier
Comment 11 caulier.gilles 2020-08-01 14:24:21 UTC
The digiKam 7.0.0 stable release is now published and available as a Flatpak:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Thanks in advance

Gilles Caulier
Comment 12 Sebas 2020-08-01 19:44:59 UTC
For me there is no easy way to test if this issue still applies.
Comment 13 Steve Franks 2021-06-10 10:54:26 UTC
It still happens in version 7.3.0, which I installed yesterday.
I merged lots of tags, but today they are back where they were.
Sorry for the long delay in responding, I haven't used digiKam for a while.
Steve
Comment 14 Steve Franks 2021-06-10 11:23:48 UTC
FYI
The data is being written back to files, because my other image management app (iMatch) is reading the data. If I just correct it in iMatch, this doesn't happen.
Comment 15 Maik Qualmann 2021-06-10 14:23:04 UTC
To make it clear again: if the digiKam tags tree is not present in all images, the tag tree cannot be recreated completely correctly after a new scan. Without the digiKam tags tree in the images, the result depends on which images the scanner happens to read first. In order to be able to restore the tags tree, the complete tags path must always be present in all images. It can't work any other way.
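
Why the scan order matters can be shown with a deliberately simplified model. The matching rule below is an assumption made up for this example and is not digiKam's actual scanner logic: a full path is created as given, while a flat keyword reuses the first known tag with that name or becomes a new top-level tag.

```python
def scan(images):
    """Return the set of tag paths created by scanning images in order.

    Assumed rule (toy model): a full path is created as given; a flat
    keyword reuses the first known tag with that name, or becomes a new
    top-level tag if none exists yet.
    """
    known = []  # tag paths in creation order
    for keywords in images:
        for keyword in keywords:
            path = tuple(keyword.split("/"))
            if len(path) == 1:
                match = next((p for p in known if p[-1] == path[0]), None)
                path = match or path
            if path not in known:
                known.append(path)
    return set(known)


with_tree = ["People/Person 1"]   # image that carries the full tag path
flat_only = ["Person 1"]          # image that carries only a flat keyword

print(scan([with_tree, flat_only]))  # one tag: People/Person 1
print(scan([flat_only, with_tree]))  # two tags: Person 1 and People/Person 1
```

With the full path present in every image, both orders would produce the same tree in this model, which matches the requirement stated above.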

Maik
Comment 16 caulier.gilles 2022-01-10 06:06:06 UTC
Maik,

What's the status of this report now?

Gilles
Comment 17 Steve Franks 2022-04-24 10:22:50 UTC
7.7.0 still reads old metadata.
I wrote tags to XMP files for all photos on my laptop, then copied those XMP files to my desktop Windows PC. Old tags reappeared. I changed the digikam config to read only digikam tags from XMP, deleted the MySQL database and started from scratch. Unfortunately, digikam always starts scanning before one has a chance to alter the settings.
The old keywords have reappeared; rereading metadata makes no difference.
Am I missing something? Is there a way to force digikam to read JPEG metadata only from sidecars?
Thanks in advance
Steve
Comment 19 Maik Qualmann 2022-04-24 10:44:01 UTC
There is no option to read only from sidecars. It is always a merge of image and sidecar metadata. Send me an image plus its sidecar and tell me which tags you don't want. Note that tags can be stored in several different metadata fields. However, digiKam lets you control which of them are used in the advanced metadata settings. There is no universal setting that fits all users.

Maik
Comment 20 Steve Franks 2022-04-25 11:58:41 UTC
Created attachment 148352
attachment-25270-0.html

Thank you, and may I apologise for bothering you with this. I don't think this is a bug; I believe the main issue was that I hadn't deselected obsolete software's tag data in the advanced settings (ACDSee, Microsoft etc.). I specified that only digikam tags should be read. Setting the option to write to XMP only, or to XMP and item, seems to be the key. If 'write to xmp only for read only items' is chosen, I have the impression XMP sidecars are not read for JPEGs.
I deleted all keywords using the tag manager and reread metadata for every item. None of the obsolete keywords have reappeared, but I now have 60k images without keywords. I think that the processes must have overlapped, i.e. deleting keywords whilst rescanning metadata. I shall try again when I have time. It's an old PC, which takes several hours to rescan so many images and is currently rebuilding fingerprints to allow me to check for duplicates.
Thanks again
Steve

Comment 21 Steve Franks 2022-04-28 10:07:10 UTC
Sorry for the delayed response.
I was hoping to synchronise my PCs by only copying the latest images from one to the other and using XMP files to update the tags. That doesn't seem to work for JPEGs, correct me if I'm wrong but digikam doesn't seem to read xmp data from XMPs for JPEGs.
After your comment I looked again at the advanced settings and removed ACDSee, etc. from the list. I also tried to permanently remove that data from my files with Exiftool, but it is taking days.
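
If ExifTool was being started once per file, a single recursive run over the whole folder may be considerably quicker. The sketch below only builds the argument list and does not run anything. The group names (`XMP-microsoft`, `XMP-acdsee`) and the folder name are assumptions for illustration; they should be checked against the actual files and tried on copies first, since `-overwrite_original` keeps no backup.

```python
def build_exiftool_args(folder, groups=("XMP-microsoft", "XMP-acdsee")):
    """Build one recursive ExifTool call that deletes whole XMP groups."""
    args = ["exiftool", "-r", "-overwrite_original"]
    # Restrict the run to JPEG files and XMP sidecars.
    args += ["-ext", "jpg", "-ext", "xmp"]
    # "-GROUP:all=" deletes every tag in that group.
    args += [f"-{group}:all=" for group in groups]
    args.append(folder)
    return args


# Hypothetical folder name; print the command instead of running it.
print(" ".join(build_exiftool_args("photos")))
```

Processing both the JPEGs and the sidecars in the same run addresses the earlier suspicion that the data had only been removed from one of the two.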
As far as I'm concerned this isn't a bug, it's a matter of correctly configuring digikam to ignore ancient data.
Thanks for everything
Steve
Comment 22 caulier.gilles 2023-04-29 07:44:59 UTC
@Steve,

> I was hoping to synchronise my PCs by only copying the latest images from one to the other and using XMP files to update the tags. That doesn't seem to work for JPEGs, correct me if I'm wrong but digikam doesn't seem to read xmp data from XMPs for JPEGs.

Not at all: if a JPEG has an XMP sidecar, the sidecar is read when the option is turned on in the metadata setup.

> After your comment I looked again at the advanced settings and removed ACDSee, etc. from the list. I also tried to permanently remove that data from my files with Exiftool, but it is taking days.
> As far as I'm concerned this isn't a bug, it's a matter of correctly configuring digikam to ignore ancient data.

In digiKam 8.0.0, we introduced metadata profile management in the Advanced Settings page. See the online documentation here:

https://docs.digikam.org/en/setup_application/metadata_settings.html#advanced-settings

This feature can help you configure the digiKam metadata engine for your use cases.

Gilles Caulier
Comment 23 caulier.gilles 2023-10-15 12:39:19 UTC
@Steve Franks,


Is this problem still reproducible with the new digiKam 8.2.0 pre-release Windows
installer, available at the usual place?

https://files.kde.org/digikam/

This new bundle is based on the latest Qt framework 5.15.11 and KDE Frameworks 5.110.

Thanks in advance

Gilles Caulier