Bug 337688 - Reading/writing of keyword-tags to jpg and xmp corrupts tag hierarchy, duplicate root tag
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Xmp
Version: 4.2.0
Platform: openSUSE Linux
Importance: NOR major
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-22 11:35 UTC by Christian
Modified: 2017-08-13 07:30 UTC

See Also:
Latest Commit:
Version Fixed In: 4.3.0


Attachments
Screenshot of corrupted tag hierarchy (160.22 KB, image/png)
2014-07-22 14:09 UTC, Christian
Digikam 4.1 automatically created xmp with explicit root tag. (72.07 KB, application/zip)
2014-07-22 20:25 UTC, Christian
http://buitk.at/download/digikam_35_41_samples.zip (52 bytes, text/plain)
2014-07-22 21:44 UTC, Christian
http://buitk.at/download/digikam_4.1_tag_testcase.zip (55 bytes, text/plain)
2014-07-24 04:07 UTC, Christian
Duplication of tags on same level - a few files can mess up whole tag tree (92.63 KB, image/jpeg)
2014-07-24 12:58 UTC, Christian
Testcase for digikam 4.2 tags that cannot be removed (59 bytes, text/plain)
2014-08-18 10:45 UTC, Christian
because of undeletable tags, duplication still occurs (26.22 KB, image/png)
2014-08-18 10:48 UTC, Christian

Description Christian 2014-07-22 11:35:55 UTC
OS / Release Details:
---------------------
Digikam/Kipi Plugins Release 4.1.0-11.1 (libkexiv2-11 4.11.5-298.11)
with and without MySQL DB (mariadb 5.5.33-2.2, libmysqlclient18 5.5.33-2.2)
(OS: OpenSuse 13.2 from BuildService KDE:Extra, Windows 8.1 from Installer)

Symptoms:
------------
Writing and reading tags destroys your tag-hierarchy metadata any time you use digikam to write tags, if you use a tag tree with subnodes (tag hierarchy)!
Digikam CRUD operations for tagging will duplicate tags and mess up the hierarchy, if the "writing metadata to image" option is selected.

You will end up with multiple nested levels of _Digikam_root_tag_ hierarchies with duplicate keywords (up to 8 times) in each level. A nightmare if you have millions of tagged images.
If you try to consolidate the duplicates with digikam you will cause much more damage.

Analysis:
------------
Five different behaviours (and bugs?) when writing tags:

- Create a new tag and write it embedded into a jpg image (iptc, xmp) -> worked well before 4.1 if only a few tags were selected and the tag was created on the same PC/DB ... but after rereading all metadata this operation creates a duplicate tag hierarchy with a new root tag on the same level.

- Set and write an embedded tag that was imported from jpg metadata before -> messed up the hierarchy position or duplicated the entry in pre-4.1 versions; it now seems to work in 4.1 under a single root tag, or on top level, but after rereading metadata the hierarchy might be corrupted with duplicates as well.

- Set and write embedded tag, that was imported from xmp sidecar before -> will have two or more nested root tags until you remove all duplicate root tags from the xml file before the import (photoshop section). 
To Reproduce: write tags from a hierarchy to image, remove tags from db, import the tags again and write tags - every time an additional nested root node is added. For me this is at least a major bug.

- Set and write a new tag with the same name as a tag somewhere else in the hierarchy (you are very likely to run into this, because of the bug that causes recursively nested root tags) -> duplicates tags because another way of writing tags is used (full path) - but reading these tags will not consolidate them with the same tags from other images written without the full name

- Set and write many tags to a jpg file (cannot be stored in IPTC section) -> Digikam will write them to xmp xml-file with two or more nested root tags, next import duplicates the hierarchy .. see above.

These situations do NOT produce consistent keyword-tags in the images (after rereading metadata !). Each situation produces duplicate tags that are shown on different levels of the tag tree, some in the same, some on top.

Background:
-------------
 I have been tagging a very large image collection with digikam for a long time. After a few tests with the tag manager I decided to clean up tags ... because till 4.0 there was no way to remove a tag from an image. In 4.1 there are three ways to remove tags. Two of them do not act as CRUD operations - even if you remove the tag and write metadata to the file, the tag will not be removed. The new "remove tag from all images" function causes massive damage to the tag hierarchy in the images - so there is no option left to manage tags with digikam 4.1.

Tagging wish list:
--------------------
 a quick-link to the digiKam bug-list from the digikam site (and devel-blog)

 no digikam root tag whatsoever ... the user can create one if needed

 a single tagging mode with CRUD operations for image-metadata compatible with photoshop/lightroom. For other tag sources just provide import operations, but no write operations. These apps are toys. 

 a specific tagging mode: "database only" that will not touch any image (no additional settings, just all or nothing)

 a single, consistent way to remove tags - from images/database only, see above

 import/export of tag hierarchies to xml files in tag manager
 all CRUD operations for image-metadata have to be consistent,
   or they will have to be removed from the code -> there is already an entry (wish) in the bug-list. PS/Lightroom format and xml would be great.
Comment 1 Christian 2014-07-22 14:09:41 UTC
Created attachment 87875 [details]
Screenshot of corrupted tag hierarchy

Before the last writing of metadata and re-reading metadata, the tag tree did not contain duplicates or two levels of digikam root tags. After some corrections (removing outdated tags) the tag tree looks like this (disaster).
Comment 2 Christian 2014-07-22 14:42:15 UTC
More details on the way digikam destroys tags:

It looks like two independent bugs are involved:
- duplication of root tag (for sure when writing/reading from xmp sidecar)
- duplication of parent-keywords when editing child keywords
   (e.g.  deletion, adding new children ...)
- duplication of added keywords in digikam 4.1.0
    Up to 8 times, in most of the cases 5 times
    In most of the cases, the duplicate keywords refer to a single image
    of the album, where image-tags have been changed before.
    Only the last duplicate entry refers to most of the images 
    it was assigned to before ... see screenshot.
    In rare cases the one above the last refers to most of the images.

Why are these tags (with exactly the same name) shown many times? The duplicates are assigned to just a few images, but there is no relation to the changes that have been made before, except that these images are always in the same album.

Note: I have a lot of duplicate images in different albums e.g. with different sizes and color profiles. But duplication of tags takes place inside a single album. Usually some images among the first 8 ones in the album view get assigned to inconsistent key word tags with exactly the same name, but with multiple duplicate entries in the tag hierarchy - see screen shot.
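
One way to check whether duplicates like these are real entries or only a display effect would be to group the tag rows by parent and name. The following is a minimal Python sketch under stated assumptions: the (tag id, parent id, name) rows and the helper name find_duplicate_siblings are hypothetical and do not reflect digiKam's actual database schema or code.

```python
from collections import defaultdict

def find_duplicate_siblings(tag_rows):
    """Group tags by (parent id, name) and return groups holding more than one id."""
    groups = defaultdict(list)
    for tag_id, parent_id, name in tag_rows:
        groups[(parent_id, name)].append(tag_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical rows (tag id, parent id, name); ids and names are made up.
rows = [
    (1, 0, "Location"),
    (2, 1, "Region"),
    (3, 2, "Town"),
    (4, 2, "Town"),   # same name under the same parent: a real duplicate
    (5, 0, "Town"),   # same name on another level: not a sibling duplicate
]
duplicates = find_duplicate_siblings(rows)
print(duplicates)
```

Only tags that share both the parent and the name are reported, so the same keyword used on a different level is not flagged.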

Hope this helps to track down the root cause of this severe bug (that will cost me at least 7 working days to restore the old image databases on all my devices - including the work of clearing the outdated tags that is lost)

More about my configuration:
- image rotation is not viewed
- all tags are written to files except "rating" and two similar items I am not using at all
- writing to xmp sidecars is selected if "image is not writeable", and sidecars are read
- on OpenSuse I migrated my digikam settings from 12.1 to 13.1, but I also tried a clean setup with same results.
- the official version of digikam in OpenSuse is still 3.5. I installed 4.1 from the KDE:Extra Repository that uses release 4.1.0-11.1 for the build. Is this a stable one?
- due to the lack of a hierarchy import/export feature, I select all keyword-tags in a specific jpg image, and I import metadata from this image first, e.g. when I set up a new linux work station for digikam to transfer the hierarchy. Before import I remove the duplicate root tags from the xml file, but I leave one root tag, otherwise the imported keywords will not match with the imported keywords from the other jpgs.

The bugs described before are not related to this workflow ... I also tried to import from jpgs without the special file with all the tags, and the result is similar (bad).
Comment 3 Christian 2014-07-22 16:26:10 UTC
Errata: 

I am using the stable OpenSuse Release 13.1 (not 13.2).

Remark: Today I replaced digiKam 4.1 with 3.5 (as recommended by Suse). This is a stable version with regard to tagging. Unfortunately there is no way to ever remove a tag from an image, but this release does not destroy your keyword tags in the images.
Comment 4 caulier.gilles 2014-07-22 18:37:00 UTC
veaceslav,

Wasn't this kind of problem already fixed in the 4.0.0 release?

Gilles Caulier
Comment 5 Veaceslav Munteanu 2014-07-22 18:58:18 UTC
I tried to reproduce this problem, added a tag with subtags on image, wrote tag to metadata, wiped all tags from database, and triggered re-read, everything works as expected...

I did a quick search of all digikam sources, and I did not find any _Digikam_root_tag_ string. Some legacy from an old database?

I will try to test and fix everything related to tags tomorrow.
Comment 6 caulier.gilles 2014-07-22 19:02:21 UTC
Veaceslav,

yes, _Digikam_root_tag_ is a very old internal tag. I don't remember why it was implemented. It was dropped a very long time ago.

Perhaps Marcel remembers the story...

Christian, 

Can you reproduce this problem using a fresh database ?

Gilles Caulier
Comment 7 Veaceslav Munteanu 2014-07-22 19:16:50 UTC
Hmm.. I remember a user asked me on the IRC channel, and I suggested he use exiv2 -pa on a tagged image to see the content.

It had indeed a _Digikam_root_tag_ in metadata, but I don't see any reasons why it creates duplicates...

Probably I should add a line or two to strip info from old _Digikam_root_tag_ ?

Christian, can you provide me with a few tagged images, so I can test?
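
The idea of stripping the old _Digikam_root_tag_ could look roughly like the sketch below. It is written in Python purely for illustration and is not the digiKam (C++) fix; it assumes tag paths are "/"-separated strings, and strip_root_tag is a hypothetical helper name.

```python
ROOT_TAG = "_Digikam_root_tag_"  # legacy marker named in this report

def strip_root_tag(tag_path, separator="/"):
    """Return tag_path without any _Digikam_root_tag_ segments (assumed path format)."""
    segments = [s for s in tag_path.split(separator) if s and s != ROOT_TAG]
    return separator.join(segments)

# Sample paths with zero, one and two nested root tags.
paths = [
    "_Digikam_root_tag_/Orte/Oesterreich/Burgenland",
    "_Digikam_root_tag_/_Digikam_root_tag_/Orte/Oesterreich/Burgenland",
    "Orte/Oesterreich/Burgenland",
]
cleaned = sorted({strip_root_tag(p) for p in paths})
print(cleaned)
```

With this normalization all three variants collapse to the same path, which is the behaviour a reader would want when the same tag arrives with and without (nested) root tags.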
Comment 8 Christian 2014-07-22 19:45:17 UTC
(In reply to Gilles Caulier from comment #4)
> veaceslav...

Another note on the topic:
--------------------------

Many simple cases have been fixed so far. Complex tag hierarchies never worked for me on any digikam version so far. I have not tested 4.0, only 3.5 and 4.1.

Some symptoms smell like outdated cached models or gui states at the time when metadata is written back to the images.

An adequate testcase would be: 
..................................

Create tag tree with 8 branches, depth of 8 levels and 8 elements. 8 albums with at least 20 images nested in subfolders on 3 levels (e.g. 20 Mpx jpgs). 
Use tags below and above the digikam root tag. Create an artificial digikam root tag as well in your hierarchy.

1. Test CRUD operations - Precondition:

 - assign 2 random elements of first 3 branches to all pics, each from different levels (usecase for "location", "person", "time" category)
Assumption for these tags: they will not be changed any more, and therefore should stay unchanged during the whole test. Some images, and some complete  albums should only use unchanged tags to check if reading/writing is stable without any changes applied.

 - assign 2 random elements of branch nr 4 from all levels and assume these are outdated and have to be removed from all images later on
 - assign 2 random elements of branch nr 5 from all levels and assume these keywords have to be renamed, and some are moved to other positions in the hierarchy
 - assign 2 random elements of branch nr 6 from all levels and assume these are special keywords that have to be split to three different keywords later on (rename old keyword, filter on new keyword, add another keyword to some images, and remove the other one)
 - assign 2 random elements of branch nr 7 from all levels and assume these are special keywords that have to be joined with two or three different keywords later on: filter on the keywords and add the one to be joined, then remove the others from all images
 - assign 2 random elements of branch nr 8 from all levels and assume these are keywords for events that will be filled up with additional keywords on all levels which will be assigned to images in several steps later on. Also remove some of the added sub-keywords from single images (will this delete them from these images?) and some from all images.

2. Test CRUD operations:

Write and re-read metadata after creation of the initial tags before the changes are applied: check before/after

Then perform all the changes in the keyword-tags as described in step 1, e.g. remove the outdated tags from branch number 4 ... . 

3. Test CRUD operations - Postcondition:

Write and re-read metadata after creation of the initial tags: check before/after

A possible method:

Make a copy of the latest tag tree - eg. store all tags in one image with all tags available (xmp xml) to document a snapshot of the tag tree in the db.
Make a screenshot of the Tag Manager View as well, because these bugs are likely related to gui/cache status (except duplicate root tag).

Then write metadata from db to all files. Remove tags in database and reload metadata from all files.

Store all tags in another image with all tags available (xmp xml) and compare the resulting tag tree, and also compare Tag Manager View with the screen shot taken before.

Write and re-read metadata a second time without any changes: check before/after

4. Try to delete a nested digikam root tag on the second level (will be created because of the mentioned bug)

Write and reread metadata again: check before/after

5. Try to delete the digikam root tag on the topmost level

Write and reread metadata again: check before/after

6. Try to delete the "My Tags" icon (I was able to do so in earlier versions)

Write and reread metadata again: check before/after

I know this is a lot of work (I lost 6 days to find out) just for jpegs with embedded keywords. This test should be automated to keep the tag feature stable in future releases.
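
The "check before/after" steps could be automated along these lines. This is only a Python sketch under stated assumptions: it presumes the tag tree has already been exported as a list of "/"-separated paths before and after a write/re-read cycle (the export itself is not shown), and diff_tag_snapshots is a hypothetical helper name.

```python
from collections import Counter

def diff_tag_snapshots(before, after):
    """Compare two lists of tag paths taken before and after a write/re-read cycle."""
    before_set, after_set = set(before), set(after)
    counts = Counter(after)
    return {
        "missing": sorted(before_set - after_set),      # tags lost by the cycle
        "unexpected": sorted(after_set - before_set),   # tags that appeared
        "duplicated": sorted(p for p, n in counts.items() if n > 1),
    }

# Sample snapshots, made up for illustration.
before = ["Orte/Oesterreich/Burgenland", "Zeit"]
after = [
    "Orte/Oesterreich/Burgenland",
    "Orte/Oesterreich/Burgenland",
    "_Digikam_root_tag_/Zeit",
]
report = diff_tag_snapshots(before, after)
print(report)
```

A stable round trip would give three empty lists; anything else points to one of the symptoms described in this report.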

Unfortunately I cannot invest more time at the moment, but I will if this is not working out. It is a very important feature for me. I migrated all assets from Lightroom some years ago and I don't want to go back to Adobe.
Comment 9 Christian 2014-07-22 19:56:53 UTC
(In reply to Veaceslav Munteanu from comment #7)
> Hmm.. I remember a user asked me on IRC channel, and suggested him to use
> exiv2 -pa on a tagged image to see the content.
> 
> It had indeed a _Digikam_root_tag_ in metadata, but I don't see any reasons
> why it creates duplicates...
> 
> Probably I should add a line or two to strip info from old
> _Digikam_root_tag_ ?
> 
> Cristian, can you provide me with few tagged images, so I can test?

Hi, I deleted all the images from 4.1 because my database has to be online till tomorrow. I installed 3.5 again.

You will not see _Digikam_root_tag_ until you import all metadata from the images to a new database or until you clear all tags in your database.

Maybe it is an issue with the German language version or with tags from several releases in my images.  I could import all the tags without the _Digikam_root_tag_ if I want, but then the tags imported from the images will not be joined.

I will provide older images from digikam 3.5. Sorry they are quite big, up to 10 MB.

thank you for taking a look : )
Comment 10 Veaceslav Munteanu 2014-07-22 20:10:08 UTC
Well, this is a migration problem(3.5 -> 4.1), I guess.

I strongly need at least 2-3 images with metadata written in old format, so I could trigger your digiKam's behavior(with my images nothing happens, everything works)
Comment 11 Christian 2014-07-22 20:25:13 UTC
Created attachment 87882 [details]
Digikam 4.1  automatically created xmp with explicit root tag.

I found an older example in the backups - it was created with digikam 4.1.0-11.1
Unfortunately it is too big - I have no images below 4 MB. I added the xmp only and will prepare an ftp download with more examples.

I used this jpg to store and import all tag keywords on other workstations. The xmp file was created without asking - I guess because there are too many tags selected to be embedded.

You can see the _Digikam_root_tag_ that caused me a lot of pain in the last weeks. 

As I mentioned I could create my hierarchy without the root tag as well (I did before), but then these tags are not joined with the tags digikam reads from new images that have been tagged on another workstation.

In short:
I moved my tags below the root tag - this was a lot of work. There was no such tag before. But when I imported metadata (in 3.5) this tag was inserted by digikam - same behaviour in 4.1. For me the only way to join tagged images from several workstations was to rearrange my tags this way. Is 4.1 going nuts because of this tag? 

By the way, why does dk introduce this unwanted parent-tag when reading from an image that was not tagged with this root-tag visible in the GUI (same in 3.5 and 4.1)? There was no such tag in the xmp metadata - I checked that - but after importing metadata this root tag is shown in the GUI.
Comment 12 Christian 2014-07-22 20:38:42 UTC
(In reply to Veaceslav Munteanu from comment #10)
> Well, this is a migration problem(3.5 -> 4.1), I guess.
> 
> I strongly need at least 2-3 images with metadata written in old format, so
> I could trigger your digiKam's behavior(with my images nothing happens,
> everything works)

Ok, I will create an archive for ftp download - my pics are bigger than 4MB.
Did you check a more complex hierarchy with several writing and reading cycles?

I do not expect that you will find anything special in the files. Some are tagged with a list of unstructured keywords, some with full path (if I try to correct the position in the tree), and there was no root tag until I moved all tags below this root tag on my own - see comment in the second attachment.

The database and the digikam settings were removed and created from scratch before I started to work with 4.1. All metadata was written to files and reread several times since migration (6 hours on 8 core machine).
Comment 13 Christian 2014-07-22 21:44:01 UTC
Created attachment 87885 [details]
http://buitk.at/download/digikam_35_41_samples.zip

Please download the sample images from:
http://buitk.at/download/digikam_35_41_samples.zip

These images contained duplicate tags after I tried to add "StadtSchleining" and "Heiligenbrunn" tags below:
_Digikam_root_tag_/Orte/Oesterreich/Burgenland/  to mark these images with a new location.

"StadtSchleining" was duplicated 2 times below "Burgenland" and 2 times on the top level. A single one of the included images was assigned to the duplicate tags, while all other pics of the album remained tagged with the right keyword on the right position. Why always just a few images? Maybe they were selected in the GUI while I worked in the tag manager? I can't tell.
Tag "Heiligenbrunn" was duplicated 3 times under "Burgenland" - all tags assigned to image DSC0448 in the second album, but no duplicate tags in the first album.

I tried to remove the duplicated tags with the new tag manager in digikam - after rereading the metadata from all images StadtSchleining was duplicated 5 times below "Burgenland", two times in a nested root tag below "Burgenland", and no more on the toplevel (where other tags from other branches showed up) - see first screenshot. "Heiligenbrunn" was 4 times below "Burgenland" and also several times below a second nested root tag.

You can import the old tag hierarchy from the image in album "all_tag_categories_old_with_xmp". I lost the latest state of the hierarchy. It was redesigned with hundreds of changes till a crash forced me to reread metadata with the documented results.

Note: you have to delete all tags from the database and import all metadata again to see these effects. Everything looks fine as long as you do not import metadata.
Comment 14 Christian 2014-07-22 22:13:26 UTC
(In reply to Gilles Caulier from comment #6)
> Veaceslav,
> 
> yes, _Digikam_root_tag_ is a very old internal tag. I don't remember why it
> have been implemented. It have been dropped since a very long time.
> 
> Perhaps Marcel remember the story...
> 
> Christian, 
> 
> Can you reproduce this problem using a fresh database ?
> 
> Gilles Caulier

...

> Can you reproduce this problem using a fresh database ?

Yes, the db crashed and was rebuilt three times (after all tags disappeared).
I also removed all tags with the new tag manager and read all metadata two times, but the duplication happened again. Is this "fresh" enough?

I have tagged faces as well, so cleaning the db in production is a drawback. I therefore used the "write face tags to image" option before I deleted and rebuilt all tags. But I never tried with a completely new db. This will take 4 hours to set up, but if you think it might help I will try.

Will check this if there is time to upgrade an old laptop to OpenSuse 13.1 - a prerequisite to install digikam 4.1 again. I will be busy with the downgrade to digikam 3.5 in production environment till next week.
Comment 15 Christian 2014-07-24 04:04:05 UTC
-----------------------------------------------------------------
Testcase explains why tag hierarchy is getting corrupt quickly:
     http://buitk.at/download/digikam_4.1_tag_testcase.zip
-----------------------------------------------------------------

I suggest to set the bug to "grave" after building a test case to reproduce the corruption of a small tag hierarchy - see below. Three sources of corruption eat up your tags and limit the usage of digikam 4.1 to a single device with single database that should never break. Do not leave this path until this family of bugs is fixed.

On top of my wishlist:
-------------------------
Please help me with my inconsistent tags in thousands of images tagged with different releases of digikam caused by these bugs and older ones. A simple tool could help me and many others out of this nightmare.

Requirements: This tool should read all tags that make sense from each file and copy them to all sections (XMP/IPTC ...) in a consistent way without any root tags and without any duplications or imports to the database. Remove all unreadable stuff. The database should be rebuild from scratch after the consolidation is complete.

----------------------
Testcase Explanation:
----------------------

I used a "clean" install, a new empty mysql database and OpenSuse 13.1 with digikam 4.1.0-11.8 and KDE 4.13.3 to test keyword tagging. I am convinced that SQLite shows similar results, but I had no time to check this.

1. Copy test images to the folder with your collection

Download the zip file and unpack it to a local folder:
  http://buitk.at/download/digikam_4.1_tag_testcase.zip
Copy the sample files into your image folder, but do not copy the screenshots.

2. Subfolder: 0_inconsistent_writing_reading/album1_with_single_root_tags

bug i:
There is a bug in "Read metadata" that duplicates hierarchies when IPTC and XMP Section contain different keywords, or when "full path keywords" are mixed with single keywords. Select all images in album1 and "Read metadata from images"
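
To spot files affected by this kind of mismatch, one could compare the two keyword lists per image. The sketch below is Python and makes several assumptions: the IPTC keywords and the XMP tag paths have already been extracted into plain lists (for example from the output of a tool such as exiv2; the extraction is not shown), paths are "/"-separated, and compare_keyword_sections is a hypothetical helper name.

```python
def leaf_names(keywords, separator="/"):
    """Reduce each keyword or full path to its last segment."""
    return {
        kw.strip(separator).split(separator)[-1]
        for kw in keywords
        if kw.strip(separator)
    }

def compare_keyword_sections(iptc_keywords, xmp_paths, separator="/"):
    """Report leaf keywords that appear in only one of the two sections."""
    iptc = leaf_names(iptc_keywords, separator)
    xmp = leaf_names(xmp_paths, separator)
    return {
        "only_iptc": sorted(iptc - xmp),
        "only_xmp": sorted(xmp - iptc),
    }

# Sample values, made up for illustration.
iptc_keywords = ["Burgenland", "Zeit"]
xmp_paths = ["Orte/Oesterreich/Burgenland", "Person"]
mismatch = compare_keyword_sections(iptc_keywords, xmp_paths)
print(mismatch)
```

Comparing only the last path segment treats a single keyword and its "full path" form as the same tag, so only genuine differences between the sections are reported.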

bug ii:
There is another bug in "Read metadata" - whenever tags are found that are not in the hierarchy, a "_Digikam_root_tag_" is created in the GUI that was not in the image's tag list. This is also done when there is already such a tag - in some cases they are shown side by side on the same level, most of the time nested under each other.

To check this, remove all tags from the database using "Tag Manager" and "Read metadata from image" of any tagged image. If you try to delete this tag you lose all others too. The same happens if you add a tagged image from another PC with keywords that are not already in your tag tree.

Note: The automatic update might behave in another way - but the "Read metadata" function will always create such unwanted tags. 

bug iii:
Beware - duplicated tag-branches in the digikam 4.1 GUI are not always different tags until you use "write metadata". If you close digikam and open it again, some of the duplicate tags will disappear, others remain. This is another annoying bug: GUI view and internal model are out of sync. Many times this causes a loss of all tags - e.g. if you delete a nested duplicate tag that is internally identical with the root tag above - so you delete the topmost root tag and with it all your tags are gone!

Even worse - you will not see the loss until you close and open digikam again - so any tagging operation with write operations writes chaos tags or nothing to your files.

See "digikam_4.1_remove_one_duplicate_tag_branch_before_close.jpg"
    "digikam_4.3_remove_one_duplicate_tag_branch_after_opening.jpg"
    for demonstration


3. Subfolder: 0_inconsistent_writing_reading/album2_with_duplicate_root_tags

DSC0448_with_all_duplicate_root_tags_after_writing.JPG

... demonstrates the mentioned bugs - I found this one in my collection that was edited on two different PCs. Some of the tags have duplicate root tags on top because the tag tree was imported on another PC from the image and later on more tags were added ... this caused the corrupt tags in the GUI to be written to the file.

DSC0448_with_few_duplicate_root_tags_before_writing.JPG
... demonstrates what happens when this metadata is read and written again - the root tag was not duplicated this time, but the ones without root tag have one now. Please note that this is undesired behaviour - why are toplevel tags added to some tags and not to others? Unpredictable behavior ..

4. Subfolder: 0_inconsistent_.../album3_with_no_dk_root_tag_and_duplication

album3 demonstrates the bugs described before on a single file, starting with an empty tag tree. First two nested tags are added to the file. Then two more tags are added on the third level. Finally two of the topmost tags are removed again. The removal of these tags now works without any explicit writing of metadata ... this is an advantage compared to older versions : )

Everything worked out well - also writing of metadata and reading metadata again does not cause corruption. The bugs i-iii described above do not apply, because the hierarchy was created on this PC and is still present!

Corruption starts once you remove some of these tags before reading metadata again, or if you copy the tagged files to another PC and "read metadata".
This explains why these severe bugs have not been fixed for such a long time.

See: "_DSC2638_5_unwanted_digikam_root_tag_written_to_file_from_gui.JPG"

See also this example to understand creation of nested duplicate root tags:  "_DSC2638_7_root_tag_is_duplicated_after_2nd_read_when_no_tag_present_and_write"


4. Subfolder: 1a_move_tag_to_new_position_in_tree_by_moving

This example demonstrates inconsistent IPTC and XMP tags that cause a bad mess when reading metadata from such a file, because no tag tree will match these cases - so many duplicate branches are created (I hope this is not expected?)

It shows what happens if one tries to move a tag from top down to a subbranch and writes metadata to all related files again (not needed if tags stored as single keywords- but who knows in which way tags are stored in a particular image?) In this case writing metadata works well.

But rereading metadata from this file really surprised me - a Person tag shows up, why now I can't tell - and even worse: the "Zeit" tag was moved to the top (why does reading cause a tag to move?) - and some hidden, old "Zeit" versions appeared that were not visible before, when we read metadata.

bug iv:  In case of inconsistent IPTC keywords that do not match XMP keywords, reading metadata will not show all keywords. After some other operations reading again will bring new keywords (hidden in the file). In this case the position of existing keywords is changed as well while reading - this is unwanted.
This might be an issue of "full path keywords" mixed with "single keywords".

4. Subfolder: 1b_reread_with_corrected_hierarchy_duplicates

digikam_4.1_tag_tree_with_missing_br_duplicates_root_when_reading.jpg and
digikam_4.1_before_tag_tree_with_missing_branches.jpg

demonstrate the mentioned bug, that the GUI is sometimes not in sync with the internal model - branches that seem duplicates are internally not duplicated - so deleting a nested tag leads to the loss of the whole branch.
This can be avoided, if you close an open digikam each time a duplication occurs - to see if it is real or just fake.

5. Subfolder: 
    2a_inconsistent_readingwriting_of_metadata_duplicates_tags
    2b_inconsistent_reading_with_missing_tags

 Several methods to write tags have been applied to these images, and the "Read metadata from images" function causes a big mess of duplicate tags.
 There is also an example of a tagged file that contains no root tag - but if it resides in an album with an image with a root tag, it will get one the next time the metadata of this album is written to all files.

bug v: unwanted root tags infect other images with root tags if they are changed together.

6. Subfolder: 3_write_from_duplicated_hierarchy_to_file

Digikam used (hopefully this does not continue) a mixed strategy to write tags - sometimes with full path, sometimes not. This causes strange semantics for duplicate keywords. 

Wish / bug vi:
To avoid chaos: always write keywords with full path, and NEVER write identical keywords or the same path more than once.
I am not sure if digikam 4.1 meets this requirement.
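
The rule in this wish (full paths only, each path written once) can be illustrated with a short Python sketch. It is not digiKam code; it assumes "/"-separated tag paths, and unique_full_paths is a hypothetical helper name.

```python
def unique_full_paths(paths, separator="/"):
    """Normalize tag paths and drop repeats while keeping the first-seen order."""
    seen = set()
    result = []
    for path in paths:
        # Drop empty segments so "A/B" and "A/B/" count as the same path.
        segments = [s.strip() for s in path.split(separator) if s.strip()]
        normalized = separator.join(segments)
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result

# Sample selection with repeated entries, made up for illustration.
selected = [
    "Orte/Oesterreich/Burgenland",
    "Orte/Oesterreich/Burgenland/",
    "Zeit",
    "Zeit",
]
to_write = unique_full_paths(selected)
print(to_write)
```

Applied before writing, this would keep a keyword that was selected several times in an out-of-sync tree from being written more than once.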

In this example the same keyword is used on four different positions in the tag tree (because of other bugs some branches have been duplicated), and all four have been selected and written to the file. If the GUI view was out of sync, some of them stand for the same position - but they have been written four times. I can't tell if the resulting metadata is as expected.

7. Subfolder: 4_remove_inconsistent_tag_close_open

Another example showing that GUI view and internal model are sometimes out of sync. After accidentally deleting the topmost tag (because it was shown two times) the tree looks fine - after closing and opening, the whole branch is gone.
Comment 16 Christian 2014-07-24 04:07:21 UTC
Created attachment 87925 [details]
http://buitk.at/download/digikam_4.1_tag_testcase.zip

Download and unzip testcase to reproduce six kinds of bugs around tagging
Comment 17 Veaceslav Munteanu 2014-07-24 07:43:54 UTC
that host is so slow, it takes me 8 hours to download it. Use google drive or dropbox for faster speeds.
Comment 18 Veaceslav Munteanu 2014-07-24 08:00:58 UTC
Somehow, I was able to download the attachment, and I'm able to reproduce what you said. Fixing now, please wait :)
Comment 19 Veaceslav Munteanu 2014-07-24 08:38:15 UTC
Git commit 86d06f51a3d391fd243ad82983e532e12171b6b5 by Veaceslav Munteanu.
Committed on 24/07/2014 at 08:37.
Pushed by munteanu into branch 'master'.

M  +10   -2    libs/database/imagescanner.cpp

http://commits.kde.org/digikam/86d06f51a3d391fd243ad82983e532e12171b6b5
Comment 20 Veaceslav Munteanu 2014-07-24 08:40:17 UTC
Still fixing, digiKam still does not overwrite the old format...
Comment 21 Christian 2014-07-24 10:31:19 UTC
(In reply to Veaceslav Munteanu from comment #17)
> that host is so slow, it takes me 8 hours to download it. Use google drive
> or dropbox for faster speeds.

thank you : )

Sorry, a download within 2 minutes is available again after rebooting my server. There is a degradation of performance after several months uptime of my apache 2.2. I found no time to find out what causes this - no hints in my logs so far.
Comment 22 Veaceslav Munteanu 2014-07-24 11:27:49 UTC
https://www.dropbox.com/s/axtmrkmu27nkyxi/tags_clean.png

This is tag tree after importing your mega-pack :D

I guess it's pretty clean. Also I have no idea how you could make two _Digikam_root_tag_ on the same level, it is almost impossible, there are duplicate checks everywhere...

Also, now digiKam is able to clean up your metadata when you write it back, so your images have brand new, clean metadata in them.

Still need to check a few of your test cases, but... after lunch :)
Comment 23 Christian 2014-07-24 12:55:28 UTC
(In reply to Veaceslav Munteanu from comment #22)
> https://www.dropbox.com/s/axtmrkmu27nkyxi/tags_clean.png
> 
> This is tag tree after importing your mega-pack :D
> 
> I guess it's pretty clean. Also I have no idea how you could make two
> _Digikam_root_tag_ on the same level, it is almost impossible, there are
> duplicate checks everywhere...

Wow - that would be great. So I reread metadata from all images and write metadata to all images using tag manager to clean up? 

Question:
Will your fixes apply
- to tag manager tools only,
- only to explicit calls of "read metadata" / "write metadata" for files and albums,
- or to both?
How about the initial reading of a collection? Same code?

> _Digikam_root_tag_ on the same level

This happened only once with the root tag, when I read metadata from three files at once, two of them with inconsistent IPTC and XMP, into a messed-up tree. Maybe a display error.

Duplication of other tags on the same level is very common, see e.g. my latest attachment. I copied 5 files of the testcase to my big collection and used tag manager to read metadata of all images to find out if the same bugs apply. They do.
Comment 24 Christian 2014-07-24 12:58:38 UTC
Created attachment 87932 [details]
Duplication of tags on same level - a few files can mess up whole tag tree

Tag duplication on the same level in big collection with 5 inconsistent files:

I copied 5 files of the testcase to my big collection and used tag manager to read metadata of all images to find out if the same bugs apply. They do.
Comment 25 Veaceslav Munteanu 2014-07-24 13:08:33 UTC
Git commit 992a219433264daa00c77f3c6ad27a92705d6900 by Veaceslav Munteanu.
Committed on 24/07/2014 at 13:08.
Pushed by munteanu into branch 'master'.

M  +65   -74   digikam/fileaction/metadatahub.cpp
M  +9    -0    digikam/fileaction/metadatahub.h

http://commits.kde.org/digikam/992a219433264daa00c77f3c6ad27a92705d6900
Comment 26 Veaceslav Munteanu 2014-07-24 13:21:13 UTC
The original problems come from _Digikam_root_tag_, for which I added extra checks.

Every occurrence of it will be deleted, so it doesn't matter if there is one _Digikam_root_tag_ or 5 of them.

I discovered nested root tags in your metadata; some images contained:

_Digikam_root_tag_/_Digikam_root_tag_, probably old bugs...

I applied the check both where digiKam reads tags and where digiKam writes tags. Clean-up can be done using any metadata writing option:

1. Write metadata
2. Maintenance tool
3. Tags Manager sync export options
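
The root-tag check described above can be illustrated with a minimal sketch. This is not the code from the commits; it only assumes that tag paths are "/"-separated strings, as in the `_Digikam_root_tag_/_Digikam_root_tag_` example. The function names `clean_tag_path` and `clean_tag_paths` and the sample paths are hypothetical.

```python
# Hypothetical illustration of the clean-up idea, not digiKam's implementation.
ROOT_TAG = "_Digikam_root_tag_"


def clean_tag_path(path, root_tag=ROOT_TAG):
    """Drop every occurrence of the root tag (and empty segments) from a path."""
    parts = [part for part in path.split("/") if part and part != root_tag]
    return "/".join(parts)


def clean_tag_paths(paths):
    """Clean all paths, skipping empty results and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for path in paths:
        result = clean_tag_path(path)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned


if __name__ == "__main__":
    sample = [
        "_Digikam_root_tag_/_Digikam_root_tag_/Zeit/bChr004_Neuzeit",
        "_Digikam_root_tag_/Zeit/bChr004_Neuzeit",
        "_Digikam_root_tag_",
    ]
    print(clean_tag_paths(sample))
```

Because the same function would run on both the read and the write side, one nested root tag or five of them end up as the same cleaned path.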

Please note: if your tags database is empty and you trigger write, all metadata from images will be cleared. Do not forget to read them before writing.

Also, about duplicate tags on the same level: digiKam does not allow me to create them, and even a re-read does not reveal anything suspicious.

Only a corrupt database can contain this; use a new one (when testing my fixes).

Also I tested moving a tag to a different sub-tree and it works, all tags are correctly written and read.
Comment 27 Christian 2014-07-24 14:18:27 UTC
(In reply to Veaceslav Munteanu from comment #26)

Wow, this was really quick : ) 
Thank you for the quick response and the time invested.

Yes, I expect that all tags are removed if nothing was read before.
Remark: I used a completely empty mysql database for the test. The three root tags showed up on the second level, the position where the GUI often got out of sync. They might have been fakes that disappear after closing and opening. I do not remember.

How do I get the fix?

I am from the Java side, so I don't know exactly how I can build something from the updated master branch that runs in my distro (OpenSuse) with mysql. 

Do you recommend the procedure on https://www.digikam.org/download/GIT ?
Or should I contact guys from the suse factory? Easier ways like a fix?
I guess I have to install the dependency packages for kipi and digikam?

Christian
Comment 28 Veaceslav Munteanu 2014-07-24 14:34:40 UTC
Git commit 5cc7125ba8ec452b9d4f95687f35e7071bbd9b55 by Veaceslav Munteanu.
Committed on 24/07/2014 at 14:34.
Pushed by munteanu into branch 'master'.

M  +25   -16   digikam/fileaction/metadatahub.cpp
M  +7    -0    digikam/fileaction/metadatahub.h
M  +3    -2    libs/database/imagescanner.cpp

http://commits.kde.org/digikam/5cc7125ba8ec452b9d4f95687f35e7071bbd9b55
Comment 29 Veaceslav Munteanu 2014-07-24 14:50:31 UTC
I have no experience with packaging and the fix is only available in the git repository. You can try to build digiKam from git sources or wait until digiKam 4.2 is available (Release date: 2014-08-31).
Comment 30 Christian 2014-08-18 10:41:32 UTC
Reason why I want to reopen this bug: The last fix improved tagging a lot, but there is still an issue related to "zombie tags" that cannot be deleted in digikam 4.2.

----------------------------------------------------------------------------

This testcase demonstrates a bug (or several bugs) related to "zombie tags" that cannot be removed with any of the tools in digikam 4.2 (linux: openSuse).
Even if it looks like they are gone, they will come back after reading metadata. Some only show up again if you explicitly "read metadata" from the files.

Download Link:
-----------------------------------
http://buitk.at/download/digikam42_zombie_tag_testcase.zip

Pre condition:
-----------------------------------
I cleaned all tags in the files and in the database with dk R4.2 for openSuse first. Tags were written two times to all files. 

Result: About 30,000 tagged images seem to be clean now, approx. 14,000 are still infected with old stuff that was not deleted as expected.
When reading metadata a second time a lot of old tags came to life again. I removed them again, but finally some show up during writing again.
Only these cases are documented here. 

I install Digikam 4.2 with an empty sqlite database to do the tests described below.

Testcase and reproducible symptoms:
-----------------------------------

I install Digikam 4.2 with an empty sqlite database and import tags from the file "00_all_keywords_buitk.jpg".
To do so you have to activate "read metadata from sidecar files". I use "write tags to files" settings for metadata.

Then I add all images of the folders below to the collection:
  "cannot_remove_tags_from_different_subtree_with_same_leaves"
  "cannot_remove_tags_from_wrong_position"

Then I try to remove the tags that are mentioned in the name of the jpg images.
Example: File "dsc01931_remove_Zeit-bChr004_Neuzeit_and_1400Spaetmittelalter.jpg"
 ... refers to <toplevel>Zeit/bChr004_Neuzeit/...   and all subsequent tags 
 ... refers to <toplevel>1400Spaetmittelalter

Symptom: Deleted "Zombie Tags" come back
------------------------------------------

Digikam 4.2 can read and write all tagged files now. So the changes are always written to the images without any error messages.
But in many cases (14,000) the removed tags still remain somewhere in the written metadata, and not in the database. I have checked this by starting with an empty database again.
The zombie tags will show up again if you manually select a single file to read metadata. Sometimes these zombie tags are created in several copies if you select more than one file at a time to read metadata.

Analysis: some hints to track down "why?"
-----------------------------------------
When looking at the cases that still cause trouble in dk4.2 it becomes obvious that these zombie tags are related to tag names that are used at more than one position in the tag tree.
There are several reasons why tag names are often used more than once in practice:

1. They come from moving tags to another position, e.g. if a wrong geographic classification is corrected and a location is moved to another parent node.
Note: I never drag the tag to the new position. A new one is created, the old one is deleted.

2. Historical reasons, e.g. there was an old subbranch of the time categories, Zeit/bChr004_Neuzeit/*, with the same leaves as the new one, Zeit/bChr006_Neuzeit/*
I have been trying to get rid of the old Zeit/bChr004_Neuzeit/ branch since 2009, with no luck to this day.

3. Many orphan tag nodes come from the "Digikam root tag" bug in older versions, that caused the loss of one or more levels of nesting.

My guess is that there is a bug in writing when a tag should be removed from one position while a tag with the same name still has to remain at another position.
NOTE: If you remove all tags manually, all tags will be gone, including the zombies. These zombies only persist if there is at least one other tag around!
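
To make the guess above concrete, here is a small sketch of one way such a bug could arise. It is purely hypothetical and not taken from digiKam's source: it assumes "/"-separated tag paths and a writer that decides by leaf name instead of by full path. The function names and the leaf name `1900` are made up for illustration; the branch names come from the example above.

```python
# Hypothetical model of the suspected bug, not digiKam's actual writer.

def leaf_name(tag_path):
    """Return the last segment of a '/'-separated tag path."""
    return tag_path.rsplit("/", 1)[-1]


def remove_by_name(stored_paths, removed_path, kept_paths):
    """Buggy variant: a removed path survives if its leaf name is still in use."""
    kept_names = {leaf_name(path) for path in kept_paths}
    return [
        path for path in stored_paths
        if path != removed_path or leaf_name(path) in kept_names
    ]


def remove_by_path(stored_paths, removed_path):
    """Expected behaviour: compare full paths, so only the chosen tag goes."""
    return [path for path in stored_paths if path != removed_path]


if __name__ == "__main__":
    old = "Zeit/bChr004_Neuzeit/1900"
    new = "Zeit/bChr006_Neuzeit/1900"
    stored = [old, new]
    print(remove_by_name(stored, old, [new]))  # old path survives as a "zombie"
    print(remove_by_path(stored, old))         # only the new path remains
```

In this model the zombie disappears once no other tag with the same leaf name is kept, which would match the observation in the NOTE above.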

Final Note: I also deleted a lot of files with bad tags to get rid of the mess. Important ones were cleaned using Gimp. But I cannot clean up 14,000 images this way.
Comment 31 Christian 2014-08-18 10:45:30 UTC
Created attachment 88295 [details]
Testcase for digikam 4.2 tags that cannot be removed

http://buitk.at/download/digikam42_zombie_tag_testcase.zip
Comment 32 Christian 2014-08-18 10:48:53 UTC
Created attachment 88296 [details]
because of undeletable tags, duplication still occurs

The bug described in the testcase might also cause duplication of tags when selecting more than one file to reread metadata or when reading/writing all images. It does not occur, if single files are selected to read metadata one after another.
Comment 33 Veaceslav Munteanu 2014-08-19 09:50:19 UTC
:( Yes, I can reproduce some problems, such as not being able to remove the same tag from different sub-trees... I'm a little busy now finishing work on another part of digiKam, so it might take some time...
Comment 34 caulier.gilles 2014-08-28 13:31:33 UTC
Veaceslav,

Any progress here before 4.3.0 release ?

Gilles
Comment 35 Veaceslav Munteanu 2014-09-01 17:57:17 UTC
Git commit 2719c59891e2a72a15e47f539c26b2721e049a24 by Veaceslav Munteanu.
Committed on 01/09/2014 at 17:52.
Pushed by munteanu into branch 'development/balooport'.

M  +0    -1    digikam/fileaction/metadatahub.cpp
M  +5    -1    utilities/baloo/baloowrap.cpp

http://commits.kde.org/digikam/2719c59891e2a72a15e47f539c26b2721e049a24
Comment 36 Veaceslav Munteanu 2014-09-01 17:58:44 UTC
Still working... :D
Comment 37 Veaceslav Munteanu 2014-09-02 10:39:57 UTC
*** Bug 309243 has been marked as a duplicate of this bug. ***
Comment 38 Veaceslav Munteanu 2014-09-02 10:57:46 UTC
I guess, as usual, it was one line that messed up everything.

Thank you Christian for your extensive Test Cases, I just tested everything in the first package and in "zombie package".

The only thing that I couldn't understand and reproduce was the duplicate jpg thing... from the 3rd folder...

Please test if you can, and reopen if you find something more...