Bug 304077 - XMP metadata interoperability problem and inconsistency
Status: RESOLVED WORKSFORME
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Xmp
Version: 2.2.0
Platform: openSUSE Linux
Importance: NOR minor
Target Milestone: ---
Assignee: Digikam Developers
Reported: 2012-07-26 09:36 UTC by Jean-François Rabasse
Modified: 2017-08-13 07:29 UTC
Version Fixed In: 3.0.0


Description Jean-François Rabasse 2012-07-26 09:36:22 UTC
Problem with the way Digikam reads/writes XMP metadata and with potential interoperability with other applications.

1. A sample image (reduced-size version) is available here:
   http://e-artefact.eu/scratch/DSC_1881.JPG
This image contains a title in the Dublin Core Title property (XMP section of the file).
This title is readable by command-line tools:
   exiftool -g DSC_1881.JPG
   ---- XMP ----
   Title  : Rue des Francs Bourgeois
or
  exiv2 -p x DSC_1881.JPG
  Xmp.dc.Title        XmpText    24        Rue des Francs Bourgeois

2. This image is imported into a Digikam folder (and « Reread Metadata From Image » is also applied).
Digikam correctly displays the title in the XMP metadata tab of the right sidebar, but doesn't seem to use the image title in the album thumbnails view, as could be expected.
See screenshot 1 here : http://e-artefact.eu/scratch/Screenshot-1.jpg

3. A copy of this image is made, DSC_1881_1.JPG
   (Available here: http://e-artefact.eu/scratch/DSC_1881_1.JPG )
This new image gets a new title, « Digikam title », edited from the GUI (Caption/Tags tab),
followed by the operations « Write Metadata to Image » and « Reread Metadata From Image ».

The strange thing:
- Digikam correctly displays the new title in the album thumbnails view, as expected
- Digikam still displays the original title in the XMP metadata tab
See screenshot 2 : http://e-artefact.eu/scratch/Screenshot-2.jpg


Investigations:
After Digikam has written an edited image title to the image file, the XMP section contains two Dublin Core Title properties. The original one, named dc.Title, is still present alongside the one written by Digikam, named dc.title with a lowercase 't'.

This duplicated title can be seen with other applications, e.g. Gwenview shows two titles
(see screenshot 3: http://e-artefact.eu/scratch/Screenshot-3.jpg )
and so does exiv2:
  exiv2 -p x DSC_1881_1.JPG
  Xmp.dc.Title                   XmpText    24  Rue des Francs Bourgeois
  Xmp.dc.title                    LangAlt        1  lang="x-default" Digikam title
  Xmp.xmp.CreatorTool      XmpText    13  digiKam-2.2.0
  Xmp.tiff.Software            XmpText    13  digiKam-2.2.0

NB: the Gwenview metadata display, as on screenshot 3, is a bit confusing because property labels seem to be reformatted with capitalized names. The exiv2 output, however, clearly shows the case difference.
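
The case difference visible in the exiv2 listing can also be detected mechanically. What follows is only a minimal sketch, assuming the text output of `exiv2 -p x` has already been captured as a string. The helper name `find_case_collisions` and the `SAMPLE` variable are hypothetical and are not part of exiv2 or Digikam.

```python
from collections import defaultdict
from typing import Dict, List


def find_case_collisions(exiv2_output: str) -> Dict[str, List[str]]:
    """Group XMP keys that differ only by letter case (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in exiv2_output.splitlines():
        parts = line.split()
        # Only consider lines whose first column is an XMP key.
        if not parts or not parts[0].startswith("Xmp."):
            continue
        key = parts[0]
        if key not in groups[key.lower()]:
            groups[key.lower()].append(key)
    # Keep only the keys that exist in more than one case variant.
    return {k: v for k, v in groups.items() if len(v) > 1}


# Sample copied from the exiv2 listing above.
SAMPLE = """\
Xmp.dc.Title                   XmpText    24  Rue des Francs Bourgeois
Xmp.dc.title                    LangAlt        1  lang="x-default" Digikam title
Xmp.xmp.CreatorTool      XmpText    13  digiKam-2.2.0
Xmp.tiff.Software            XmpText    13  digiKam-2.2.0
"""

collisions = find_case_collisions(SAMPLE)
print(collisions)  # {'xmp.dc.title': ['Xmp.dc.Title', 'Xmp.dc.title']}
```

On the listing above, only the two title properties are reported, which matches what Gwenview shows as two titles.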


Conclusions and questions:
1. It seems that Digikam doesn't adhere to the case conventions in DC property names, where Title has an uppercase 'T'. As an XMP section is an XML structure and thus is case sensitive, this breaks interoperability with other applications that read or write the title.

2. But if Digikam writes dc.title when writing metadata to the image, why does it look for dc.title to get an image title for the album thumbnails view, but read dc.Title to display metadata information (Screenshot 2)?
I first thought the original title was kept in the database and not updated when the title was edited, but that is not the case. The Digikam database contains only the newly edited title:
  sqlite3  digikam4.db
  > SELECT IM.name, IC.comment FROM images AS IM, imagecomments AS IC 
     WHERE IM.id = IC.imageid and IM.name LIKE 'DSC_1881%';
  > DSC_1881_1.JPG | Digikam title
There is nothing at all about the initial title, so the displayed text (Screenshot 2) does come from the image metadata section.
Comment 1 caulier.gilles 2012-07-26 16:13:43 UTC
I recommend testing with the latest digiKam 2.7.0 and the latest libkexiv2 from the kdegraphics component, where we have fixed some code related to XMP management.
Comment 2 Marcel Wiesweg 2012-09-22 13:04:07 UTC
Sorry, the dc properties are all lowercase. (in contrast, tiff: properties start uppercase). Please refer to the XMP specification.
The XMP specification does not state anything about case sensitivity, so I agree with you that XMP is case sensitive as XML is. Which means the sample image contains an entry which is not defined in the dc namespace.

Which software is writing dc.Title?
Comment 3 Jean-François Rabasse 2012-09-23 18:03:57 UTC
(In reply to comment #2)
> Sorry, the dc properties are all lowercase. (in contrast, tiff: properties
> start uppercase). Please refer to the XMP specification.
> The XMP specification does not state anything about case sensitivity, so I
> agree with you that XMP is case sensitive as XML is. Which means the sample
> image contains an entry which is not defined in the dc namespace.
> 
> Which software is writing dc.Title?

Hello Marcel,

Thanks for the feedback. I've also investigated, and it appears the problem comes from a historical DCMI inconsistency.
As they say themselves (cf. http://dublincore.org/documents/naming-policy/):
"In 1998, the Dublin Core element set was published using Names (at the time called Labels) which had a leading uppercase character, e.g. Title and Creator [RFC2413]. In October 2000, the DCMI Advisory Committee decided to change the Names of elements to lowercase in order to bring DCMI practice into line with conventions widely (though not universally) followed in existing Dublin Core applications and in the XML and RDF/XML communities more generally.

Unfortunately, this decision was propagated throughout DCMI documentation only after some delay [DC-ELEMENTS]. In the meantime, the Dublin Core Metadata Element Set had progressed through formal standardization channels for recognition first as NISO Z39.85-2001 and CEN Workshop Agreement CWA 13874, then as ISO 15836 -- with element Names starting in uppercase [ISO15836]."

So, they started with capitalized element names, then changed their minds to all lowercase at a time when the ISO standard was already on its way. Too bad.

I suggest closing this issue, as it is clearly not an application issue. Because of this formal inconsistency, I don't know how many images in the world may embed an xmp.dc.Title instead of xmp.dc.title.
As DCMI suggests (IMHO in a somewhat casual way), "In practice, this suggests that applications are well advised to normalize case when parsing terms for identity comparisons". I've fixed my immediate problem, in my reading program, by checking both syntaxes.
Something like:
 - get xmp.dc.title
 - if nonexistent or empty, get xmp.dc.Title
This solves my problem and I can live with it that way :)
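
The two-step lookup above could be sketched as follows. This is only an illustration, not the actual code of the reading program: it assumes the XMP properties have already been read into a plain mapping keyed by exiv2-style names, and the function name `read_title` is hypothetical.

```python
from typing import Mapping, Optional


def read_title(xmp: Mapping[str, str]) -> Optional[str]:
    """Return the Dublin Core title, preferring the lowercase property.

    `xmp` is assumed to map exiv2-style keys to their text values.
    """
    # Try the lowercase name first, then the legacy capitalized name.
    for key in ("Xmp.dc.title", "Xmp.dc.Title"):
        value = xmp.get(key, "").strip()
        if value:
            return value
    return None


# The original sample image carries only the capitalized property.
print(read_title({"Xmp.dc.Title": "Rue des Francs Bourgeois"}))

# After Digikam has written its title, the lowercase property wins.
print(read_title({
    "Xmp.dc.Title": "Rue des Francs Bourgeois",
    "Xmp.dc.title": "Digikam title",
}))
```

Checking the lowercase name first means a title written by Digikam takes precedence over a leftover capitalized one, which is the order described in the steps above.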

Regards
Jean-François