Bug 304187 - XMP sidecar files do not write Unicode characters in the Dublin Core section
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Sidecar
Version: 2.8.0
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-28 17:16 UTC by james
Modified: 2020-08-28 07:39 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In: 7.1.0


Attachments
Example XMP file showing non-storage of Unicode character in DC section (2.80 KB, application/xml)
2012-10-20 14:48 UTC, james
XMP sidecar with correct unicode. (2.12 KB, application/x-wine-extension-xmp)
2015-04-08 19:18 UTC, Alan Pater

Description james 2012-07-28 17:16:42 UTC
The string "UnicoƉe" is written as "Unico?e" in the Dublin Core section of XMP sidecar files.

Reproducible: Always

Steps to Reproduce:
1. Set digikam to write metadata to XMP sidecar files.
2. Edit photo metadata to add a caption "UnicoƉe"
3. Click apply to write XMP sidecar file.

Actual Results:  
TIFF and EXIF sections correctly record the description as "UnicoƉe" but the DC section records "Unico?e".

Expected Results:  
All instances of the caption string are recorded as per the original data entry.

This may be by design for some reason that I do not know.
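
A minimal sketch, in Python, of how the symptom described above could be checked in a sidecar file. The sample XMP is hand-written for illustration (it is not the attached file), and the helper names `alt_texts` and `find_mismatches` are hypothetical. It assumes the caption is stored as an `rdf:Alt` list under both `tiff:ImageDescription` and `dc:description`, and uses only the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "tiff": "http://ns.adobe.com/tiff/1.0/",
}

# Hypothetical sidecar content, written by hand to mimic the reported symptom.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
   <tiff:ImageDescription>
    <rdf:Alt><rdf:li xml:lang="x-default">UnicoƉe</rdf:li></rdf:Alt>
   </tiff:ImageDescription>
   <dc:description>
    <rdf:Alt><rdf:li xml:lang="x-default">Unico?e</rdf:li></rdf:Alt>
   </dc:description>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def alt_texts(root, tag):
    """Return the rdf:li texts stored under a property such as 'dc:description'."""
    return [li.text for li in root.findall(f".//{tag}/rdf:Alt/rdf:li", NS)]


def find_mismatches(xmp_text, expected):
    """Map each checked property to its stored values when any differs from `expected`."""
    root = ET.fromstring(xmp_text)
    report = {}
    for tag in ("tiff:ImageDescription", "dc:description"):
        values = alt_texts(root, tag)
        if any(value != expected for value in values):
            report[tag] = values
    return report


if __name__ == "__main__":
    print(find_mismatches(SAMPLE_XMP, "UnicoƉe"))
```

With the sample above, only `dc:description` is reported, which matches the behaviour described in this report. An empty result would mean every checked property holds the caption as entered.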
Comment 1 caulier.gilles 2012-07-28 18:58:11 UTC
Please try again with digiKam 2.7.0 and the latest libkexiv2 release, published with KDE through the kdegraphics component...

Gilles Caulier
Comment 2 Marcel Wiesweg 2012-09-22 13:06:06 UTC
A simple test shows that it actually works.
Comment 3 caulier.gilles 2012-09-22 14:05:29 UTC
It works for me too, using the digiKam SC 3.0.0 source code.

Gilles Caulier
Comment 4 james 2012-10-20 14:48:42 UTC
Created attachment 74676
Example XMP file showing non-storage of Unicode character in DC section

Re-tested in digiKam 2.8.0 using KDE Development Platform 4.9.2 on Ubuntu 12.10. The issue is still present.
Comment 5 james 2012-10-20 14:49:50 UTC
Setting status back to Unconfirmed, hope this is the right thing to do.
Comment 6 james 2012-10-20 14:52:10 UTC
Sorry, I notice this was marked as fixed in 3.0.0; I only have 2.8.0. Was there actually a bug that was fixed, or are you saying that you can't reproduce the issue?
Comment 7 caulier.gilles 2012-10-20 14:58:44 UTC
Yes, it is, but as noted before, it works with 3.0.0, which is still in beta2 for the moment.

However, the main XMP management code is driven through libkexiv2 from the KDEGraphics components, which is released outside digiKam with the KDE project.

Also, the Exiv2 shared library is used in the background to perform all in-depth metadata changes.

Consider updating both libraries on your system and trying again...

Gilles Caulier
Comment 8 Marcel Wiesweg 2012-10-23 19:59:10 UTC
The important part here, which I did not reproduce in my test, is writing to a sidecar.
It is fully reproducible using the exiv2 command line tool. I will open an upstream bug.
Comment 9 Marcel Wiesweg 2012-10-23 20:16:08 UTC
http://dev.exiv2.org/issues/863
Comment 10 Alan Pater 2015-04-08 19:18:43 UTC
Created attachment 91954
XMP sidecar with correct unicode.

The issue cannot be reproduced using digikam 4.8.0 and exiv2 0.24.

I created a new test image, added Title and Caption tags. Resulting XMP sidecar is attached.
Comment 11 caulier.gilles 2020-08-28 07:39:14 UTC
Git commit ad0ab9efeba6e2fe3bb86207a91499e4e8eb170f by Gilles Caulier.
Committed on 28/08/2020 at 05:19.
Pushed by cgilles into branch 'master'.

IPTC and UTF-8 support: if a tag is a string, check whether the global IPTC character set is null. If it is, convert the string as Latin-1; otherwise, interpret the string as UTF-8.
We use the std::string accessor from Exiv2 to get a UTF-8 conversion of the string. If this does not work, the problem needs to be reported UPSTREAM
to Exiv2, as pre-conversion of the string is not done in the background by the library.
This patch prevents Latin-1 strings from being displayed with a wrong UTF-8 conversion, which can break some characters.
BUGS: 379581
BUGS: 379050
FIXED-IN: 7.1.0

M  +27   -3    core/libs/metadataengine/engine/metaengine_iptc.cpp

https://invent.kde.org/graphics/digikam/commit/ad0ab9efeba6e2fe3bb86207a91499e4e8eb170f
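
The decoding rule described in the commit message above can be illustrated with a short sketch. This is Python, not the C++ code of the actual patch, and it is not taken from digiKam: the function name `decode_iptc_string` and the `charset` parameter (standing in for the global IPTC character set) are hypothetical. It only shows the stated decision: no declared character set means the bytes are read as Latin-1, otherwise they are read as UTF-8.

```python
from typing import Optional


def decode_iptc_string(raw: bytes, charset: Optional[str]) -> str:
    """Decode an IPTC string value following the rule in the commit message.

    `charset` is a hypothetical stand-in for the global IPTC character set:
    None means no character set is declared.
    """
    if charset is None:
        # No declared character set: treat the bytes as Latin-1.
        # Every byte value is valid Latin-1, so this never raises.
        return raw.decode("latin-1")
    # A character set is declared: interpret the bytes as UTF-8.
    # Invalid sequences are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    latin1_bytes = "café".encode("latin-1")
    utf8_bytes = "café".encode("utf-8")
    print(decode_iptc_string(latin1_bytes, None))
    print(decode_iptc_string(utf8_bytes, "UTF-8"))
    # Reading Latin-1 bytes as UTF-8 damages the accented character,
    # which is the kind of breakage the patch is described as preventing.
    print(decode_iptc_string(latin1_bytes, "UTF-8"))
```

The third call shows why the choice matters: the same bytes yield different text depending on which encoding is assumed, so choosing the wrong one breaks non-ASCII characters.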