Bug 132244 - Special Chars in Keywords decode wrong in IPTC
Summary: Special Chars in Keywords decode wrong in IPTC
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Iptc
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
Reported: 2006-08-11 09:00 UTC by Johann-Nikolaus Andreae
Modified: 2020-08-28 07:38 UTC
Version Fixed In: 7.1.0
Attachments
screenshot metadata sidebar (12.26 KB, image/png)
2006-12-02 22:42 UTC, Caspar Maessen
JPG image with Hebrew IPTC info. In all fields is the word שלום (3.31 KB, image/jpeg)
2007-01-17 20:51 UTC, Dotan Cohen
JPG image with English IPTC data. (3.30 KB, image/jpeg)
2007-01-17 20:53 UTC, Dotan Cohen
IPTC and UTF8 from BrillantPhoto displayed in digiKam and Photoshop (241.93 KB, image/png)
2007-01-22 08:04 UTC, caulier.gilles
IPTC Encoding patch for libkexiv2 (1.85 KB, patch)
2007-02-16 22:16 UTC, Leonid Zeitlin
IPTC Encoding patch for digikam (7.04 KB, patch)
2007-02-16 22:16 UTC, Leonid Zeitlin
IPTC tag in hebrew (iso-8859-8) from PhotoStation (34.19 KB, image/jpeg)
2007-08-03 11:11 UTC, Nadav Kavalerchik

Description Johann-Nikolaus Andreae 2006-08-11 09:00:57 UTC
Version:           0.9.0-beta1 (using KDE 3.5.4 Level "a" , unofficial build of SUSE )
Compiler:          Target: i586-suse-linux
OS:                Linux (i686) release 2.6.16.21-0.13-default

When I view the IPTC keywords in the IPTC tab, German special characters are decoded incorrectly. I have not tested the other IPTC fields.
The keywords were inserted with digiKam's tag function.
Comment 1 caulier.gilles 2006-08-17 00:26:00 UTC
This is not a problem with digiKam. In fact, IPTC metadata is limited to ASCII characters!

This problem will be fixed when the Exiv2 library supports XMP metadata, which supports UTF-8.

Gilles Caulier
Comment 2 caulier.gilles 2006-10-04 10:12:46 UTC
SVN commit 592268 by cgilles:

digikam from trunk: the strings Exiv2 returns to render metadata content are ASCII, not local 8-bit formatted. On a Linux distribution using UTF-8 encoding (like SUSE 10.1, for example), some characters can be wrongly decoded.

CCBUGS: 132244

 M  +7 -7      exifwidget.cpp  
 M  +7 -7      gpswidget.cpp  
 M  +7 -7      iptcwidget.cpp  
 M  +7 -7      makernotewidget.cpp  


--- trunk/extragear/graphics/digikam/libs/widgets/metadata/exifwidget.cpp #592267:592268
@@ -149,7 +149,7 @@
 
         for (Exiv2::ExifData::iterator md = exifData.begin(); md != exifData.end(); ++md)
         {
-            QString key = QString::fromLocal8Bit(md->key().c_str());
+            QString key = QString::fromAscii(md->key().c_str());
 
             // Decode the tag value with a user friendly output.
             QString tagValue;
@@ -161,7 +161,7 @@
             {
                 std::ostringstream os;
                 os << *md;
-                tagValue = QString::fromLocal8Bit(os.str().c_str());
+                tagValue = QString::fromAscii(os.str().c_str());
             }
             tagValue.replace("\n", " ");
 
@@ -178,7 +178,7 @@
     catch (Exiv2::Error& e)
     {
         kdDebug() << "Cannot parse EXIF metadata using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return false;
     }
@@ -203,12 +203,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag title using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("Unknow");
     }
@@ -220,12 +220,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag description using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("No description available");
     }
--- trunk/extragear/graphics/digikam/libs/widgets/metadata/gpswidget.cpp #592267:592268
@@ -275,12 +275,12 @@
 
         for (Exiv2::ExifData::iterator md = exifData.begin(); md != exifData.end(); ++md)
         {
-            QString key = QString::fromLocal8Bit(md->key().c_str());
+            QString key = QString::fromAscii(md->key().c_str());
 
             // Decode the tag value with a user friendly output.
             std::ostringstream os;
             os << *md;
-            QString tagValue = QString::fromLocal8Bit(os.str().c_str());
+            QString tagValue = QString::fromAscii(os.str().c_str());
             
             // We apply a filter to get only standard Exif tags, not maker notes.
             if (d->keysFilter.contains(key.section(".", 1, 1)))
@@ -309,7 +309,7 @@
         d->detailsButton->setEnabled(false);
         d->detailsCombo->setEnabled(false);
         kdDebug() << "Cannot parse EXIF metadata using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return false;
     }
@@ -334,12 +334,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag title using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("Unknow");
     }
@@ -351,12 +351,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {   
         kdDebug() << "Cannot get metadata tag description using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("No description available");
     }
--- trunk/extragear/graphics/digikam/libs/widgets/metadata/iptcwidget.cpp #592267:592268
@@ -126,12 +126,12 @@
 
         for (Exiv2::IptcData::iterator md = iptcData.begin(); md != iptcData.end(); ++md)
         {
-            QString key = QString::fromLocal8Bit(md->key().c_str());
+            QString key = QString::fromAscii(md->key().c_str());
             
             // Decode the tag value with a user friendly output.
             std::ostringstream os;
             os << *md;
-            QString value = QString::fromLocal8Bit(os.str().c_str());
+            QString value = QString::fromAscii(os.str().c_str());
             // To make a string just on one line.
             value.replace("\n", " ");
 
@@ -157,7 +157,7 @@
     catch (Exiv2::Error& e)
     {
         kdDebug() << "Cannot parse IPTC metadata using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return false;
     }
@@ -181,12 +181,12 @@
     {
         std::string iptckey(key.ascii());
         Exiv2::IptcKey ik(iptckey); 
-        return QString::fromLocal8Bit( Exiv2::IptcDataSets::dataSetTitle(ik.tag(), ik.record()) );
+        return QString::fromAscii( Exiv2::IptcDataSets::dataSetTitle(ik.tag(), ik.record()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag title using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("Unknow");
     }
@@ -198,12 +198,12 @@
     {
         std::string iptckey(key.ascii());
         Exiv2::IptcKey ik(iptckey); 
-        return QString::fromLocal8Bit( Exiv2::IptcDataSets::dataSetDesc(ik.tag(), ik.record()) );
+        return QString::fromAscii( Exiv2::IptcDataSets::dataSetDesc(ik.tag(), ik.record()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag description using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("No description available");
     }
--- trunk/extragear/graphics/digikam/libs/widgets/metadata/makernotewidget.cpp #592267:592268
@@ -175,12 +175,12 @@
 
         for (Exiv2::ExifData::iterator md = exifData.begin(); md != exifData.end(); ++md)
         {
-            QString key = QString::fromLocal8Bit(md->key().c_str());
+            QString key = QString::fromAscii(md->key().c_str());
             
             // Decode the tag value with a user friendly output.
             std::ostringstream os;
             os << *md;
-            QString value = QString::fromLocal8Bit(os.str().c_str());
+            QString value = QString::fromAscii(os.str().c_str());
             value.replace("\n", " ");
 
             // We apply a filter to get only standard Exif tags, not maker notes.
@@ -196,7 +196,7 @@
     catch (Exiv2::Error& e)
     {
         kdDebug() << "Cannot parse MAKERNOTE metadata using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return false;
     }
@@ -220,12 +220,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagTitle(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag title using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("Unknow");
     }
@@ -237,12 +237,12 @@
     {
         std::string exifkey(key.ascii());
         Exiv2::ExifKey ek(exifkey); 
-        return QString::fromLocal8Bit( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
+        return QString::fromAscii( Exiv2::ExifTags::tagDesc(ek.tag(), ek.ifdId()) );
     }
     catch (Exiv2::Error& e) 
     {
         kdDebug() << "Cannot get metadata tag description using Exiv2 ("
-                  << QString::fromLocal8Bit(e.what().c_str())
+                  << QString::fromAscii(e.what().c_str())
                   << ")" << endl;
         return i18n("No description available");
     }
Comment 3 caulier.gilles 2006-10-04 10:17:37 UTC
Johann, please check out the current implementation from svn (not 0.9.0-beta2) and let me know whether this commit has solved your problem.

Note: my comment #1 is still right. UTF-8 is not supported by IPTC. If an application tries to embed a UTF-8 string in an IPTC tag, then the IPTC specification is not respected. Look here:

http://www.iptc.org/std/IIM/4.1/specification/IIMV4.1.pdf

The alternative is to use XMP metadata instead.

Gilles Caulier
Comment 4 Loïc Brarda 2006-10-04 13:24:27 UTC
2006/10/4, Gilles Caulier <caulier.gilles@free.fr>:

> Note: my comment #1 is still right. UTF-8 is not supported by IPTC. If an application tries to embed a UTF-8 string in an IPTC tag, then the IPTC specification is not respected. Look here:
>
> http://www.iptc.org/std/IIM/4.1/specification/IIMV4.1.pdf
>


For me, it's not that clear in the specification.

The character set can be defined in the envelope record (dataset 1:90),
which is normally not used (as I understand the specs, the whole spec
was made to encapsulate pictures in IIM files, not to encapsulate IPTC
info in picture files).

Other sections of the specification make me think UTF-8 is possible:

"Section 1.12 DataSet octet sizes do not imply character sizing. The number of
characters will depend on the encoding method specified. The number of octets
specified within a DataSet Data Field Octet Count will always be equal to or
greater than the number of characters of data represented."

There is also the definition of UTF8 in Section 1.75.

The more standard way would probably be to use a record 1 with a 1:90
dataset to declare UTF-8, but I think most programs just use UTF-8
directly in the text fields.
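
To make the record 1 idea concrete, here is a minimal sketch of how a 1:90 dataset announcing UTF-8 could be serialized. It assumes the standard (non-extended) IIM dataset layout: a 0x1C tag marker, the record number, the dataset number, and a two-byte big-endian octet count, followed by the data. The names `makeDataSet` and `makeUtf8CharsetDataSet` are hypothetical helpers for illustration only; they are not part of Exiv2 or digiKam.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: serialize one standard IIM dataset as
// 0x1C marker, record number, dataset number, 16-bit big-endian length, data.
std::vector<std::uint8_t> makeDataSet(std::uint8_t record,
                                      std::uint8_t dataset,
                                      const std::string& data)
{
    // Lengths with the high bit set announce an "extended" dataset,
    // which this sketch deliberately does not handle.
    if (data.size() > 0x7FFF)
        throw std::length_error("extended datasets are not handled in this sketch");

    std::vector<std::uint8_t> out;
    out.push_back(0x1C);
    out.push_back(record);
    out.push_back(dataset);
    out.push_back(static_cast<std::uint8_t>(data.size() >> 8));
    out.push_back(static_cast<std::uint8_t>(data.size() & 0xFF));
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

// Hypothetical helper: envelope record 1, dataset 90, carrying the
// three-byte escape sequence ESC % G that designates UTF-8.
std::vector<std::uint8_t> makeUtf8CharsetDataSet()
{
    return makeDataSet(1, 90, "\x1B%G");
}
```

Under these assumptions the result is the eight bytes 1C 01 5A 00 03 1B 25 47, which would have to precede the record 2 text datasets.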

After some googling, I found the following page
(http://bugs.php.net/bug.php?id=27238) with links to files with
Record 1 charset info, but unfortunately the links are broken.
I also found some links to IPTC software advertising UTF-8 support.

I'll try to do some tests with different IPTC writing software.

   Loic
Comment 5 caulier.gilles 2006-10-04 13:30:35 UTC
Thanks for this report Loic.

Andreas, you have more experience with IPTC than I do. Can you confirm that we can use UTF-8 encoding in IPTC text tags using the Exiv2 library?

Thanks in advance

Gilles
Comment 6 Caspar Maessen 2006-12-02 22:42:33 UTC
Created attachment 18756 [details]
screenshot metadata sidebar

As I understand the above discussion, the screenshot I added is all about that
problem. I noticed this behaviour before, and decided to change the
copyright notice into (C)... But in the IPTC documents I read that in Europe it
is probably best for legal reasons to use the copyright sign. So I changed
it back for all my photographs using the exiv2 command-line tool.
Within digiKam this leads to the accompanying result. What I mean to say is
that apparently IPTC accepts (or needs to accept) this kind of character.

Caspar.
Comment 7 Andreas Huggel 2006-12-03 05:53:44 UTC
Gilles,

Regarding the patch above, digikam code needs to distinguish between metadata (data stored as tag values) and text that comes from exiv2 (tag titles, descriptions, error messages, etc). Metadata is encoded according to whatever the relevant (Exif or IPTC) standard defines, possibly different for different tags (Exif user comment has its own charset setting for example). Text from exiv2 is currently in ASCII only but when we support gettext, that will change.
 
What character set do the translation files use?

-ahu.
Comment 8 Andreas Huggel 2006-12-03 06:36:03 UTC
To the question in comment #5: You can store any data in the tags, exiv2 usually doesn't care. But I don't know whether storing UTF8 encoded text in IPTC fields is ok and how it should be done to comply with the standard. Forwarded the question to the exiv2 list.

-ahu.
Comment 9 caulier.gilles 2006-12-03 08:38:35 UTC
Andreas,

Since we have implemented NLS support in Exiv2, the code patched in #2 is obsolete. In the current implementation, I use QString::fromLocal8Bit() where it is required. There is a digiKam screenshot with non-ASCII characters (French) at this URL:

http://digikam3rdparty.free.fr/Screenshots/dgikam_metadata_tags_i18n.png

Gilles
Comment 10 caulier.gilles 2006-12-12 13:57:01 UTC
Andreas,

Look at this page:

http://peccatte.karefil.com/Software/Metadata.htm#IPTC

Sorry, it's in French... but it's very instructive. I have never seen an English page about extended characters with IPTC (UTF-8 and the like). In particular, there is a section which says (translated):

"The IPTC-NAA model allows fields to be encoded in various extended character sets. Current software should therefore be able to handle accents, diacritics, etc. correctly. This is not the case: if extended characters are used when entering information in Photoshop, for example, that information is not displayed correctly on another platform. Adobe recommends using only 7-bit ASCII [which is unacceptable for many languages!] because the IPTC standard allows only this character set [which is false!]"

To summarize, IPTC can support extended character sets, but because Photoshop only supports 7-bit ASCII characters (with IPTC, not XMP), all other applications are forced to support only this mode.

If you look at page 20 of the IPTC spec, the tag Iptc.Envelope.CharacterSet is designed to customize the character encoding.

Gilles
Comment 11 Andreas Huggel 2006-12-29 14:15:01 UTC
On Thursday 28 December 2006 23:37, Marco Piovanelli wrote:
>     yes, the IPTC standard does allow for non-ASCII
>     character sets, although it's by no means obvious how these
>     are specified.  See for instance Stefano Bettelli's excellent
>     description of JPEG metadata on CPAN for a brief discussion of
>     this:
>
><http://www.annocpan.org/~BETTELLI/Image-MetaData-JPEG-0.15/lib/Image/MetaData/JPEG/TagLists.pod>
>
>     In particular, you can safely assume IPTC strings are
> UTF-8-encoded if the "Iptc.Envelope.CharacterSet" dataset contains
> the three-byte escape sequence "\x1B%G".
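
As an illustration of the rule quoted above, a reader could classify the value of "Iptc.Envelope.CharacterSet" along these lines. This is only a sketch: `IptcCharset` and `classifyCharset` are hypothetical names, the raw dataset value is assumed to be available as a `std::string`, and any other ISO 2022 escape sequence is simply reported as "other" rather than decoded.

```cpp
#include <string>

// Hypothetical classification of the raw Iptc.Envelope.CharacterSet value.
enum class IptcCharset { Unspecified, Utf8, Other };

IptcCharset classifyCharset(const std::string& characterSet)
{
    // No 1:90 dataset (or an empty one): the file does not say what it uses.
    if (characterSet.empty())
        return IptcCharset::Unspecified;

    // ESC % G is the three-byte escape sequence that designates UTF-8.
    if (characterSet == "\x1B%G")
        return IptcCharset::Utf8;

    // Some other escape sequence: not interpreted in this sketch.
    return IptcCharset::Other;
}
```

Only the `Utf8` result allows text fields to be decoded without guessing; the `Unspecified` case is the one the rest of this report argues about.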
Comment 12 caulier.gilles 2007-01-02 21:33:25 UTC
Thanks for the URL, Andreas.

I will try to use it and check the interoperability with Photoshop...

Gilles
Comment 13 Dotan Cohen 2007-01-17 20:51:15 UTC
Created attachment 19315 [details]
JPG image with Hebrew IPTC info. In all fields is the word שלום

This image was tagged with IPTC data on BrilliantPhoto on Windows. In the
following fields is the following info:
Caption:שלום
Keywords:מפתח, שלום
People:אתי, שלום
Event:שלום
Place:שלום

The data is in UTF-8. Note that multiple Keywords and People are separated by a
comma and a space.
Comment 14 Dotan Cohen 2007-01-17 20:53:56 UTC
Created attachment 19316 [details]
JPG image with English IPTC data.

This image was tagged with IPTC data on BrilliantPhoto on Windows. In the
following fields is the following info:
Caption:Caption
Keywords:Keyword1, Keyword2
People:Person1, Person2
Event:Event
Place:Place

The data is UTF-8. This and the previous attachment were added at the request
of Gilles on the Digikam mailing list.
Comment 15 caulier.gilles 2007-01-19 07:52:53 UTC
Dotan,

Can BrilliantPhoto configure the character set used for IPTC? Do you have a screenshot of the setup?

Gilles
Comment 16 Dotan Cohen 2007-01-19 12:55:46 UTC
BrilliantPhoto has absolutely no setup screen. There are no configurable options, and therefore no Preferences or Options dialogs. That's actually one of the things that I _don't_ like about it, but otherwise it was a great program.

According to the BrilliantPhoto forums, which have since been taken down, the IPTC spec specifically requires the use of UTF-8 for the data. No other charset is acceptable. I read the spec a long time ago and in my opinion that 'fact' is debatable. However, the BrilliantPhoto author was very certain that only UTF-8 is allowed.

If you have a Windows virtual machine, I'd very much recommend downloading and trying BrilliantPhoto:
http://www.download.com/BrilliantPhoto/3000-2204_4-10326351.html

Digikam could learn quite a few things from BP, such as the wonderful "fill flash" feature, which brightens underexposed photos better than any other program I've yet seen. The red-eye reduction selector is ROUND, like EYES, so it affects less skin. Why does no other program do that? Should I continue to list BP's other great features?
Comment 17 caulier.gilles 2007-01-19 14:13:49 UTC
> Digikam could learn quite a few things from BP, such as the wonderful "fill flash" feature, which brightens underexposed photos better than any other program I've yet seen.

Already implemented in current implementation :

http://digikam3rdparty.free.fr/Screenshots/exposureindicatorsfromimageplugins.png
http://digikam3rdparty.free.fr/Screenshots/underexposureindicator.png
http://digikam3rdparty.free.fr/Screenshots/overerexposureindicator.png
http://digikam3rdparty.free.fr/Screenshots/exposureindicatorSetup.png

>The red-eye reduction selector is ROUND, like EYES, so they affect less skin.

The red-eye corrector needs to be improved in digiKam ==> it is on my TODO list.

>Why does no other program do that? Should I continue to list BP's other great
>features? 

yes, on devel ML, not in this room.

Gilles
Comment 18 Dotan Cohen 2007-01-19 17:15:55 UTC
I don't have those options in my 0.9.0 built from the tarball. I'll build from SVN and try it out. As for the BP features, I'll subscribe to the Digikam DevML. Thanks.

Comment 19 caulier.gilles 2007-01-22 07:53:36 UTC
Dotan,

Regarding #14, are you sure that your attached picture is in UTF-8? If digiKam fails to show UTF-8 characters from IPTC, why can I display it without any problem in digiKam?

Also, the "envelope" IPTC tag is not set in this picture to tell the application which character encoding is used...

Gilles
Comment 20 caulier.gilles 2007-01-22 08:00:13 UTC
Dotan,

With the image from #13, all characters are broken. This one is certainly encoded using UTF-8... but the "envelope" IPTC tag is not set. There is no way to find out which encoding is used in IPTC in order to decode the text from this picture.

I have tried to show all the IPTC information from this picture using Photoshop 7.x, and all text strings are broken just as in digiKam!

If you read the IPTC spec, this "Envelope" IPTC tag must be set properly, otherwise all text tags are unusable. I suspect a bug in BrilliantPhoto.

Gilles
Comment 21 caulier.gilles 2007-01-22 08:04:31 UTC
Created attachment 19374 [details]
IPTC and UTF8 from BrillantPhoto displayed in digiKam and Photoshop
Comment 22 Dotan Cohen 2007-01-22 10:50:01 UTC
It would not surprise me to learn of a bug of this sort in BrilliantPhoto. In any case, the program is abandonware (not being developed by those who purchased the rights to it), so the issue is moot.

I'm certain that a simple shell script could add the appropriate fields, should anybody need it. I'm not the guy to write it, though.

(Trivia: What movie was this from? "If the milk's sour, I ain't the kind of pussy to drink it.")
Comment 23 Leonid Zeitlin 2007-02-16 22:15:02 UTC
Hi all,
Per previous discussion with Gilles on digikam-devel mailing list, I am posting patches that allow the user to specify which encoding to use for IPTC comments. There are two patches, for libkexiv2 and digikam itself. These are made against the current SVN. The part of the digikam patch that modifies iptcwidget.cpp is to be considered temporary and will not be needed once that widget is converted to use libkexiv2. Gilles, please take a look. Thanks!
Comment 24 Leonid Zeitlin 2007-02-16 22:16:16 UTC
Created attachment 19710 [details]
IPTC Encoding patch for libkexiv2
Comment 25 Leonid Zeitlin 2007-02-16 22:16:51 UTC
Created attachment 19711 [details]
IPTC Encoding patch for digikam
Comment 26 caulier.gilles 2007-02-18 11:39:58 UTC
ok,

I will take a look at your patch Monday morning.

Gilles
Comment 27 caulier.gilles 2007-02-23 13:21:10 UTC
lz,

I have taken a look at your patch. I have a question: why do you store the "IPTC Encoding" setting in KDE global? This value must be stored in the application settings and passed to a virtual KExiv2 method by the derived class DMetadata.

This way, libkexiv2 no longer depends on KDE core (in the future, I will certainly remove the kdelibs dependency and keep only the Qt dependency, to have a pure Qt interface).

Gilles
Comment 28 Leonid Zeitlin 2007-02-26 10:44:30 UTC
Hi Gilles,
I chose to store the settings in kdeglobal, because it is needed not only by Digikam itself, but also by Digikam kioslaves and, in theory, any application that uses libkexiv2. To be more precise, I've found that DMetadata is used in the kioslaves, but class AlbumSettings is not, and thus I could not read this setting from there. Do Digikam kioslaves have access to Digikam app config?
Comment 29 Leonid Zeitlin 2007-04-09 13:08:27 UTC
Hi Gilles,
Just wanted to ask if you had any time to look at my patch further.

Thanks,
  Leonid
Comment 30 Johannes Karlsberg 2007-05-05 18:03:40 UTC
Any progress on this?

Johannes
Comment 31 caulier.gilles 2007-05-05 18:08:06 UTC
I have a patch on my computer to support UTF-8 with IPTC, but I'm not yet fully satisfied with it. I will work on it between the digiKam 0.9.2-beta1 and beta2 releases...

Gilles
Comment 32 Johannes Karlsberg 2007-05-05 20:08:17 UTC
I tried the patches posted here earlier but got errors when patching. Which version should they be applied to?

Johannes
Comment 33 caulier.gilles 2007-05-06 09:26:28 UTC
the patch is on my computer, not in B.K.O

Gilles
Comment 34 Arnd Baecker 2007-06-12 20:45:27 UTC
Just for completeness a brief quote from Gilles on the IRC wrt 
this bug:

The patch is not fine as it does not respect the IPTC norm.
IPTC provides a tag to specify the character encoding.
The patch tries to detect the character encoding by parsing the string,
especially the first characters, which can include a specific substring
to declare the encoding. This is not how IPTC works.
The patch must be rewritten.

Comment 35 Leonid Zeitlin 2007-06-13 18:26:32 UTC
In connection to the previous comment, just want to mention that not every program that writes IPTC caption tags sets the IPTC encoding tag. For example, Picasa and IrfanView do not. Therefore the world must be full of IPTC-tagged images with encoding tag unset.
Comment 36 caulier.gilles 2007-06-13 18:36:10 UTC
Leonid,

Well, I think that if the IPTC character encoding tag is not set, the content must be interpreted as ASCII... But that is just my first impression.

Otherwise, the problem is that digiKam must respect the IPTC norm, especially when we want to update or add tags:

1/ Detect the original encoding of the tags.

2/ Set the encoding tag for all other tags (UTF-8 is the universal encoding; ExifTool uses only this one to add/update IPTC).

3/ Convert and update all existing IPTC tags to UTF-8 if the original encoding is different.

Gilles
Comment 37 Mikolaj Machowski 2007-06-13 19:24:30 UTC
> Well, i think than if IPTC char encoding tag is not set, content must be
> interpreted like ASCII... But it just my first impression.


Please consider this: first try utf-8, then try locale of
system, ASCII should be last resort.
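
A rough sketch of this "UTF-8 first" idea, for discussion only: a strict validity check tells whether the bytes can be UTF-8 at all, and anything that fails falls through to the system locale. The names `isValidUtf8`, `Guess` and `guessEncoding` are hypothetical. Note that pure ASCII text always passes the UTF-8 check, so under this scheme ASCII never needs a separate branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Strict UTF-8 check: rejects stray continuation bytes, truncated
// sequences, overlong forms, surrogates and values above U+10FFFF.
bool isValidUtf8(const std::string& s)
{
    static const std::uint32_t minCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;

        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;           // invalid lead byte

        if (i + len > n)
            return false;            // truncated sequence

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;        // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (len > 1 && cp < minCodePoint[len])
            return false;            // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;            // out of range or surrogate

        i += len;
    }
    return true;
}

// Hypothetical decision used when no encoding tag is present.
enum class Guess { Utf8, SystemLocale };

Guess guessEncoding(const std::string& raw)
{
    return isValidUtf8(raw) ? Guess::Utf8 : Guess::SystemLocale;
}
```

With this check, the UTF-8 bytes of שלום are accepted, while a Latin-1 "Ménard" (byte 0xE9 followed by "n") is rejected and handed to the locale fallback. A short legacy-encoded string can still happen to be valid UTF-8, so this remains a heuristic.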

m.
Comment 38 Leonid Zeitlin 2007-06-14 11:51:05 UTC
Hi Gilles,
I completely agree that digiKam should be strict in setting the tag correctly. However, I think it should also be liberal in reading IPTC comments with the encoding tag unset. I don't think one can assume that the comments are in ASCII; for non-English comments they won't be. Even trying UTF-8 and the system locale, as Mikolaj suggests above, will not achieve full interoperability with Windows applications (something I am very keen on). Indeed, for a Russian-speaking user, any Windows application will write the comments in the Windows CP1251 encoding; at the same time, on a Unix system the locale is likely to be either Russian KOI8-R or UTF-8. This is why I advocated adding a configuration option for the encoding to use in IPTC comments (which could default to UTF-8, of course).

I also wouldn't want digiKam to convert and update all existing IPTC tags to UTF-8 if the original encoding is different. The issue is, as I mentioned before, that many other applications will not recognize UTF-8 in IPTC comments and will display the raw bytes, i.e. garbage.
Comment 39 Delphine Ménard 2007-06-29 15:33:41 UTC
Is it possible to at least put a warning on the identity panel to tell people that their info won't be accepted as typed, until this is fully fixed? When I type my name "Ménard", it simply drops the é altogether and ends up with Mnard. The first time I did this I thought I had mistyped, and then realized that the field had just eaten my accented character.
Comment 40 Delphine Ménard 2007-06-29 15:37:28 UTC
Sorry for the trouble. The warning is already there. At the bottom.
Comment 41 Delphine Ménard 2007-06-29 15:38:37 UTC
*** This bug has been confirmed by popular vote. ***
Comment 42 Nadav Kavalerchik 2007-08-03 11:07:07 UTC
I use digikam 0.9.2-final (deb unstable) with kexiv2 0.1.5 + exiv2 0.14.0.

When I get pictures from PhotoStation (a Windows application for image management) with Hebrew IPTC tags, I cannot read them; I get garbled characters instead.

When I opened Dotan's file from Comment #14, I could see the correct word in the IPTC tag, but I'm unable to write UTF-8 inside the tags (as you probably know), and the EXIF comment is unreadable.

It would be nice to have a combo box to choose the encoding for the tags I see in the editor, and to be able to enter new text according to what I choose. It's a workaround and I know it's not the standard way to do this, but it seems that different applications implement the IPTC standard in different ways.
Or maybe an option to re-encode the IPTC tags before I send them to other people who use a different application.
Comment 43 Nadav Kavalerchik 2007-08-03 11:11:37 UTC
Created attachment 21322 [details]
IPTC tag in hebrew (iso-8859-8) from PhotoStation

here is an image i got from PhotoStation (a Windows application) with hebrew
IPTC tags (i think it's ISO-8859-8 or windows-1255) that is displaying the tags
in unreadable squares.
Comment 44 Leonid Zeitlin 2007-08-09 13:42:17 UTC
Nadav,
Just want to mention that I have an unofficial patch that does approximately what you wanted. It adds a combo box with IPTC encoding selection to Digikam's configuration dialog (Metadata page). Once the encoding is set, you can read existing IPTC comments in this encoding and write new ones, and they are saved in this encoding.

The patch for Digikam 0.9.2 is here: http://www.csltd.com.ua/~lz/digikam/digikam-0.9.2-iptc-encoding-lz.patch, but you also need to patch libkexiv2, the patch is here: http://www.csltd.com.ua/~lz/digikam/libkexiv2-0.1.5-iptc-encoding-lz.patch.

If you try the patches, please let me know (lz@europe.com).
Comment 45 Nadav Kavalerchik 2007-08-09 14:24:03 UTC
wow :-) that's exactly what i meant!
will this be added to the main Digikam tree ?

(i'll probably give it a try in a few days :-)
Comment 46 Leonid Zeitlin 2007-08-09 14:27:10 UTC
Question to Gilles :-)

Please give it a try and let me know.
Comment 47 Nadav Kavalerchik 2007-08-19 21:00:19 UTC
I've tested Leonid Zeitlin's patch and it works fine.
When I set the right encoding, the IPTC fields are readable :-)
Though I can't write anything which is not pure ASCII :-(

I think the encoding combo box should be in the IPTC editor and not in the main Settings, because I get different pictures from around the world with different encodings, and it is much easier to change it for each picture view.

Or maybe there should be a default fallback encoding inside the Settings dialog, and the IPTC editor should get the right encoding from the beginning of the text string? (If that is in the specs at all.)

Thank you Leonid :-) for this beautiful patch !
Nadav :-)
Comment 48 Hoshid 2007-12-13 16:58:36 UTC
Hello

Is there any progress on that bug? It would be great to be able to use UTF-8 for IPTC like libiptcdata!
Comment 49 caulier.gilles 2008-01-14 16:22:13 UTC
*** Bug 155733 has been marked as a duplicate of this bug. ***
Comment 50 caulier.gilles 2008-12-08 08:45:07 UTC
Same here. As digiKam and kipi-plugins for KDE4 support XMP everywhere and use it by default instead of IPTC, the problem has become obsolete.

I close this file now.

Gilles Caulier
Comment 51 Dotan Cohen 2008-12-08 09:17:44 UTC
> Same here. As digiKam and kipi-plugins for KDE4 support
> XMP everywhere and use it by default instead IPTC, the
> problem become obsolete.

Gilles, I am not certain that the problem is obsolete just because of proper XMP support. The world is full of photos tagged with Irfanview, Photoshop < CS1, and other applications that have written and still do write IPTC. These photos follow a very popular specification, and Digikam does not yet follow that specification. This would be like abandoning HTML 3 support in a web browser just because the browser now support HTML 5. The old documents are still in use.
Comment 52 caulier.gilles 2008-12-08 09:26:28 UTC
Dotan,

It's obsolete because digiKam and kipi-plugins now use XMP instead of IPTC to manage keywords and other UTF-8 strings.

The IPTC => XMP transition has been under way for a while now in the photography world. For me it's a waste of time to try to fix this problem with IPTC, which has _never_ supported UTF-8 as a standard.

XMP is really much better than IPTC.

Gilles Caulier
Comment 53 caulier.gilles 2008-12-08 09:28:20 UTC
Dotan,

To be more clear, i close this file as WONTFIX...

Gilles Caulier
Comment 54 Dotan Cohen 2008-12-08 17:26:54 UTC
It's your call, Gilles! I trust you to do what you feel is best for Digikam.
Comment 55 caulier.gilles 2020-08-28 07:37:30 UTC
Git commit ad0ab9efeba6e2fe3bb86207a91499e4e8eb170f by Gilles Caulier.
Committed on 28/08/2020 at 05:19.
Pushed by cgilles into branch 'master'.

IPTC and UTF-8 support: if a tag is a string, check whether the global IPTC character set is null; if so, convert from Latin-1, otherwise interpret the string as UTF-8.
We use the std::string accessor from Exiv2 to get a UTF-8 conversion of the string. If this does not work, the problem needs to be reported UPSTREAM
to Exiv2, as the pre-conversion of the string is not done in the background by the library.
This patch prevents Latin-1 strings from being displayed with a wrong UTF-8 conversion, which can break some characters.
BUGS: 379581
BUGS: 379050
FIXED-IN: 7.1.0

M  +27   -3    core/libs/metadataengine/engine/metaengine_iptc.cpp

https://invent.kde.org/graphics/digikam/commit/ad0ab9efeba6e2fe3bb86207a91499e4e8eb170f
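
The rule described in this commit message can be sketched in plain C++ as follows. This is not the code from metaengine_iptc.cpp; `latin1ToUtf8` and `iptcStringToUtf8` are hypothetical names, and the sketch assumes the raw tag bytes and the raw character set value are both available as `std::string`.

```cpp
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
// Every Latin-1 byte maps to the Unicode code point of the same value.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (const unsigned char c : latin1) {
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF need two UTF-8 bytes.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}

// Hypothetical decoder following the commit message: with no character set
// declared, treat the bytes as Latin-1; otherwise expect them to be UTF-8.
std::string iptcStringToUtf8(const std::string& raw,
                             const std::string& characterSet)
{
    if (characterSet.empty())
        return latin1ToUtf8(raw);
    return raw;
}
```

For example, a Latin-1 "Ménard" with no character set declared comes out as valid UTF-8, while the same name already stored as UTF-8 alongside a declared character set is passed through unchanged.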