Version: (using KDE 3.4.2)
Installed from: SuSE RPMs
OS: Linux

I use SUSE Linux OSS 10.0. This distribution uses UTF-8 (LANG=fr_FR.UTF8 in my case). I write comments into the EXIF data and everything works: the comments are written and Konqueror displays them. But if I use Konqueror to copy a photo from one folder to another, digiKam no longer displays the UTF-8 characters correctly (the name "Valérie" comes out garbled), even in the edit comments utility. Worse, when exporting to HTML, the same garbled "Valérie" is exported :-(

<div align="center">Valérie et Virginie</div>
In 0.8.1 (SVN, compiled locally), in digiKam:
* right-click copy/paste: the UTF-8 is crippled
* mouse click and Shift, then copy: the UTF-8 is _not_ crippled

jdd
> I use suse linux OSS 10.0. This distribution uses utf8 LANG=fr_FR.UTF8
> in my case. I use to write comments in exifs.

Confirming the problem for trunk on Mandriva 2005LE, with the whole of KDE from the SVN 3.5 branch (digiKam from trunk, 0.9svn) and LANG=pl_PL (encoding ISO-8859-2). When copying images with comments from one album to another, non-Latin-1 letters are broken. It looks like the UTF-8 bytes are displayed directly as ISO-8859-2.
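The symptom described here, UTF-8 bytes shown as if each byte were a character of a single-byte charset, can be reproduced in a few lines. A minimal sketch in standard C++, assuming Latin-1 rather than ISO-8859-2 as the wrong charset (Latin-1 byte values map one-to-one to U+0000..U+00FF, which keeps the code short); `misreadAsLatin1` is a hypothetical helper name, not digiKam code:

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of a code point in U+0000..U+00FF.
// Only this range is needed, because every Latin-1 byte maps into it.
static void appendUtf8Latin1(std::string &out, unsigned char cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: simulate a program that receives UTF-8 bytes but
// decodes them as Latin-1, so every byte becomes its own character.
// The result is re-encoded as UTF-8 so it can be printed on a UTF-8 terminal.
std::string misreadAsLatin1(const std::string &utf8Bytes)
{
    std::string shown;
    for (unsigned char byte : utf8Bytes)
        appendUtf8Latin1(shown, byte);
    return shown;
}
```

Fed the UTF-8 bytes of "Valérie", it returns "ValÃ©rie": the two bytes of "é" (0xC3 0xA9) are shown as the two characters "Ã" and "©". ASCII-only text passes through unchanged, which is why only accented names are affected.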
I cannot confirm your UTF-8 problems here on Gentoo Linux with digiKam 0.9.0 SVN and LANG=de_DE.UTF-8. However, I found another strange behaviour while trying to reproduce it: after I copied the image to another directory/album with Konqueror or inside digiKam, the comment was completely lost (although I had checked the "embedding the comments in exif" option). I will look into it in the next few days, because Gilles is currently changing the digiKam internals that deal with EXIF data. Maybe my problem is related. Can you please provide an example image?
Yes Sebastian, let me finish removing the libKExif dependency from the digiKam core, and then we will hack on this problem using the trunk branch. There are still some hours of work left to use Exiv2 instead of libKExif throughout the digiKam core. I think I will complete this task this week. Remind me next week (:=)))... Gilles Caulier
The core metadata class is now updated. Please try again using the trunk SVN branch implementation. Thanks in advance. Gilles Caulier
Works for me. All the old broken comments are now shown properly, and moving images between albums no longer destroys comments. (0.9svn)
*** Bug 98462 has been marked as a duplicate of this bug. ***
SVN commit 537807 by mwiesweg:

Unicode support for JFIF and EXIF comments:
- use UTF8 for JFIF comment
- use Unicode (UCS-2) to write JPEG UserComment, support charset specification when reading the UserComment
- add convertCommentValue method to DMetaData

Using UTF8 for JFIF is simple and easy and should work.
The UCS-2 support needs testing (and a decision if we always want to write Unicode, or a way to find out when we need to and when we can as well write ASCII)

CCBUG: 120241 114211

 M  +84 -28    dmetadata/dmetadata.cpp
 M  +3 -0      dmetadata/dmetadata.h
 M  +11 -3     widgets/metadata/exifwidget.cpp

--- trunk/extragear/graphics/digikam/libs/dmetadata/dmetadata.cpp #537806:537807
@@ -28,6 +28,7 @@
 // Qt includes.
 #include <qfile.h>
+#include <qtextcodec.h>
 #include <qwmatrix.h>
 // KDE includes.
@@ -635,43 +636,46 @@
 QString DMetadata::getImageComment() const
 {
     try
-    {
+    {
+        if (d->filePath.isEmpty())
+            return QString();
+
         // In first we trying to get image comments, outside of Exif and IPTC.
-        QString comments(d->imageComments.c_str());
-
+        QString comments = QString::fromUtf8(d->imageComments.c_str());
+
         if (!comments.isEmpty())
-            return comments;
-
-        // In second, we trying to get Exif comments
-
+            return comments;
+
+        // In second, we trying to get Exif comments
+
         if (!d->exifMetadata.empty())
         {
             Exiv2::ExifKey key("Exif.Photo.UserComment");
             Exiv2::ExifData exifData(d->exifMetadata);
             Exiv2::ExifData::iterator it = exifData.findKey(key);
-
+
             if (it != exifData.end())
             {
-                QString ExifComment(it->toString().c_str());
-
-                if (!ExifComment.isEmpty())
-                    return ExifComment;
+                QString exifComment = convertCommentValue(*it);
+
+                if (!exifComment.isEmpty())
+                    return exifComment;
             }
         }
-
-        // In third, we trying to get IPTC comments
-
+
+        // In third, we trying to get IPTC comments
+
         if (!d->iptcMetadata.empty())
         {
             Exiv2::IptcKey key("Iptc.Application2.Caption");
             Exiv2::IptcData iptcData(d->iptcMetadata);
             Exiv2::IptcData::iterator it = iptcData.findKey(key);
-
+
             if (it != iptcData.end())
             {
-                QString IptcComment(it->toString().c_str());
-
+                QString IptcComment = QString::fromLatin1(it->toString().c_str());
+
                 if (!IptcComment.isEmpty())
                     return IptcComment;
             }
@@ -683,15 +687,15 @@
         kdDebug() << "Cannot get Image comments using Exiv2 ("
                   << QString::fromLocal8Bit(e.what().c_str())
                   << ")" << endl;
-    }
-
+    }
+
     return QString();
 }
 bool DMetadata::setImageComment(const QString& comment)
 {
     try
-    {
+    {
         if (comment.isEmpty())
             return false;
@@ -699,13 +703,21 @@
         // In first we trying to set image comments, outside of Exif and IPTC.
-        const std::string str(comment.latin1());
+        const std::string str(comment.utf8());
         d->imageComments = str;
         // In Second we write comments into Exif.
-
-        d->exifMetadata["Exif.Photo.UserComment"] = comment.latin1();
-
+
+        // Be aware that we are dealing with a UCS-2 string.
+        // Null termination means \0\0, strlen does not work,
+        // do not use any const-char*-only methods,
+        // pass a std::string and not a const char * to ExifDatum::operator=().
+        const unsigned short *ucs2 = comment.ucs2();
+        std::string exifComment("charset=\"Unicode\" ");
+        exifComment.append((const char*)ucs2, sizeof(unsigned short) * comment.length());
+        d->exifMetadata["Exif.Photo.UserComment"] = exifComment;
+        //d->exifMetadata["Exif.Photo.UserComment"] = comment.latin1();
+
         // In Third we write comments into Iptc. Note that Caption IPTC tag is limited to 2000 char.
         setImageProgramId();
@@ -713,7 +725,7 @@
         QString commentIptc = comment;
         commentIptc.truncate(2000);
         d->iptcMetadata["Iptc.Application2.Caption"] = commentIptc.latin1();
-
+
         return true;
     }
     catch( Exiv2::Error &e )
@@ -721,11 +733,55 @@
         kdDebug() << "Cannot set Comment into image using Exiv2 ("
                   << QString::fromLocal8Bit(e.what().c_str())
                   << ")" << endl;
-    }
-
+    }
+
     return false;
 }
+QString DMetadata::convertCommentValue(const Exiv2::Exifdatum &exifDatum)
+{
+    std::string comment = exifDatum.toString();
+    std::string charset;
+
+    // libexiv2 will prepend "charset=\"SomeCharset\" " if charset is specified
+    // Before conversion to QString, we must know the charset, so we stay with std::string for a while
+    if (comment.length() > 8 && comment.substr(0, 8) == "charset=")
+    {
+        // the prepended charset specification is followed by a blank
+        std::string::size_type pos = comment.find_first_of(' ');
+        if (pos != std::string::npos)
+        {
+            // extract string between the = and the blank
+            charset = comment.substr(8, pos-8);
+            // get the rest of the string after the charset specification
+            comment = comment.substr(pos+1);
+        }
+    }
+
+    if (charset == "\"Unicode\"")
+    {
+        // QString expects a null-terminated UCS-2 string.
+        // Is it already null terminated? In any case, add termination for safety.
+        comment += "\0\0";
+        return QString::fromUcs2((unsigned short *)comment.data());
+    }
+    else if (charset == "\"Jis\"")
+    {
+        QTextCodec *codec = QTextCodec::codecForName("JIS7");
+        return codec->toUnicode(comment.c_str());
+    }
+    else if (charset == "\"Ascii\"")
+    {
+        return QString::fromLatin1(comment.c_str());
+    }
+    else
+    {
+        // or from local8bit ??
+        return QString::fromLatin1(comment.c_str());
+    }
+}
+
+
 /* Iptc.Application2.Urgency <==> digiKam Rating links:
--- trunk/extragear/graphics/digikam/libs/dmetadata/dmetadata.h #537806:537807
@@ -30,6 +30,7 @@
 // Exiv2 includes.
 #include <exiv2/types.hpp>
+#include <exiv2/exif.hpp>
 // Local includes.
@@ -104,6 +105,8 @@
     PhotoInfoContainer getPhotographInformations() const;
+    static QString convertCommentValue(const Exiv2::Exifdatum &comment);
+
 private:
     DImg::FORMAT fileFormat(const QString& filePath);
--- trunk/extragear/graphics/digikam/libs/widgets/metadata/exifwidget.cpp #537806:537807
@@ -155,9 +155,17 @@
         QString key = QString::fromLocal8Bit(md->key().c_str());
         // Decode the tag value with a user friendly output.
-        std::ostringstream os;
-        os << *md;
-        QString tagValue = QString::fromLocal8Bit(os.str().c_str());
+        QString tagValue;
+        if (key == "Exif.Photo.UserComment")
+        {
+            tagValue = DMetadata::convertCommentValue(*md);
+        }
+        else
+        {
+            std::ostringstream os;
+            os << *md;
+            tagValue = QString::fromLocal8Bit(os.str().c_str());
+        }
         tagValue.replace("\n", " ");
         // We apply a filter to get only standard Exif tags, not maker notes.
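The prefix handling in `convertCommentValue` can be tried outside Qt. The sketch below, in standard C++, splits the `charset="..." ` prefix that the commit says libexiv2 prepends, and round-trips a UCS-2 payload. Assumptions: all helper names are hypothetical, and the payload is little-endian (the order produced when the raw memory of `QString::ucs2()` is copied on an x86 host). It also avoids a pitfall visible in the diff: `comment += "\0\0"` appends nothing to a `std::string`, because `operator+=(const char*)` stops at the first NUL, so the sketch carries explicit lengths instead of relying on terminators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: a UserComment split into charset and payload.
struct CommentParts {
    std::string charset;  // e.g. "Unicode", "Ascii", "Jis"; empty if absent
    std::string payload;  // raw bytes after the prefix (may contain NULs)
};

// Hypothetical helper: split a value of the form  charset="Unicode" <payload>
CommentParts splitCharsetPrefix(const std::string &comment)
{
    const std::string tag = "charset=";
    CommentParts parts;
    parts.payload = comment;
    if (comment.compare(0, tag.size(), tag) != 0)
        return parts;  // no prefix: charset unknown
    // The prefix itself contains no blank, so the first blank ends it.
    std::string::size_type blank = comment.find(' ');
    if (blank == std::string::npos)
        return parts;
    std::string cs = comment.substr(tag.size(), blank - tag.size());
    if (cs.size() >= 2 && cs.front() == '"' && cs.back() == '"')
        cs = cs.substr(1, cs.size() - 2);  // strip the surrounding quotes
    parts.charset = cs;
    parts.payload = comment.substr(blank + 1);
    return parts;
}

// Hypothetical helper: prefix plus little-endian UCS-2 bytes (assumed order).
std::string encodeUnicodeComment(const std::u16string &text)
{
    std::string out = "charset=\"Unicode\" ";
    for (char16_t unit : text) {
        out += static_cast<char>(unit & 0xFF);
        out += static_cast<char>((unit >> 8) & 0xFF);
    }
    return out;
}

// Hypothetical helper: decode little-endian UCS-2 using the explicit length,
// since the payload contains NUL bytes that C-string functions would stop at.
// A trailing odd byte, if any, is ignored.
std::u16string decodeUcs2LE(const std::string &bytes)
{
    std::u16string text;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        text += static_cast<char16_t>(lo | (hi << 8));
    }
    return text;
}
```

Encoding "Valérie" gives an 18-byte prefix followed by 14 payload bytes, half of them NUL; splitting and decoding recovers the original text. A value without the prefix comes back with an empty charset, which is the case the commit still hands to `QString::fromLatin1`.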
Marcel, have you found any documentation about JFIF comment encoding? Also, regarding the decision whether we always want to write Unicode or ASCII, I propose adding a QCheckBox option to the metadata setup dialog page. I think Unicode should always be enabled by default. What is your view? Gilles
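The Unicode-or-ASCII question can also be settled automatically, per comment, by checking whether every character fits in one byte. A minimal sketch in standard C++, assuming the comment is available as UTF-16 code units; `fitsInLatin1` and `buildUserComment` are hypothetical names, and the little-endian byte order of the Unicode branch is an assumption:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if every UTF-16 code unit is in U+0000..U+00FF,
// i.e. the text can be stored one byte per character in ISO 8859-1 (Latin-1).
bool fitsInLatin1(const std::u16string &text)
{
    for (char16_t unit : text)
        if (unit > 0xFF)
            return false;
    return true;
}

// Hypothetical helper: build an Exif UserComment value, one byte per
// character when Latin-1 suffices, otherwise two bytes per character
// (little-endian here, an assumption).
std::string buildUserComment(const std::u16string &text)
{
    std::string out;
    if (fitsInLatin1(text)) {
        out = "charset=\"Ascii\" ";
        for (char16_t unit : text)
            out += static_cast<char>(unit);
    } else {
        out = "charset=\"Unicode\" ";
        for (char16_t unit : text) {
            out += static_cast<char>(unit & 0xFF);
            out += static_cast<char>(unit >> 8);
        }
    }
    return out;
}
```

This is the same split that commit 543272 makes with `QTextCodec::canEncode`. Note that characters such as é (U+00E9) take the branch labelled "Ascii" even though they lie outside 7-bit ASCII.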
Created attachment 15985: fixed caption encoding when loading from jpeg exif
SVN commit 538809 by cgilles:

digikam from stable : fix JFIF comments section encoding extraction to respect UTF8
CCMAIL: digikam-devel@kde.org
CCBUGS: 120241

 M  +1 -1      jpegmetadata.cpp

--- branches/stable/extragear/graphics/digikam/libs/jpegutils/jpegmetadata.cpp #538808:538809
@@ -118,7 +118,7 @@
                 continue;
             }
-            comments = QString::fromAscii((const char*)marker->data,
+            comments = QString::fromUtf8((const char*)marker->data,
                                          marker->data_length);
         }
         else if (marker->marker == M_EXIF)
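The JFIF comment decoded by this fix lives in a JPEG COM segment (marker 0xFFFE), whose payload carries no charset information, which is why the reading code has to pick a decoding itself. A minimal sketch of locating those payloads in standard C++; `jpegComments` is a hypothetical name, and the scan is deliberately simplified (it ignores fill bytes before markers and stops at the first malformed segment):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the payloads of all COM (0xFFFE) segments in a
// JPEG byte buffer. Walks the marker segments after SOI and stops at SOS
// (0xFFDA) or EOI (0xFFD9). Malformed input simply ends the scan.
std::vector<std::string> jpegComments(const std::string &jpeg)
{
    std::vector<std::string> comments;
    auto byteAt = [&](std::size_t i) { return static_cast<unsigned char>(jpeg[i]); };

    if (jpeg.size() < 2 || byteAt(0) != 0xFF || byteAt(1) != 0xD8)
        return comments;  // no SOI marker: not a JPEG

    std::size_t pos = 2;
    while (pos + 4 <= jpeg.size() && byteAt(pos) == 0xFF) {
        unsigned char marker = byteAt(pos + 1);
        if (marker == 0xDA || marker == 0xD9)
            break;  // start of scan or end of image: no more header segments
        // The segment length is big-endian and includes its own two bytes.
        std::size_t length = (byteAt(pos + 2) << 8) | byteAt(pos + 3);
        if (length < 2 || pos + 2 + length > jpeg.size())
            break;
        if (marker == 0xFE)
            comments.push_back(jpeg.substr(pos + 4, length - 2));
        pos += 2 + length;
    }
    return comments;
}
```

The returned bytes are untouched; in the fix above they go to `QString::fromUtf8` instead of `QString::fromAscii`.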
SVN commit 543272 by mwiesweg:

Add some autodetection magic for charset support
- DMetadata::detectEncodingAndDecode will check if a given string is in UTF8. If not, it will leave it to QTextCodec to decide if the local charset or latin1 will be used
- use detectEncodingAndDecode when reading the JFIF comment and for Exif comments with undefined encoding
- When writing the Exif comment, use UCS-2 only when necessary. Check with QTextCodec::canEncode if plain latin1 is enough.

I have tested this successfully with some Arabian and cyrillic characters. But please test this with some more pictures. UTF-8 should be no problem, but the local8Bit vs. latin1 decision may be.

CCBUGS: 120241, 114211

 M  +75 -15    dmetadata.cpp
 M  +3 -0      dmetadata.h

--- trunk/extragear/graphics/digikam/libs/dmetadata/dmetadata.cpp #543271:543272
@@ -33,7 +33,9 @@
 // KDE includes.
+#include <kapplication.h>
 #include <kdebug.h>
+#include <kstringhandler.h>
 #include <ktempfile.h>
 // Exiv2 includes.
@@ -714,7 +716,7 @@
         // In first we trying to get image comments, outside of Exif and IPTC.
-        QString comments = QString::fromUtf8(d->imageComments.c_str());
+        QString comments = detectEncodingAndDecode(d->imageComments);
         if (!comments.isEmpty())
             return comments;
@@ -780,18 +782,32 @@
         // In Second we write comments into Exif.
-        // Be aware that we are dealing with a UCS-2 string.
-        // Null termination means \0\0, strlen does not work,
-        // do not use any const-char*-only methods,
-        // pass a std::string and not a const char * to ExifDatum::operator=().
-        const unsigned short *ucs2 = comment.ucs2();
-        std::string exifComment("charset=\"Unicode\" ");
-        exifComment.append((const char*)ucs2, sizeof(unsigned short) * comment.length());
-        d->exifMetadata["Exif.Photo.UserComment"] = exifComment;
-        //d->exifMetadata["Exif.Photo.UserComment"] = comment.latin1();
+        // Write as Unicode only when necessary.
+        QTextCodec *latin1Codec = QTextCodec::codecForName("iso8859-1");
+        if (latin1Codec->canEncode(comment))
+        {
+            // write as ASCII
+            std::string exifComment("charset=\"Ascii\" ");
+            exifComment += comment.latin1();
+            d->exifMetadata["Exif.Photo.UserComment"] = exifComment;
+        }
+        else
+        {
+            // write as Unicode (UCS-2)
-        // In Third we write comments into Iptc. Note that Caption IPTC tag is limited to 2000 char.
+            // Be aware that we are dealing with a UCS-2 string.
+            // Null termination means \0\0, strlen does not work,
+            // do not use any const-char*-only methods,
+            // pass a std::string and not a const char * to ExifDatum::operator=().
+            const unsigned short *ucs2 = comment.ucs2();
+            std::string exifComment("charset=\"Unicode\" ");
+            exifComment.append((const char*)ucs2, sizeof(unsigned short) * comment.length());
+            d->exifMetadata["Exif.Photo.UserComment"] = exifComment;
+        }
+        // In Third we write comments into Iptc.
+        // Note that Caption IPTC tag is limited to 2000 char and ASCII charset.
+
         QString commentIptc = comment;
         commentIptc.truncate(2000);
         d->iptcMetadata["Iptc.Application2.Caption"] = commentIptc.latin1();
@@ -815,7 +831,7 @@
     {
         std::string comment = exifDatum.toString();
         std::string charset;
-
+
         // libexiv2 will prepend "charset=\"SomeCharset\" " if charset is specified
         // Before conversion to QString, we must know the charset, so we stay with std::string for a while
         if (comment.length() > 8 && comment.substr(0, 8) == "charset=")
@@ -830,7 +846,7 @@
                 comment = comment.substr(pos+1);
             }
         }
-
+
         if (charset == "\"Unicode\"")
         {
             // QString expects a null-terminated UCS-2 string.
@@ -849,8 +865,7 @@
         }
         else
         {
-            // or from local8bit ??
-            return QString::fromLatin1(comment.c_str());
+            return detectEncodingAndDecode(comment);
         }
     }
     catch( Exiv2::Error &e )
@@ -863,6 +878,51 @@
     return QString();
 }
+QString DMetadata::detectEncodingAndDecode(const std::string &value)
+{
+    // For charset autodetection, we could use sophisticated code
+    // (Mozilla chardet, KHTML's autodetection, QTextCodec::codecForContent),
+    // but that is probably too much.
+    // We check for UTF8, Local encoding and ASCII.
+
+    if (value.empty())
+        return QString();
+
+#if KDE_IS_VERSION(3,2,0)
+    if (KStringHandler::isUtf8(value.c_str()))
+    {
+        return QString::fromUtf8(value.c_str());
+    }
+#else
+    // anyone who is still running KDE 3.0 or 3.1 is missing so many features
+    // that he will have to accept this missing feature.
+    return QString::fromUtf8(value.c_str());
+#endif
+
+    // Utf8 has a pretty unique byte pattern.
+    // Thats not true for ASCII, it is not possible
+    // to reliably autodetect different ISO-8859 charsets.
+    // We try if QTextCodec can decide here, otherwise we use Latin1.
+    // Or use local8Bit as default?
+
+    // load QTextCodecs
+    QTextCodec *latin1Codec = QTextCodec::codecForName("iso8859-1");
+    //QTextCodec *utf8Codec = QTextCodec::codecForName("utf8");
+    QTextCodec *localCodec = QTextCodec::codecForLocale();
+
+    // make heuristic match
+    int latin1Score = latin1Codec->heuristicContentMatch(value.c_str(), value.length());
+    int localScore = localCodec->heuristicContentMatch(value.c_str(), value.length());
+
+    // convert string:
+    // Use whatever has the larger score, local or ASCII
+    if (localScore >= 0 && localScore >= latin1Score)
+        return localCodec->toUnicode(value.c_str(), value.length());
+    else
+        return QString::fromLatin1(value.c_str());
+}
+
+
 /* Iptc.Application2.Urgency <==> digiKam Rating links:
--- trunk/extragear/graphics/digikam/libs/dmetadata/dmetadata.h #543271:543272
@@ -21,6 +21,8 @@
 #ifndef DMETADATA_H
 #define DMETADATA_H
+#include <string>
+
 // QT includes.
 #include <qcstring.h>
@@ -108,6 +110,7 @@
     PhotoInfoContainer getPhotographInformations() const;
     static QString convertCommentValue(const Exiv2::Exifdatum &comment);
+    static QString detectEncodingAndDecode(const std::string &value);
 private:
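The UTF-8 test at the heart of `detectEncodingAndDecode` works because, as the comment in the diff says, UTF-8 has a distinctive byte pattern: multi-byte sequences must follow strict lead/continuation rules that ISO-8859 text rarely satisfies by accident. A minimal sketch in standard C++ of that first step plus a Latin-1 fallback. It is a strict validator written for illustration, not the implementation of `KStringHandler::isUtf8`, and it omits the `QTextCodec` locale-versus-Latin-1 heuristic; `isValidUtf8` and `decodeComment` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: strict UTF-8 validation (rejects overlong forms,
// UTF-16 surrogates and code points above U+10FFFF).
bool isValidUtf8(const std::string &s)
{
    static const char32_t minForLength[] = {0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }  // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + extra >= s.size())
            return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;  // continuation bytes must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLength[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: decode as UTF-8 when the bytes are valid UTF-8,
// otherwise treat each byte as one Latin-1 character.
std::u32string decodeComment(const std::string &bytes)
{
    std::u32string out;
    if (!isValidUtf8(bytes)) {
        for (unsigned char c : bytes)
            out += static_cast<char32_t>(c);
        return out;
    }
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = c < 0x80 ? 0 : (c & 0xE0) == 0xC0 ? 1 : (c & 0xF0) == 0xE0 ? 2 : 3;
        char32_t cp = extra == 0 ? c : extra == 1 ? (c & 0x1F) : extra == 2 ? (c & 0x0F) : (c & 0x07);
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

Both the UTF-8 bytes of "Valérie" (0xC3 0xA9 for "é") and its Latin-1 bytes (a lone 0xE9) decode to the same text: the lone 0xE9 is followed by "r", which is not a continuation byte, so the UTF-8 check fails and the Latin-1 path is taken. Pure ASCII is valid UTF-8, so it always takes the first path, which is harmless.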
We now have support for reading, autodetecting and writing comments as UTF8. Closing this bug.