Bug 429261

Summary: Malformed characters in EXIF Artist field
Product: [Applications] digikam
Reporter: José Oliver-Didier <jose_oliver>
Component: Metadata-Exif
Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: normal
CC: metzpinguin
Priority: NOR    
Version: 7.1.0   
Target Milestone: ---   
Platform: Microsoft Windows   
OS: Microsoft Windows   
Latest Commit:
Version Fixed In: 7.2.0
Sentry Crash Report:
Attachments:
- Sample image
- Artist.png
- EXIF Panel with malformed Artist field
- flickr.png

Description José Oliver-Didier 2020-11-17 20:37:26 UTC
Created attachment 133407 [details]
Sample image

SUMMARY
Malformed characters in Exif Artist field.

STEPS TO REPRODUCE
1. Open the attached sample image file in digiKam.
2. Open the EXIF Metadata panel.

OBSERVED RESULT
- The Exif Artist field does not correctly display the character é (UTF-8 encoded).

EXPECTED RESULT
- Characters are displayed correctly, at least when reading.

ADDITIONAL INFORMATION
- The EXIF spec defines the Artist tag (0x013b) as ASCII, the same as the Copyright tag (0x8298). Still, I noticed that digiKam does display UTF-8 characters in the Copyright tag. I am opening this bug because of this inconsistency: why does it work for the Copyright tag but not for the Artist tag?
- The Metadata Working Group recommends using UTF-8 in these fields, and some applications do so.
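
One way to tell which case a stored value falls into is to test its raw bytes. The helper below is a hypothetical Python sketch: the function name and the sample values are illustrative assumptions, not part of digiKam or the EXIF specification.

```python
def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a raw Exif string value."""
    # Pure 7-bit content satisfies the Exif ASCII type as written.
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        # Non-ASCII bytes that decode cleanly are treated as UTF-8.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Anything else, for example a legacy Windows code page.
        return "other"


print(classify(b"Jose Oliver"))            # ascii
print(classify("José".encode("utf-8")))    # utf-8
print(classify("José".encode("cp1252")))   # other
```

A value reported as 'other' cannot be displayed reliably without knowing which code page the writing program used.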
Comment 1 Maik Qualmann 2020-11-17 20:51:59 UTC
Created attachment 133409 [details]
Artist.png

Artists are displayed correctly here with digiKam-7.2.0-Beta2.

Maik
Comment 2 Maik Qualmann 2020-11-17 20:53:15 UTC
On which platform do you use digiKam?

Maik
Comment 3 José Oliver-Didier 2020-11-17 23:05:59 UTC
Created attachment 133416 [details]
EXIF Panel with malformed Artist field

Attached you will find a screenshot of the EXIF Metadata panel. This is on Digikam 7.1.0 running on Windows 10 build 10.0.19042.630.
Comment 4 Maik Qualmann 2020-11-18 09:31:09 UTC
Here under Linux, the copyright symbol is displayed incorrectly, but the name is correct. ExifTool displays it the same way digiKam does.
Even under Windows, three different programs show it differently. ExifTool can probably only output ASCII in the command prompt. The "Exif Viewer" program displays it the way digiKam does under Windows. Picasa shows it the way digiKam does on Linux.
I will have to take a closer look at the encoding of the fields in the image to see whether they conform to the standard.

Maik
Comment 5 Maik Qualmann 2020-11-18 20:36:26 UTC
Git commit cd921607a1be16a560f99106e616a1f96c579ed0 by Maik Qualmann.
Committed on 18/11/2020 at 20:34.
Pushed by mqualmann into branch 'master'.

QString::fromLocal8Bit() => QString::fromStdString()

M  +8    -9    core/libs/metadataengine/engine/metaengine_exif.cpp
M  +1    -1    core/libs/metadataengine/engine/metaengine_gps.cpp
M  +17   -17   core/libs/metadataengine/engine/metaengine_iptc.cpp
M  +1    -1    core/libs/metadataengine/engine/metaengine_p.cpp
M  +2    -2    core/libs/metadataengine/engine/metaengine_xmp.cpp

https://invent.kde.org/graphics/digikam/commit/cd921607a1be16a560f99106e616a1f96c579ed0
Comment 6 Maik Qualmann 2020-11-18 20:39:05 UTC
I think using QString::fromLocal8Bit() to convert from std::string on Windows is the cause. Gilles, can you create new Windows bundles tomorrow?

Maik
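
The suspected mismatch can be reproduced outside digiKam. The following is a minimal Python sketch, not digiKam code. It assumes the Exif string is stored as UTF-8 and that the Windows local 8-bit encoding is code page 1252 (the actual code page depends on the system locale). Decoding with the local code page mirrors what QString::fromLocal8Bit() does on such a system, while decoding as UTF-8 mirrors QString::fromStdString().

```python
# Illustrative only: the sample name and code page 1252 are assumptions.
raw = "José".encode("utf-8")      # bytes as stored in the Exif field

as_local = raw.decode("cp1252")   # local 8-bit interpretation on Windows
as_utf8 = raw.decode("utf-8")     # UTF-8 interpretation

print(as_local)  # JosÃ©
print(as_utf8)   # José
```

On a Linux system with a UTF-8 locale, both interpretations give the same result, which would explain why the problem only showed up on Windows.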
Comment 7 José Oliver-Didier 2020-11-18 21:23:16 UTC
In the Windows command prompt, the code page must be set to UTF-8 with the chcp 65001 command so that exiftool displays the values correctly. On my PC, both copyright and artist appear correctly using exiftool.
Comment 8 Maik Qualmann 2020-11-20 08:35:38 UTC
The representation of the special characters now corresponds to the Linux version. The copyright symbol in the EXIF is now displayed as a question mark; XMP is OK. This in turn matches exactly how Picasa and other Exif display programs show it under Windows. For me, the bug is fixed.

Maik
Comment 9 José Oliver-Didier 2020-11-20 18:03:18 UTC
Hello Maik,

I am a bit confused: "the EXIF is now displayed as a question mark". So in the Exif Copyright field, special characters will be displayed as question marks?

Windows File Explorer reads and writes special characters in this field, as does GeoSetter, which relies on exiftool for metadata operations. The photo sharing site Flickr also reads special characters in the Exif Artist and Copyright fields.
Comment 10 Maik Qualmann 2020-11-20 18:42:09 UTC
Created attachment 133506 [details]
flickr.png

I uploaded your original image to Flickr and displayed the Exif data. As you can see, Flickr also shows the copyright symbol as a question mark, as does Picasa on Windows. The program that inserted the copyright string encoded it in a Windows code page. This is not standard: it should be either Unicode (UTF-8) or clean ASCII.

Maik
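
Why a code-page-encoded copyright symbol ends up as a placeholder can be sketched in Python. This assumes the writing program used code page 1252, where © is the single byte 0xA9; the sample string is hypothetical. That byte on its own is not valid UTF-8, so a lenient UTF-8 reader has to substitute something for it.

```python
# Assumption: the copyright string was written in Windows code page 1252.
raw = "© 2020".encode("cp1252")   # b'\xa9 2020'

# A lenient reader replaces each undecodable byte with U+FFFD,
# which many viewers render as "?".
shown = raw.decode("utf-8", errors="replace")

print(shown == "\ufffd 2020")  # True
```

The ASCII part of the string survives unchanged; only the byte outside the ASCII range is lost.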
Comment 11 José Oliver-Didier 2020-11-20 19:42:13 UTC
Ok, that explains it!

I was under the assumption that it was encoded as UTF-8; all my other photos on Flickr displayed the EXIF field values correctly. The other encoding must have been used by accident while crafting my test photos. Now it makes more sense.