Bug 429261 - Malformed characters in EXIF Artist field
Summary: Malformed characters in EXIF Artist field
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Exif
Version: 7.1.0
Platform: Microsoft Windows, OS: Microsoft Windows
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-17 20:37 UTC by José Oliver-Didier
Modified: 2020-11-20 21:45 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In: 7.2.0


Attachments
Sample image (2.45 MB, image/jpeg)
2020-11-17 20:37 UTC, José Oliver-Didier
Artist.png (5.70 KB, image/png)
2020-11-17 20:51 UTC, Maik Qualmann
EXIF Panel with malformed Artist field (37.39 KB, image/png)
2020-11-17 23:05 UTC, José Oliver-Didier
flickr.png (10.01 KB, image/png)
2020-11-20 18:42 UTC, Maik Qualmann

Description José Oliver-Didier 2020-11-17 20:37:26 UTC
Created attachment 133407
Sample image

SUMMARY
Malformed characters in Exif Artist field.

STEPS TO REPRODUCE
1. Open the attached sample image file in digiKam.
2. Open the EXIF Metadata panel.

OBSERVED RESULT
- The EXIF Artist field does not correctly display the character é (UTF-8 encoded).

EXPECTED RESULT
- Characters are displayed correctly, at least when reading the field.

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
- The EXIF specification defines the Artist tag (0x013B) as type ASCII, the same as the Copyright tag (0x8298). Still, I noticed that digiKam does display UTF-8 characters in the Copyright tag. I am opening this bug because of this inconsistency: why does it work for the Copyright tag but not for the Artist tag?
- The Metadata Working Group recommends using UTF-8 in these fields, and some applications do so.
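The conflict between the ASCII tag type and UTF-8 text can be seen at the byte level. The EXIF ASCII type holds 7-bit codes, while UTF-8 encodes é as the two bytes 0xC3 0xA9, both above 0x7F. A minimal Python sketch, assuming a hypothetical Artist value "José" (not read from the attached sample image):

```python
def is_strict_ascii(raw: bytes) -> bool:
    """Return True if every byte fits the 7-bit range of the EXIF ASCII type."""
    return all(b < 0x80 for b in raw)


# Hypothetical Artist value, for illustration only.
artist_utf8 = "José".encode("utf-8")

print(artist_utf8)                   # b'Jos\xc3\xa9'
print(is_strict_ascii(artist_utf8))  # False: é needs two bytes >= 0x80
print(is_strict_ascii(b"Jose"))      # True
```

A UTF-8 value therefore falls outside the strict tag type. A reader must assume some encoding for those bytes, and different programs assume different ones.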
Comment 1 Maik Qualmann 2020-11-17 20:51:59 UTC
Created attachment 133409
Artist.png

The Artist field is displayed correctly here with digiKam-7.2.0-Beta2.

Maik
Comment 2 Maik Qualmann 2020-11-17 20:53:15 UTC
On which platform do you use digiKam?

Maik
Comment 3 José Oliver-Didier 2020-11-17 23:05:59 UTC
Created attachment 133416
EXIF Panel with malformed Artist field

Attached you will find a screenshot of the EXIF Metadata panel. This is on digiKam 7.1.0 running on Windows 10 build 10.0.19042.630.
Comment 4 Maik Qualmann 2020-11-18 09:31:09 UTC
Here under Linux, the copyright symbol is displayed incorrectly but the name is correct. ExifTool displays it the same way as digiKam.
Under Windows, three different programs each show it differently. ExifTool can probably only use ASCII in the command prompt. The "Exif Viewer" program displays it like digiKam under Windows, and Picasa shows it like digiKam on Linux.
I will have to take a closer look at the encoding of the fields in the image to see whether they conform to the standard.

Maik
Comment 5 Maik Qualmann 2020-11-18 20:36:26 UTC
Git commit cd921607a1be16a560f99106e616a1f96c579ed0 by Maik Qualmann.
Committed on 18/11/2020 at 20:34.
Pushed by mqualmann into branch 'master'.

QString::fromLocal8Bit() => QString::fromStdString()

M  +8    -9    core/libs/metadataengine/engine/metaengine_exif.cpp
M  +1    -1    core/libs/metadataengine/engine/metaengine_gps.cpp
M  +17   -17   core/libs/metadataengine/engine/metaengine_iptc.cpp
M  +1    -1    core/libs/metadataengine/engine/metaengine_p.cpp
M  +2    -2    core/libs/metadataengine/engine/metaengine_xmp.cpp

https://invent.kde.org/graphics/digikam/commit/cd921607a1be16a560f99106e616a1f96c579ed0
Comment 6 Maik Qualmann 2020-11-18 20:39:05 UTC
I think using QString::fromLocal8Bit() to convert from std::string on Windows is the cause. Gilles, can you create new Windows bundles tomorrow?

Maik
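The commit's one-line summary can be illustrated outside Qt. According to the Qt 5 documentation, QString::fromLocal8Bit() decodes with the system's local 8-bit encoding, which on Windows is the ANSI code page. QString::fromStdString() decodes as UTF-8. The Python sketch below mimics both conversions. It assumes a Windows-1252 ANSI code page and a hypothetical UTF-8 value "José", and it is not digiKam code:

```python
raw = "José".encode("utf-8")  # UTF-8 bytes, as stored by a UTF-8-aware writer

# Approximates QString::fromLocal8Bit() on Windows with a Windows-1252 ANSI code page.
as_local_8bit = raw.decode("cp1252")

# Approximates QString::fromStdString() in Qt 5, which decodes as UTF-8.
as_utf8 = raw.decode("utf-8")

print(as_local_8bit)  # JosÃ©
print(as_utf8)        # José
```

On Linux the local 8-bit encoding is usually UTF-8, so the two conversions give the same result there. That fits Comment 4's observation that the name displayed correctly under Linux.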
Comment 7 José Oliver-Didier 2020-11-18 21:23:16 UTC
In the Windows command prompt, you need to set the code page to UTF-8 with the chcp 65001 command for exiftool to display the values correctly. On my PC, both Copyright and Artist appear correctly in exiftool.
Comment 8 Maik Qualmann 2020-11-20 08:35:38 UTC
Special characters are now displayed the same way as in the Linux version. The copyright symbol in the EXIF data is now displayed as a question mark, while the XMP value is fine. This matches exactly how Picasa and other EXIF viewers display it under Windows. As far as I am concerned, the bug is fixed.

Maik
Comment 9 José Oliver-Didier 2020-11-20 18:03:18 UTC
Hello Maik,

I am a bit confused by "the EXIF is now displayed as a question mark". Does that mean special characters in the EXIF Copyright field will be displayed as question marks?

Windows File Explorer reads and writes special characters in this field, and so does GeoSetter, which relies on exiftool for metadata operations. The photo-sharing site Flickr also reads special characters in the EXIF Artist and Copyright fields.
Comment 10 Maik Qualmann 2020-11-20 18:42:09 UTC
Created attachment 133506
flickr.png

I uploaded your original image to Flickr and displayed the EXIF data. As you can see, Flickr also shows the copyright symbol as a question mark, just like Picasa on Windows. The program that inserted the copyright string encoded it in a Windows code page. That is not the standard: the value should be either Unicode (UTF-8) or plain ASCII.

Maik
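The diagnosis in Comment 10 can be checked by inspecting the raw bytes. In Windows-1252 the copyright sign © is the single byte 0xA9, while UTF-8 encodes it as 0xC2 0xA9. A lone 0xA9 is not valid UTF-8, so a UTF-8 reader has to substitute something for it. The Python sketch below uses hypothetical values, not the string from the sample image. The function names are illustrative only. The Unicode replacement character is mapped to "?" purely to mimic what Flickr and Picasa showed:

```python
def classify(raw: bytes) -> str:
    """Report which encoding an EXIF ASCII-typed value is compatible with."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (e.g. a Windows code page)"


def show_as_utf8(raw: bytes) -> str:
    """Decode as UTF-8, rendering each invalid sequence as '?'."""
    return raw.decode("utf-8", errors="replace").replace("\ufffd", "?")


cp1252_value = "© 2020".encode("cp1252")  # b'\xa9 2020'
utf8_value = "© 2020".encode("utf-8")     # b'\xc2\xa9 2020'

print(classify(cp1252_value), "|", show_as_utf8(cp1252_value))
print(classify(utf8_value), "|", show_as_utf8(utf8_value))
```

This is a heuristic. Some Windows-1252 byte sequences happen to be valid UTF-8, so a successful UTF-8 decode does not prove that the writer intended UTF-8.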
Comment 11 José Oliver-Didier 2020-11-20 19:42:13 UTC
Ok, that explains it!

I was under the assumption that it was encoded as UTF-8, since all my other photos on Flickr displayed the EXIF field values correctly. The other encoding must have been used by accident while I was crafting my test photos. Now it makes more sense.