Bug 505668

Summary: Tags with accent are not displayed correctly
Product: [Applications] digikam
Reporter: Ludovic <grand.titus>
Component: Tags-Manager
Assignee: Digikam Developers <digikam-bugs-null>
Status: REPORTED ---
Severity: normal
CC: caulier.gilles, metzpinguin
Priority: NOR
Version First Reported In: 8.6.0
Target Milestone: ---
Platform: Microsoft Windows
OS: Microsoft Windows
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: Archive with screenshots + a sample JPG

Description Ludovic 2025-06-16 20:49:58 UTC
Hello

I am French so my tags may contain accented letters. But accented letters are displayed with a “?” in a diamond.
See <Screenshot 1 - Tags with accented letters replaced by a diamond "?"> in the attached PDF

When I look at Metadata, sometimes accented letters are correctly displayed but not always:
- IPTC: Bad display (See <Screenshot 1 - Tags with accented letters replaced by a diamond "?">)
- XMP: Correct display (See <Screenshot 2 – XMP displays correctly accented letters>)

But when I use another tool (XnView MP) to view the metadata of the same image, I don’t have this issue.
See <Screenshot 3 & 4 – XnViewMP displays correctly accented letters>

Remarks:
- I asked whether I am missing some configuration here: https://discuss.kde.org/t/tags-with-accent-are-not-displayed-correctly/35307/1 but I didn't receive any answer
- I attached a zip containing a sample with accents in several metadata fields (Caption, Keywords, Object name and Supplemental category)
- I can also provide a DNG with the same issue

Thanks for your help
Comment 1 Ludovic 2025-06-16 21:00:11 UTC
Created attachment 182312 [details]
Archive with screenshots + a sample JPG
Comment 2 caulier.gilles 2025-06-17 05:26:49 UTC
Hi Ludovic,

There is no reason for non-ASCII IPTC characters not to be displayed properly in digiKam. We have long supported the IPTC encoding tag that indicates the character set used in the IPTC metadata.

XMP does not have this problem, as all of its metadata is always encoded in UTF-8.

How was the JPG file generated? From a DNG image? Which software was used?

Best regards

Gilles Caulier
Comment 3 caulier.gilles 2025-06-17 05:35:07 UTC
Using the digiKam 8.7.0 pre-release under macOS, the wrongly encoded IPTC characters are reproducible. But in fact the file has no IPTC tag specifying which encoding to use, so digiKam uses ASCII by default.

From ExifTool : https://exiftool.org/TagNames/IPTC.html

"CodedCharacterSet 	string[0,32]! 	(values are entered in the form "ESC X Y[, ...]". The escape sequence for UTF-8 character coding is "ESC % G", but this is displayed as "UTF8" for convenience. Either string may be used when writing. The value of this tag affects the decoding of string values in the Application and NewsPhoto records. This tag is marked as "unsafe" to prevent it from being copied by default in a group operation because existing tags in the destination image may use a different encoding. When creating a new IPTC record from scratch, it is suggested that this be set to "UTF8" if special characters are a possibility)"

The last sentence of this tag description is clear: the tag must be included in the IPTC chunk, but it is not...
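
As a rough illustration of what the quoted description implies, the Python sketch below checks whether a raw CodedCharacterSet value announces UTF-8. The function name `declares_utf8` and the sample values are hypothetical; only the escape sequence "ESC % G" comes from the ExifTool text above. This is not how digiKam or Exiv2 actually perform the check.

```python
# The IPTC escape sequence for UTF-8, "ESC % G", as raw bytes.
UTF8_ESCAPE = b"\x1b%G"


def declares_utf8(coded_character_set):
    """Return True if the raw CodedCharacterSet value announces UTF-8.

    `coded_character_set` is the raw tag value as bytes, or None when the
    tag is absent from the IPTC block (the case discussed in this report).
    """
    if coded_character_set is None:
        return False
    return coded_character_set == UTF8_ESCAPE


print(declares_utf8(b"\x1b%G"))  # tag present and set to UTF-8
print(declares_utf8(None))       # tag missing, so the encoding is undeclared
```

When the tag is missing, a reader has to guess the encoding, which is where the tools in this report differ.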

Best regards

Gilles Caulier
Comment 4 Maik Qualmann 2025-06-17 06:21:23 UTC
I can easily write accented characters to IPTC with digiKam (Exiv2 and ExifTool). I suspect that your IPTC characters were not encoded correctly by another program and that they are a Windows code page or something similar. That would explain why XnView displays them correctly under Windows. I'll investigate this further tonight.

Maik
Comment 5 Ludovic 2025-06-18 21:48:01 UTC
Hello

Thanks all for your replies.

I use https://geosetter.de/en/main-en/ for tagging my photos.
I use the latest available version. Unfortunately this application is no longer maintained and the beta version I use is about 2 years old.

I understand that maybe there is an issue in the IPTC tag.
I tried to find it with the command: 
> exiftool.exe -validate -warning -error -a  "C:\Temp\Digikam_test_accents\20250530_crop.jpg"
But ExifTool doesn't really complain:
> Validate                        : 1 Warning (minor)
> Warning                         : [minor] MakerNotes:PreviewImageStart is past end of file

I have also tried several other tools to view image metadata:
- ExifToolGui: https://github.com/FrankBijnen/ExifToolGui/releases/
- Metadata++: https://www.logipole.com/download.htm
All of them managed to display non-ASCII letters.

I don't know if it can help, but I noticed that ExifToolGui seems to use the argument "-CHARSET FILENAME=UTF8" when it retrieves IPTC metadata.
Just for information, the full argument list is: "-echo4 {ready16} -CHARSET FILENAME=UTF8 -v0 -overwrite_original -sep * -c %.6f° -API WindowsWideFile=1 -API WindowsLongPath=1 -API GeoDir=C:\Multimedia\ExifToolGUI\GeoLocation500 -g0:1 -a -S -Iptc:All 20250530_crop.jpg -execute16"
Comment 6 Ludovic 2025-06-18 22:05:44 UTC
I have just found a workaround to my main issue (the fact that tags with "?" are generated):
In Settings / Configure Digikam / Metadata / Advanced / Tags, I disabled Iptc.Application2.Keywords
Comment 7 Maik Qualmann 2025-06-19 05:57:19 UTC
Here's the output from ExifTool in the Linux console (generally UTF-8). As you can see, the same result.


Image Description               : C�est au tournant des 14 et 15� si�cles que Louis, duc d�Orl�ans (1372-1407) entreprend la construction du ch�teau de Pierrefonds. Il est l�un des �difices les plus imposants et imprenables de son �poque. Partiellement d�truit au 17� si�cle, il est restaur� au 19� si�cle � la demande de Napol�on III par  Viollet-le-Duc.


The problem is that it's not pure ASCII or UTF-8, but Windows Code Page encoding. Windows Code Page encoding has no place in metadata, though. Your previous program made a mistake here.
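
The effect can be reproduced with a small Python sketch, assuming the text was written as Windows-1252 bytes (an assumption; the exact code page used by the other program is not confirmed in this report):

```python
# Text as the tagging program presumably stored it: one byte per accented
# letter, using the Windows-1252 code page (assumed, not confirmed).
raw = "château de Pierrefonds".encode("cp1252")

# A UTF-8 reader cannot interpret the lone 0xE2 byte and substitutes
# U+FFFD, the replacement character drawn as a "?" in a diamond.
as_utf8 = raw.decode("utf-8", errors="replace")

# Reading the same bytes with the code page they were written in works.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The bytes in the file are the same in both cases; only the assumed encoding differs.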

Maik
Comment 8 caulier.gilles 2025-06-23 05:29:20 UTC
Ludovic,

Following Maik's last comment, the idea would be to re-encode all tags with ExifTool, as your files are wrongly encoded with a Windows code page (WCP), not UTF-8 or ASCII.

We cannot do anything here with digiKam, I fear...

Best

Gilles Caulier
Comment 9 Ludovic 2025-07-02 19:45:24 UTC
Sorry for the very late answer.

It took me some time to understand what happens because the Windows world can be very strange...

Following this ExifTool Q&A https://exiftool.org/faq.html#Q18, I switched my command console to UTF-8 (chcp 65001).
Without this, even IPTC encoded in UTF-8 was not displayed correctly in the console.
Example with DSC02003.jpg, which is encoded in UTF-8 and displayed correctly in digiKam:
>exiftool -iptc:all -charset filename=utf8   DSC02003.jpg
>Coded Character Set             : UTF8
>Date Created                    : 2013:08:04
>Time Created                    : 12:27:11+00:00
>Country-Primary Location Name   : France
>Country-Primary Location Code   : FRA
>City                            : Sainte-Gemme
>Sub-location                    : La Ferme de Magn├®
>Province-State                  : Nouvelle-Aquitaine
>Keywords                        : France, La Ferme de Magn├®, Sainte-Gemme, Nouvelle-Aquitaine

With this prerequisite in place, I am now able to correctly display my pictures that were encoded in latin1.
Example with 20250530_P7387.jpg, which is not displayed correctly in digiKam:
>exiftool -iptc:all -charset iptc=latin1   20250530_P7387.jpg
>Keywords                        : France, Hauts-de-France, Pierrefonds, Ethan MARTIN, Frédérique MARTIN, Thais MARTIN
>By-line                         : Ludovic Martin
>Sub-location                    : Pierrefonds
>Province-State                  : Hauts-de-France
>Country-Primary Location Code   : FRA
>Country-Primary Location Name   : France
>Caption-Abstract                : C’est au tournant des 14 et 15è siècles que Louis, duc d’Orléans (1372-1407) entreprend la construction du château de Pierrefonds. Il est l’un des édifices les plus imposants et imprenables de son époque. Partiellement détruit au 17è siècle, il est restauré au 19é siècle à la demande de Napoléon III par  Viollet-le-Duc.
>Application Record Version      : 4
>Time Created                    : 12:52:33+02:00
>Object Name                     : Château de Pierrefonds

After some more tests, I discovered that I don't need to specify the charset:
>exiftool -iptc:all    20250530_P7387.jpg
This command provides the same result as the previous one (with "-charset iptc=latin1").

In fact, I finally found in the ExifTool documentation https://exiftool.org/exiftool_pod.html#Input-output-text-formatting, in the description of "-charset [[TYPE=]CHARSET]", that 'Latin' is the default value for IPTC when IPTC:CodedCharacterSet is not defined:
>Other values of TYPE listed below are used to specify the internal encoding of various meta information formats.
>TYPE       Description                                  Default
>---------  -------------------------------------------  -------
>EXIF       Internal encoding of EXIF "ASCII" strings    (none)
>ID3        Internal encoding of ID3v1 information       Latin
>IPTC       Internal IPTC encoding to assume when        Latin
>            IPTC:CodedCharacterSet is not defined
>Photoshop  Internal encoding of Photoshop IRB strings   Latin
>QuickTime  Internal encoding of QuickTime strings       MacRoman
>RIFF       Internal encoding of RIFF strings            0

Would it be possible for digiKam to use the same default as ExifTool?
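
To make the request concrete, here is a minimal Python sketch of the fallback described in the table above: decode as UTF-8 when CodedCharacterSet says so, and assume a Latin encoding when the tag is not defined. The function `decode_iptc_string` is hypothetical, and mapping ExifTool's "Latin" to Python's `cp1252` codec is an assumption; this is neither ExifTool's nor digiKam's code.

```python
UTF8_ESCAPE = b"\x1b%G"  # IPTC escape sequence "ESC % G" for UTF-8


def decode_iptc_string(raw, coded_character_set=None):
    """Decode an IPTC string value, defaulting to Latin when undeclared."""
    if coded_character_set is None:
        # Tag not defined: assume a Latin code page.
        # cp1252 stands in for ExifTool's "Latin" here (assumption).
        return raw.decode("cp1252", errors="replace")
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8", errors="replace")
    # Other declared character sets are out of scope for this sketch.
    raise ValueError("unsupported CodedCharacterSet")


name = "Château de Pierrefonds"
print(decode_iptc_string(name.encode("cp1252")))              # tag missing
print(decode_iptc_string(name.encode("utf-8"), UTF8_ESCAPE))  # tag says UTF-8
```

With such a default, files that declare UTF-8 are read exactly as before, and only files without the tag are treated differently.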
Comment 10 Ludovic 2025-07-02 20:21:51 UTC
To complete my previous answer:

In the Exiftool FAQ https://exiftool.org/faq.html#Q10, I have discovered the command to convert an image IPTC latin->UTF8:
>exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 20250530_P7387.jpg
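
At the byte level, such a conversion amounts to decoding each string from the Latin code page and writing it back as UTF-8. A minimal Python sketch, assuming the existing values are Windows-1252 (the helper `latin_to_utf8` is hypothetical and does not touch any file):

```python
def latin_to_utf8(raw):
    """Re-encode a Windows-1252 byte string as UTF-8 (assumed source encoding)."""
    return raw.decode("cp1252").encode("utf-8")


old = "Frédérique MARTIN".encode("cp1252")
new = latin_to_utf8(old)

print(len(old), len(new))   # each accented letter grows from 1 to 2 bytes
print(new.decode("utf-8"))  # the text itself is unchanged
print(latin_to_utf8(b"France") == b"France")  # pure ASCII values stay identical
```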

But to be honest, I am a bit afraid to do a massive conversion on all my photos...
So my previous question is still valid:
Would it be possible for digiKam to use the same default as ExifTool when IPTC:CodedCharacterSet is not defined?
Comment 11 Ludovic 2025-07-19 22:13:28 UTC
Maik

Could you consider implementing in digiKam the same default as in ExifTool for IPTC metadata without a charset (https://exiftool.org/exiftool_pod.html#Input-output-text-formatting)?

I use Luminar Neo to process my photos and I cannot force it to write IPTC in UTF-8.
It would be really painful if I had to "fix"(1) all pictures exported by Luminar.

(1) It doesn't seem to be a bug, at least from the Exiftool's point of view. And, in fact, all tools I tested to display IPTC behaved like ExifTool.

Many thanks for your help.
Comment 12 Maik Qualmann 2025-07-23 08:53:26 UTC
Changing the character set in ExifTool would only affect the ExifTool metadata viewer. The actual IPTC metadata internally and in the IPTC metadata viewer would not change, as we use Exiv2 internally.

Maik