Bug 426000 - Please allow Unicode-Strings in EXIF metadata
Summary: Please allow Unicode-Strings in EXIF metadata
Status: REPORTED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Exif
Version: 7.0.0
Platform: Microsoft Windows
Importance: NOR wishlist
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-30 18:09 UTC by herb
Modified: 2023-11-22 07:04 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
This image is the original image generated by Stable Diffusion, UserComment is displayed as garbled in Exif and normal in ExifTool. And Digikam gets stuck after displaying the metadata! (215.04 KB, image/jpeg)
2023-07-30 06:24 UTC, Rex Lee

Description herb 2020-08-30 18:09:13 UTC
SUMMARY
In bug #370558 I requested that digiKam allow Unicode strings to be entered in IPTC-IIM metadata fields.
That bug has been fixed in version 7.1.0.

Because the MWG (Metadata Working Group) asks for metadata to be synchronized not only between IPTC-IIM and XMP, but also between EXIF and XMP, digiKam should also allow Unicode strings to be entered in EXIF metadata.


EXPECTED RESULT
It would be wonderful to see this enhancement also in Digikam 7.1.0.


SOFTWARE/OS VERSIONS
Windows: 10

ADDITIONAL INFORMATION
I think this does not depend on any OS or version.
Comment 1 caulier.gilles 2020-08-30 20:44:42 UTC
Herb,

I double-checked, and Exif.Photo.UserComment is the only official tag in the Exif specification that supports UTF-8 encoding, using the charset="unicode" prefix.

digiKam has supported UTF-8 encoding for this tag for a while. The difference with 7.1.0 is that IPTC now supports UTF-8, so a sync from the IPTC caption to Exif UserComment will be done without losing data.

Correct me if I'm wrong, or if I have forgotten another IPTC tag that should be backported to Exif with UTF-8 support.

Best

Gilles Caulier
Comment 2 caulier.gilles 2020-08-30 20:55:43 UTC
These unofficial Exif tags used by Windows encode strings as UTF-16:

0x9c9b 	40091 	Image 	Exif.Image.XPTitle 	Byte 	Title tag used by Windows, encoded in UCS2
0x9c9c 	40092 	Image 	Exif.Image.XPComment 	Byte 	Comment tag used by Windows, encoded in UCS2
0x9c9d 	40093 	Image 	Exif.Image.XPAuthor 	Byte 	Author tag used by Windows, encoded in UCS2
0x9c9e 	40094 	Image 	Exif.Image.XPKeywords 	Byte 	Keywords tag used by Windows, encoded in UCS2
0x9c9f 	40095 	Image 	Exif.Image.XPSubject 	Byte 	Subject tag used by Windows, encoded in UCS2

I'm not sure whether digiKam supports these tags for interoperability...

Gilles Caulier
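
A minimal sketch of how the byte payload of one of these XP* tags could be decoded, assuming it holds little-endian UTF-16 (UCS-2) text followed by a two-byte NUL terminator, which is how Windows typically writes it. The function name `decode_xp_tag` is hypothetical and is not part of digiKam or Exiv2.

```python
def decode_xp_tag(raw: bytes) -> str:
    """Decode the raw bytes of an Exif.Image.XP* tag.

    Assumption: the payload is UTF-16LE text, usually ending in a
    two-byte NUL terminator.
    """
    # Drop an odd trailing byte so the length is a whole number of code units.
    if len(raw) % 2:
        raw = raw[:-1]
    text = raw.decode("utf-16-le", errors="replace")
    # Strip the terminator (and any padding NULs) at the end.
    return text.rstrip("\x00")


# Round trip with a title that cannot be represented in ASCII.
payload = "Überblick 東京".encode("utf-16-le") + b"\x00\x00"
title = decode_xp_tag(payload)
```

Because the tag type is Byte rather than ASCII, a reader has to know about this convention in advance; nothing in the tag itself declares the encoding.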
Comment 3 caulier.gilles 2020-08-31 02:20:23 UTC
These Exif tags are encoded the same way as Exif.Photo.UserComment, so they can use UTF-8:

0x001b 	27 	GPSInfo 	Exif.GPSInfo.GPSProcessingMethod 	Comment 	A character string recording the name of the method used for location finding. The string encoding is defined using the same scheme as UserComment.
0x001c 	28 	GPSInfo 	Exif.GPSInfo.GPSAreaInformation 	Comment 	A character string recording the name of the GPS area. The string encoding is defined using the same scheme as UserComment.

Both must be handled during a sync from XMP to Exif.
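
The UserComment scheme mentioned above prefixes the text with an 8-byte character code (ASCII, JIS, UNICODE, or eight NUL bytes for "undefined"). The sketch below shows one way to read such a value. The function name and the `unicode_codec` parameter are hypothetical; which codec actually sits behind the UNICODE code varies between writers (this thread shows both UTF-8 and UTF-16 in the wild), so it is left to the caller here. JIS is not handled in this sketch.

```python
ASCII_ID = b"ASCII\x00\x00\x00"
UNICODE_ID = b"UNICODE\x00"
UNDEFINED_ID = b"\x00" * 8


def decode_user_comment(raw: bytes, unicode_codec: str = "utf-8") -> str:
    """Decode a UserComment-style value: 8-byte character code + text.

    unicode_codec is an assumption supplied by the caller, for example
    "utf-8", "utf-16-le" or "utf-16-be".
    """
    if len(raw) < 8:
        raise ValueError("value is shorter than the 8-byte character code")
    code, body = raw[:8], raw[8:]
    if code == ASCII_ID:
        text = body.decode("ascii", errors="replace")
    elif code == UNICODE_ID:
        text = body.decode(unicode_codec, errors="replace")
    elif code == UNDEFINED_ID:
        # No charset declared; UTF-8 is only a guess in this sketch.
        text = body.decode("utf-8", errors="replace")
    else:
        raise ValueError(f"unsupported character code: {code!r}")
    # Values may be padded with NULs or spaces.
    return text.rstrip("\x00 ")


comment_utf8 = decode_user_comment(UNICODE_ID + "Grüße".encode("utf-8"))
comment_utf16 = decode_user_comment(
    UNICODE_ID + "Grüße".encode("utf-16-le"), unicode_codec="utf-16-le"
)
```

Decoding a UTF-16 payload as UTF-8 does not necessarily raise an error for Latin text; it tends to produce characters interleaved with NULs, which is one plausible way for a comment to show up as garbled.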
Comment 4 caulier.gilles 2020-08-31 02:56:05 UTC
All Exif.Image.XP* tags are read-only. See the background here:

https://bugs.kde.org/show_bug.cgi?id=421464
https://dev.exiv2.org/boards/3/topics/530

Gilles Caulier
Comment 5 caulier.gilles 2020-08-31 03:06:27 UTC
Other non-standard Exif tags are encoded in UTF-8:

0xc6f3 	50931 	Image 	Exif.Image.CameraCalibrationSignature 	Byte 	A UTF-8 encoded string associated with the CameraCalibration1 and CameraCalibration2 tags. The CameraCalibration1 and CameraCalibration2 tags should only be used in the DNG color transform if the string stored in the CameraCalibrationSignature tag exactly matches the string stored in the ProfileCalibrationSignature tag for the selected camera profile.

0xc6f4 	50932 	Image 	Exif.Image.ProfileCalibrationSignature 	Byte 	A UTF-8 encoded string associated with the camera profile tags. The CameraCalibration1 and CameraCalibration2 tags should only be used in the DNG color transfer if the string stored in the CameraCalibrationSignature tag exactly matches the string stored in the ProfileCalibrationSignature tag for the selected camera profile.

0xc6f6 	50934 	Image 	Exif.Image.AsShotProfileName 	Byte 	A UTF-8 encoded string containing the name of the "as shot" camera profile, if any.

0xc6fe 	50942 	Image 	Exif.Image.ProfileCopyright 	Byte 	A UTF-8 encoded string containing the copyright information for the camera profile. This string always should be preserved along with the other camera profile tags.

0xc716 	50966 	Image 	Exif.Image.PreviewApplicationName 	Byte 	A UTF-8 encoded string containing the name of the application that created the preview stored in the IFD.

0xc717 	50967 	Image 	Exif.Image.PreviewApplicationVersion 	Byte 	A UTF-8 encoded string containing the version number of the application that created the preview stored in the IFD.

0xc718 	50968 	Image 	Exif.Image.PreviewSettingsName 	Byte 	A UTF-8 encoded string containing the name of the conversion settings (for example, snapshot name) used for the preview stored in the IFD.

All these tags are used in specific cases such as RAW processing or DNG conversion. They are not managed by digiKam.

Gilles Caulier
Comment 6 herb 2020-08-31 08:47:06 UTC
Hello,

thanks for your detailed analysis.

My request is based on the following:
1) MWG requests in their document "Guidelines for Handling Image Metadata" version 2 November 2010
http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
on page 32:

Exif tags documented in the Exif specification as type ASCII SHOULD be written as UTF-8. Note that 7-bit ASCII is a proper subset of UTF-8. They MAY be written as 7-bit ASCII, with appropriate trimming for out of range bytes. These tags MUST NOT be written in some other encoding.

Therefore it is clear to me that it should be possible to read and write EXIF metadata in Unicode.
Tools that I know, such as ExifTool, IMatch or XnViewMP, have this feature.
For me it is standard in such systems.

2) MWG requests that EXIF-metadata and XMP-metadata should be synchronized.
Some example-tags are:
 XMP-dc:Description <--> EXIF:ImageDescription
 XMP-dc:Rights <--> EXIF:Copyright
 XMP-dc:Creator <--> EXIF:Artist

As all of this XMP metadata consists of UTF-8 strings, the corresponding EXIF strings will also be in UTF-8.

3) The encoding of the EXIF-metadata should be:
- UTF-8 for all strings except
- UserComment which is to be encoded in UCS-2 and
  some XPxxx tags.
As ASCII is a proper subset of UTF-8, I see no problem in handling existing ASCII strings.
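
The MWG rule quoted in point 1) can be sketched as follows. This is only an illustration: `encode_exif_ascii` and its `allow_utf8` flag are hypothetical names, and the NUL terminator reflects the Exif ASCII type rather than anything digiKam-specific.

```python
def encode_exif_ascii(text: str, allow_utf8: bool = True) -> bytes:
    """Encode text for an Exif tag of type ASCII, NUL-terminated.

    allow_utf8=True  -> write UTF-8 (the MWG "SHOULD" case).
    allow_utf8=False -> write 7-bit ASCII and drop out-of-range
                        characters (the MWG "MAY ... trimming" case).
    """
    if allow_utf8:
        data = text.encode("utf-8")
    else:
        data = text.encode("ascii", errors="ignore")
    return data + b"\x00"


plain = encode_exif_ascii("John Doe")
accented = encode_exif_ascii("Jörg Müller")
trimmed = encode_exif_ascii("Jörg Müller", allow_utf8=False)
```

For pure ASCII input both modes produce identical bytes, which is why existing ASCII strings need no conversion.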

In particular with ExifTool the user can select the encoding of the EXIF-metadata; but this is not part of my request.

Thanks and best regards
herb
Comment 7 caulier.gilles 2020-08-31 10:42:49 UTC
My response and summary.

1/ This has already been the case throughout the digiKam metadata engine code for a while. We have tried to be compliant with the MWG guidelines since the start.

2/
 XMP-dc:Description <--> EXIF:ImageDescription => already Done
 XMP-dc:Rights <--> EXIF:Copyright             => sync not yet done in MetadataEditor
 XMP-dc:Creator <--> EXIF:Artist               => sync not yet done in MetadataEditor

3/ Same as 1/

4/ Following my comment #3

Add UTF8 writing support for these Exif tags:

Exif.GPSInfo.GPSProcessingMethod
Exif.GPSInfo.GPSAreaInformation

If I'm not mistaken, both can be used to get or store literal reverse geocoding information, but I'm not yet totally sure. At least for the moment, digiKam does not use these data, and there is nothing to do except that I can prepare setters and getters in the code.

5/ Windows XP* tags must be handled as read-only. So there is nothing to do.

Gilles Caulier
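
The three XMP/Exif pairs from point 2/ can be written down as a small mapping table. The sketch below is not digiKam code: it uses Exiv2-style key names as an assumption, treats every value as a plain string (real dc:description and dc:creator values are structured), and `sync_xmp_to_exif` is a hypothetical helper.

```python
# XMP property -> Exif tag pairs named in the discussion.
XMP_TO_EXIF = {
    "Xmp.dc.description": "Exif.Image.ImageDescription",
    "Xmp.dc.rights": "Exif.Image.Copyright",
    "Xmp.dc.creator": "Exif.Image.Artist",
}


def sync_xmp_to_exif(xmp: dict, exif: dict) -> dict:
    """Return a copy of exif with values copied from mapped XMP properties."""
    result = dict(exif)
    for xmp_key, exif_key in XMP_TO_EXIF.items():
        if xmp_key in xmp:
            result[exif_key] = xmp[xmp_key]
    return result


xmp_data = {"Xmp.dc.rights": "© 2020 Jörg", "Xmp.dc.title": "Untitled"}
exif_data = {"Exif.Image.Artist": "herb"}
synced = sync_xmp_to_exif(xmp_data, exif_data)
```

Unmapped XMP properties are ignored and Exif tags without an XMP counterpart are left untouched.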
Comment 8 caulier.gilles 2020-08-31 10:47:45 UTC
So for 7.1.0, point 2/ is still to do:

Note that Exif.Image.Copyright and Exif.Image.Artist do not support UTF8 => ASCII only.

For point 4/, a separate Bugzilla entry must be opened to support these tags with the Reverse Geocoding feature, if and only if it is of interest to end users.

Gilles Caulier
Comment 9 herb 2020-08-31 11:47:54 UTC
Hello,

thanks for your comments.

For me this leads to the question: Should Digikam follow 
(1) only the EXIF-standard:
In this case you are right: many tags (e.g. Artist) allow only ASCII characters.

(2) also the MWG requirements:
As said many other systems do this. For me each "up to date" system should do this.
In this case it is a must to support UTF-8 for EXIF-tags.
And this UTF-8 support is independent of whether digiKam automatically synchronizes tags as requested by MWG.

IMatch e.g. has a global option which defines the rules (1) or (2).


(3) If your decision is to strictly follow the old-fashioned EXIF standard and not allow UTF-8 strings to be written to EXIF tags,
then please make it possible for UTF-8 strings to be displayed properly.
Only then will all tags be displayed properly for images tagged by other systems (that have written UTF-8 to EXIF tags).

Still hoping that Digikam will also write UTF-8 for EXIF-tags.
Best regards
herb
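
A minimal sketch of the "display properly" case from point (3): a reader that accepts UTF-8 in an ASCII-type tag and only falls back to a legacy 8-bit interpretation when the bytes are not valid UTF-8. The helper name `decode_exif_text` and the choice of Latin-1 as the fallback are assumptions made for illustration.

```python
def decode_exif_text(raw: bytes) -> str:
    """Decode the bytes of an Exif ASCII-type tag for display."""
    # The value ends at the first NUL terminator.
    raw = raw.split(b"\x00", 1)[0]
    try:
        # Pure ASCII is valid UTF-8, so this covers both cases.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte to a character,
        # so this branch never raises.
        return raw.decode("latin-1")


from_utf8 = decode_exif_text("Jörg".encode("utf-8") + b"\x00")
from_latin1 = decode_exif_text(b"J\xf6rg\x00")
```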
Comment 10 caulier.gilles 2020-08-31 12:32:42 UTC
Yes, digiKam must follow the whole Exif standard by default. The Exif metadata editor (not the viewer) only offers the standard Exif tags.

For all extra Exif tags, changes are made case by case. Remember that non-standard tags are written by camera makers or applications for specific uses. And I'm not talking about makernote tags. Both are undocumented, and their contents are discovered through reverse engineering.

Gilles Caulier
Comment 11 caulier.gilles 2020-08-31 12:34:01 UTC
Git commit 88fb5093407ad0582b0945410486cf43f5fc01d6 by Gilles Caulier.
Committed on 31/08/2020 at 12:33.
Pushed by cgilles into branch 'master'.

MEtadat editor : sync Exif Copyright with XMP Copyright

M  +1    -1    core/dplugins/generic/metadata/metadataedit/iptc/iptccontent.cpp
M  +1    -1    core/dplugins/generic/metadata/metadataedit/iptc/iptcorigin.cpp
M  +64   -12   core/dplugins/generic/metadata/metadataedit/xmp/xmpcontent.cpp
M  +8    -4    core/dplugins/generic/metadata/metadataedit/xmp/xmpcontent.h
M  +15   -13   core/dplugins/generic/metadata/metadataedit/xmp/xmpeditwidget.cpp
M  +6    -6    core/libs/template/templatepanel.cpp
M  +3    -3    core/libs/widgets/metadata/subjectwidget.h
M  +1    -3    core/utilities/setup/setuptemplate.cpp

https://invent.kde.org/graphics/digikam/commit/88fb5093407ad0582b0945410486cf43f5fc01d6
Comment 12 caulier.gilles 2020-08-31 13:48:54 UTC
Git commit 3d73903c9077cb06e4a62c15b450559e3533c659 by Gilles Caulier.
Committed on 31/08/2020 at 13:46.
Pushed by cgilles into branch 'master'.

Metadata editor : sync Exif Artist with Xmp dc creator.

M  +74   -32   core/dplugins/generic/metadata/metadataedit/xmp/xmpcredits.cpp
M  +6    -1    core/dplugins/generic/metadata/metadataedit/xmp/xmpcredits.h
M  +3    -1    core/dplugins/generic/metadata/metadataedit/xmp/xmpeditwidget.cpp

https://invent.kde.org/graphics/digikam/commit/3d73903c9077cb06e4a62c15b450559e3533c659
Comment 13 herb 2022-05-21 15:42:40 UTC
Dear developers,

please allow UTF-8 strings inside EXIF metadata. I repeat my request from August 2020.

IPTC.org also requests this feature. Please read the IPTC Photo Metadata Mapping Guidelines (version 2022.1, 2022-05-17) at
https://iptc.org/std/photometadata/documentation/mappingguidelines/

They announced this in the news item "IPTC announces rules for mapping photo metadata between IPTC, Exif and schema.org standards" from 2022-03-04 (https://iptc.org/news/iptc-announces-rules-for-mapping-photo-metadata-between-iptc-exif-and-schema-org-standards/).

Best regards
herb
Comment 14 Maik Qualmann 2022-05-30 18:11:57 UTC
I see here that we are already writing UTF-8 to Exif when sync is enabled and entries are made in the XMP tab of the metadata editor. Is the issue also, for example, that only ASCII is allowed as input in the Exif tab?

I would suggest the following: we allow UTF-8 input, since we are already writing UTF-8 when synchronizing with XMP. We make it visible when a field contains UTF-8 but the standard actually requires ASCII, for example with an error icon next to the input field.

Maik
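
The check behind such an indicator can be very small. The sketch below is not the digiKam implementation; `ascii_state` and `non_ascii_chars` are hypothetical helpers showing how a field could be classified and which characters could be listed in a tooltip.

```python
def ascii_state(text: str) -> str:
    """Classify field content as 'ascii' or 'unicode'."""
    return "ascii" if text.isascii() else "unicode"


def non_ascii_chars(text: str) -> list:
    """Return the distinct characters outside 7-bit ASCII, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})


state_plain = ascii_state("Sunset over Paris")
state_accented = ascii_state("Coucher de soleil à Paris")
offending = non_ascii_chars("Jörg Müller")
```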
Comment 15 herb 2022-05-30 20:18:45 UTC
Hello Maik,

thanks for your suggestion.
Please be aware that digiKam is the only metadata system (that I know of) that tries to follow the EXIF standard 100%.
I think it is not necessary to show a special flag for UTF8-strings.
But in case you really show such a flag, please do not use an "error flag/icon". Use something neutral.

Best regards
herb
Comment 16 Maik Qualmann 2022-06-01 21:11:42 UTC
Git commit 59d07f9e9566f3befb3259ba9ab7d198be3c1d3c by Maik Qualmann.
Committed on 01/06/2022 at 21:10.
Pushed by mqualmann into branch 'master'.

enable unicode in the Exif caption tab and show ASCII state with an icon

M  +81   -30   core/dplugins/generic/metadata/metadataedit/exif/exifcaption.cpp
M  +11   -0    core/dplugins/generic/metadata/metadataedit/exif/exifcaption.h

https://invent.kde.org/graphics/digikam/commit/59d07f9e9566f3befb3259ba9ab7d198be3c1d3c
Comment 17 caulier.gilles 2023-04-30 02:36:09 UTC
Hi all,

digiKam 8.0.0 is out. Is this entry still valid with this release?

Best regards

Gilles Caulier
Comment 18 caulier.gilles 2023-05-20 12:20:50 UTC
@herb

The digiKam 8.1.0 pre-release Windows bundle has now been ported to Exiv2 0.28, which
comes with a huge list of bug fixes:

https://github.com/Exiv2/exiv2/issues/2406#issuecomment-1529139799

The installer file is available here:

https://files.kde.org/digikam/

If an Exiv2 bug is fixed with this version, please give us feedback.

Thanks in advance

Gilles Caulier
Comment 19 Rex Lee 2023-07-30 06:24:22 UTC
Created attachment 160619
This image is the original image generated by Stable Diffusion, UserComment is displayed as garbled in Exif and normal in ExifTool. And Digikam gets stuck after displaying the metadata!

The images generated by Stable Diffusion carry the prompt used at generation time, which is stored in the image's metadata.
When I use digiKam to view the information about images generated by Stable Diffusion, the UserComment under Exif is garbled, while in ExifTool it is displayed normally. The software also becomes very laggy and can only be recovered by restarting.
I set different options but none of them work. I hope you guys can fix this bug or tell me how to set it up. More and more people are using Stable diffusion and many need to manage the large number of images generated, which I think is important.
Comment 20 Maik Qualmann 2023-07-30 11:42:15 UTC
Git commit 4472657499fb213cf046440fe6797f9ce344c74d by Maik Qualmann.
Committed on 30/07/2023 at 13:39.
Pushed by mqualmann into branch 'master'.

support UTF16 Exif comment with >= Qt-5.15.0

M  +19   -3    core/libs/metadataengine/engine/metaengine_p.cpp

https://invent.kde.org/graphics/digikam/-/commit/4472657499fb213cf046440fe6797f9ce344c74d
Comment 21 Maik Qualmann 2023-07-30 12:31:52 UTC
@Rex Lee
You should open a bug report for Stable Diffusion. Encoding the Exif comment in UTF-16 does not conform to the Exif standard. The Exif standard strongly recommends UTF-8 here.

Maik
Comment 22 caulier.gilles 2023-10-15 10:45:20 UTC
@Rex Lee,


Is this problem still reproducible with the new digiKam 8.2.0 pre-release Windows
installer, available at the usual place:

https://files.kde.org/digikam/

This new bundle is based on the latest Qt framework 5.15.11 and KDE Frameworks 5.110.

Thanks in advance

Gilles Caulier
Comment 23 Maik Qualmann 2023-11-22 07:03:37 UTC
Hi Gilles,

I fixed the issue with the user comment by supporting UTF-16, but this requires Qt >= 5.15. I think we can close here.

Maik
Comment 24 Maik Qualmann 2023-11-22 07:04:22 UTC
Wrong Comment 23 here.

Maik