476480 – KCodecs/KEmailAddress replaces all spaces in sender and recipient names with ASCII space

Status:	RESOLVED FIXED

Alias:	None

Product:	frameworks-kcodecs
Classification:	Frameworks and Libraries
Component:	general
Version:	5.103.0
Platform:	Debian unstable Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	kdelibs bugs

Reported:	2023-11-02 20:49 UTC by Erin Yuki Schlarb
Modified:	2023-11-25 19:27 UTC
CC List:	0 users

Description Erin Yuki Schlarb 2023-11-02 20:49:49 UTC

SUMMARY
When changing my identity settings in KMail to use U+2004 (THREE-PER-EM SPACE / thick space) and U+2009 (THIN SPACE) to visually separate the parts of my name, I noticed that KMail shows these correctly in the settings and the composer, but the messages actually sent use U+0020 (ASCII SPACE) instead. While I have not verified this in the source code, I’m almost certain this is due to KMail/Akonadi/KMailTransport applying Unicode compatibility decomposition (as in NFKC normalization) before encoding the sender and recipient names.
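The suspicion is easy to check in isolation. The following is a minimal Python sketch, not KMail code: it only shows that NFKC normalization would turn both characters into U+0020 while NFC leaves them alone, which makes the hypothesis plausible but does not prove that KMail normalizes anything. The variable names are chosen here for illustration.

```python
import unicodedata

# Sample name from this report: U+2004 after "First", U+2009 after "Collective".
name = "First\u2004Collective\u2009Last"

nfc = unicodedata.normalize("NFC", name)    # canonical normalization only
nfkc = unicodedata.normalize("NFKC", name)  # additionally applies compatibility mappings

# List the code points of the whitespace characters that survive each form.
print([f"U+{ord(c):04X}" for c in nfc if c.isspace()])
print([f"U+{ord(c):04X}" for c in nfkc if c.isspace()])
```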

GitHub at least doesn’t seem to have any problems encoding these correctly and they are displayed correctly in KMail in messages received from it. Python’s `email.header` appears to encode it basically the same way.

In particular, for the name “First Collective Last” (using the mentioned U+2004 and U+2009), the three tested implementations yield:
  1. KMail/Akonadi/KMailTransport: “From: First Collective Last <email-address>” (using just ASCII spaces everywhere)
  2. GitHub: “From: =?UTF-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <notifications@github.com>”
  3. Python `email.header`: “From: =?utf-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <email-address>”
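For reference, the Python result can be reproduced with a few lines. This is a small sketch using the standard-library `email.header` module; `<email-address>` is the same placeholder as in the list above, and the round trip at the end only confirms that the encoded word still carries both special spaces.

```python
from email.header import Header, decode_header, make_header

# Sample name from this report, with the special spaces written as escapes.
name = "First\u2004Collective\u2009Last"

# Encode the display name as an RFC 2047 encoded word.
encoded = Header(name, "utf-8").encode()
print(f"From: {encoded} <email-address>")

# Decode it again to check that U+2004 and U+2009 survived the round trip.
decoded = str(make_header(decode_header(encoded)))
print(decoded == name)
```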

STEPS TO REPRODUCE
1. Compose an email to “First Collective Last <some-email-address-of-yours>” in KMail (copy-paste it so the special spaces survive), replacing the address with one of your own

OBSERVED RESULT
The received message has the fancy spaces replaced with plain ASCII ones and hence contains “To: First Collective Last <some-email-address-of-yours>” as a header, discarding the extra information originally entered.

EXPECTED RESULT
The fancy spaces should remain and be encoded instead.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian unstable
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8

Comment 1 Erin Yuki Schlarb 2023-11-02 21:03:17 UTC

Just to avoid any potential misunderstandings, I’d also like to point out that Unicode “compatibility” normalizations are not about “improving compatibility with software” but about making “compatible characters” equivalent during machine processing (typically full-text searches). I’m adding this specifically because I know e-mail is basically a giant compatibility hack, and removing anything labelled “compatibility” may be met with an immediate “we can’t, it might break something”.

Comment 2 Erin Yuki Schlarb 2023-11-07 00:17:50 UTC

After tracing the whole email handling code from KMailTransport backwards to KMessageComposer, I found that the issue has nothing to do with Unicode normalization. It is instead a simple `QString::simplified()` call inside the `KEmailAddress::splitAddressList` function. That function is called by `KEmailAddress::normalizeAddressesAndEncodeIdn`, which KMessageComposer uses unconditionally for outgoing messages.

I understand that the call to `QString::simplified()` is there to remove whitespace around the individual entries in address lists of the form `Name <email>, Name2 <email2>` (i.e. to yield `Name <email>` + `Name2 <email2>` instead of `Name <email>` + ` Name2 <email2>`). Changing all calls from `QString::simplified()` to `QString::trimmed()` in that function keeps this working without otherwise altering the text, and it fixes this bug. (I just tested this with KMail!)
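The difference between the two calls can be modelled outside of Qt. The sketch below is a rough Python approximation, not the KCodecs code: `simplified` and `trimmed` are stand-ins written here for the Qt methods of the same name, `split_address_list` is a hypothetical and deliberately naive splitter, and the example.org addresses are placeholders.

```python
def simplified(text: str) -> str:
    """Rough stand-in for QString::simplified(): trim the ends, then
    collapse every run of whitespace into a single ASCII space."""
    return " ".join(text.split())


def trimmed(text: str) -> str:
    """Rough stand-in for QString::trimmed(): strip only leading and
    trailing whitespace, leaving inner characters untouched."""
    return text.strip()


def split_address_list(addresses: str, clean) -> list:
    # Hypothetical, naive splitter: it splits on every comma and ignores
    # the quoting and comment rules that a real address parser must handle.
    return [clean(part) for part in addresses.split(",") if part.strip()]


raw = "First\u2004Collective\u2009Last <a@example.org>, Name2 <b@example.org>"

before = split_address_list(raw, simplified)  # models the old behaviour
after = split_address_list(raw, trimmed)      # models the proposed change

# ascii() makes the otherwise invisible difference between the spaces visible.
print([ascii(entry) for entry in before])
print([ascii(entry) for entry in after])
```

In both variants the second entry loses its leading space, which is the purpose of the cleanup; only the `trimmed` variant keeps U+2004 and U+2009 inside the first name.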

Comment 3 Erin Yuki Schlarb 2023-11-25 19:27:43 UTC

Fixed this issue in main: https://invent.kde.org/frameworks/kcodecs/-/merge_requests/43