Bug 476480

Summary: KCodecs/KEmailAddress replaces all spaces in sender and recipient names with ASCII space
Product: [Frameworks and Libraries] frameworks-kcodecs
Reporter: Erin Yuki Schlarb <erin-kde>
Component: general
Assignee: kdelibs bugs <kdelibs-bugs>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 5.103.0   
Target Milestone: ---   
Platform: Debian unstable   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Erin Yuki Schlarb 2023-11-02 20:49:49 UTC
SUMMARY
When changing my identity settings in KMail to use U+2004 (THREE-PER-EM SPACE / thick space) and U+2009 (THIN SPACE) to visually separate the parts of my name, I noticed that KMail shows these correctly in the settings and the composer, but the messages actually sent use U+0020 (ASCII SPACE) instead. While I have not verified this in the source code, I’m almost certain this is due to KMail/Akonadi/KMailTransport applying Unicode compatibility decomposition (as in NFKC normalization) before encoding the sender and recipient names.
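
For illustration, here is a minimal Python sketch of why NFKC looked like a plausible cause. It only shows that compatibility normalization would produce the same symptom for this name; it says nothing about what the KMail code actually does.

```python
import unicodedata

# The display name from this report, written with explicit escapes so the
# special spaces survive copy-paste.
name = "First\u2004Collective\u2009Last"

# NFKC includes compatibility decomposition, which maps both U+2004 and
# U+2009 to a plain U+0020.
normalized = unicodedata.normalize("NFKC", name)

# NFC (canonical only) leaves the special spaces untouched.
canonical = unicodedata.normalize("NFC", name)

print(ascii(name))
print(ascii(normalized))
print(ascii(canonical))
```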

GitHub at least doesn’t seem to have any problems encoding these correctly and they are displayed correctly in KMail in messages received from it. Python’s `email.header` appears to encode it basically the same way.

In particular, for the name “First Collective Last” (using the mentioned U+2004 and U+2009), the three tested implementations yield:
  1. KMail/Akonadi/KMailTransport: “From: First Collective Last <email-address>”  (using just ASCII spaces everywhere)
  2. GitHub: “From: =?UTF-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <notifications@github.com>”
  3. Python `email.header`: “From: =?utf-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <email-address>”
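
The Python result can be reproduced roughly like this. It is a small sketch using only the standard library; the name is spelled with `\u` escapes so the special spaces are unambiguous, and the GitHub value is copied from the list above.

```python
from email.header import Header, decode_header

# Display name with U+2004 and U+2009 between the parts.
name = "First\u2004Collective\u2009Last"

# Encode the display name as an RFC 2047 encoded-word.
encoded = Header(name, "utf-8").encode()
print(encoded)

# Decode the value GitHub sent and check that the spaces survived.
github_value = "=?UTF-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?="
raw, charset = decode_header(github_value)[0]
decoded = raw.decode(charset)
print(ascii(decoded))
```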

STEPS TO REPRODUCE
1. Compose an email to “First Collective Last <some-email-address-of-yours>” (copy-paste it!) in KMail – replacing the address as intended

OBSERVED RESULT
The received message has the fancy spaces replaced with the plain ASCII ones and hence contains a “To: First Collective Last <some-email-address-of-yours>” as a header, discarding the extra information originally entered

EXPECTED RESULT
The fancy spaces should remain and be encoded instead.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian unstable
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8
Comment 1 Erin Yuki Schlarb 2023-11-02 21:03:17 UTC
Just to avoid any potential misunderstandings, I’d also like to point out that Unicode “compatibility” normalizations are not about “improving compatibility with software” but about making “compatible characters” equivalent during machine processing (typically full-text searches). I’m adding this specifically because I know e-mail is basically a giant compatibility hack, and removing anything labelled “compatibility” may be met with an immediate “we can’t, it might break something”.
Comment 2 Erin Yuki Schlarb 2023-11-07 00:17:50 UTC
After tracing the whole email handling code from KMailTransport backwards to KMessageComposer, I found that the issue has nothing to do with Unicode normalization. It is instead caused by a simple `QString::simplified()` call inside the `KEmailAddress::splitAddressList` function. That function is called by `KEmailAddress::normalizeAddressesAndEncodeIdn`, which KMessageComposer uses unconditionally for outgoing messages.

I understand that the call to `QString::simplified()` is there to remove whitespace around the individual address entries in address lists of the form `Name <email>, Name 2 <email2>` (i.e. `Name <email>` + `Name 2 <email2>` instead of `Name <email>` + ` Name 2 <email2>`). Changing all calls of `QString::simplified()` to `QString::trimmed()` in that function keeps this working without altering the text in any other way, and it fixes this bug. (I just tested this with KMail!)
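
The difference between the two calls can be sketched in Python. The helpers `simplified_like` and `trimmed_like` are hypothetical stand-ins written for this illustration, not Qt APIs, and they only approximate Qt's whitespace rules; the address is a placeholder.

```python
def simplified_like(text: str) -> str:
    """Approximation of QString::simplified(): strip both ends and
    collapse every internal whitespace run into one ASCII space."""
    return " ".join(text.split())


def trimmed_like(text: str) -> str:
    """Approximation of QString::trimmed(): strip both ends only."""
    return text.strip()


# One entry as it might come out of splitting an address list on commas,
# with stray ASCII spaces around it and special spaces inside the name.
entry = " First\u2004Collective\u2009Last <user@example.org> "

print(ascii(simplified_like(entry)))  # special spaces become U+0020
print(ascii(trimmed_like(entry)))     # special spaces are preserved
```

Both helpers remove the surrounding whitespace, which is all the address splitting needs; only the first one rewrites the inside of the name.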
Comment 3 Erin Yuki Schlarb 2023-11-25 19:27:43 UTC
Fixed this issue in main: https://invent.kde.org/frameworks/kcodecs/-/merge_requests/43