SUMMARY
After changing my KMail identity settings to use U+2004 (THREE-PER-EM SPACE, a “thick” space) and U+2009 (THIN SPACE) to visually separate the parts of my name, I noticed that KMail displays these correctly in the settings and in the composer, but the messages it actually sends contain U+0020 (ASCII SPACE) instead. While I have not verified this in the source code, I’m almost certain this is because KMail/Akonadi/KMailTransport applies Unicode compatibility decomposition (NFKC normalization) before encoding the sender and recipient names.

GitHub, at least, does not seem to have any problem encoding these characters correctly, and KMail displays them correctly in messages received from GitHub. Python’s `email.header` appears to encode them in essentially the same way. In particular, for the name “First Collective Last” (using the U+2004 and U+2009 mentioned above), the three tested implementations yield:
1. KMail/Akonadi/KMailTransport: “From: First Collective Last <email-address>” (using plain ASCII spaces everywhere)
2. GitHub: “From: =?UTF-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <notifications@github.com>”
3. Python `email.header`: “From: =?utf-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?= <email-address>”

STEPS TO REPRODUCE
1. In KMail, compose an email to “First Collective Last <some-email-address-of-yours>” (copy and paste it!), replacing the address with one of your own.

OBSERVED RESULT
The received message has the fancy spaces replaced with plain ASCII ones, so its header reads “To: First Collective Last <some-email-address-of-yours>”, discarding the extra information originally entered.

EXPECTED RESULT
The fancy spaces should be preserved and encoded instead.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian unstable
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8
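For comparison with the Python result listed above, a minimal sketch using only the standard library: `Header` RFC 2047-encodes the non-ASCII display name as a UTF-8 encoded word, and `decode_header` shows that the special spaces survive a round trip. The variable names are illustrative only.

```python
from email.header import Header, decode_header

# Display name using U+2004 (THREE-PER-EM SPACE) and U+2009 (THIN SPACE)
name = "First\u2004Collective\u2009Last"

# Encode the non-ASCII display name as an RFC 2047 encoded word (base64 here)
encoded = Header(name, "utf-8").encode()
print(encoded)  # =?utf-8?b?Rmlyc3TigIRDb2xsZWN0aXZl4oCJTGFzdA==?=

# Decoding the encoded word gives back the original spaces unchanged
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset) == name)  # True
```

Nothing in the encoded word is lost: the receiving client can restore U+2004 and U+2009 exactly, which is what KMail already does for messages from GitHub.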
Just to avoid any potential misunderstanding, I’d also like to point out that Unicode “compatibility” normalizations are not about “improving compatibility with software”; their purpose is to make “compatibility-equivalent” characters equal during machine processing (typically full-text search). I’m mentioning this specifically because I know email is basically one giant compatibility hack, and any proposal to remove something that exists “for compatibility” may be met with an immediate “we can’t, it might break something”.
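To illustrate why NFKC looked like a plausible suspect, here is a small check with Python’s `unicodedata` module (a sketch; the name string is just the example from this report). Compatibility normalization maps both special spaces to U+0020, exactly the observed symptom, while canonical normalization (NFC) leaves them alone.

```python
import unicodedata

name = "First\u2004Collective\u2009Last"

# NFKC applies compatibility decomposition: U+2004 and U+2009 both become U+0020
nfkc = unicodedata.normalize("NFKC", name)
print(nfkc == "First Collective Last")  # True

# NFC (canonical equivalence only) does not touch these two characters
nfc = unicodedata.normalize("NFC", name)
print(nfc == name)  # True
```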
After tracing the entire email-handling code path backwards from KMailTransport to KMessageComposer, I found that the issue has nothing to do with Unicode normalization. Instead, it is caused by a simple `QString::simplified()` call inside the `KEmailAddress::splitAddressList` function, which is called by `KEmailAddress::normalizeAddressesAndEncodeIdn`, which in turn is used unconditionally by KMessageComposer for outgoing messages. I understand that the `QString::simplified()` call is there to remove whitespace around the individual entries in address lists of the form `Name <email>, Name 2 <email2>` (i.e., to yield `Name <email>` + `Name 2 <email2>` instead of `Name <email>` + ` Name 2 <email2>`). However, `simplified()` also replaces every internal run of whitespace, which includes U+2004 and U+2009, with a single ASCII space. Changing all calls to `QString::simplified()` in that function to `QString::trimmed()` still handles the address-list case, leaves the rest of the text untouched, and fixes this bug. (I just tested this with KMail!)
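The difference between the two Qt calls can be illustrated with a rough Python emulation. This is a sketch under stated assumptions, not the kcodecs code: `qt_simplified`, `qt_trimmed` and `split_address_list` are hypothetical helpers, Python’s `str.split()`/`str.strip()` only approximate Qt’s notion of whitespace (`QChar::isSpace()`), and the splitter naively splits on commas without the quoted-string handling the real `KEmailAddress::splitAddressList` needs.

```python
from typing import Callable, List


def qt_simplified(s: str) -> str:
    # Rough stand-in for QString::simplified(): strips both ends AND collapses
    # every internal whitespace run (U+2004 and U+2009 count as whitespace)
    # into a single ASCII space.
    return " ".join(s.split())


def qt_trimmed(s: str) -> str:
    # Rough stand-in for QString::trimmed(): strips whitespace at both ends only.
    return s.strip()


def split_address_list(addresses: str, clean: Callable[[str], str]) -> List[str]:
    # Hypothetical, naive splitter: splits on commas only (no quoted-string or
    # comment handling) and cleans each entry with the given function.
    entries = (clean(part) for part in addresses.split(","))
    return [entry for entry in entries if entry]


addresses = "First\u2004Collective\u2009Last <a@example.com>, Name 2 <b@example.com>"

simplified = split_address_list(addresses, qt_simplified)
trimmed = split_address_list(addresses, qt_trimmed)

print(simplified[0] == "First Collective Last <a@example.com>")  # True: special spaces lost
print(trimmed[0] == "First\u2004Collective\u2009Last <a@example.com>")  # True: preserved
print(simplified[1] == trimmed[1] == "Name 2 <b@example.com>")  # True
```

Both variants remove the stray leading space from the second entry; only the `simplified()`-style cleanup also rewrites the special spaces inside the first one.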
Fixed this issue in main: https://invent.kde.org/frameworks/kcodecs/-/merge_requests/43