Bug 126025

Summary: [PATCH] Replying to address with umlaut and comma creates two addressees
Product: [Applications] kmail
Reporter: tropikhajma <tropikhajma>
Component: mime
Assignee: Ingo Klöcker <kloecker>
Severity: normal
CC: adam, andreaswuest, bernhard, clcevboxvjeo, coolo, lure, mh+kde-bugs, mueller, ojo, ovit.debian, timo, torsten.irlaender
Priority: NOR
Version: SVN trunk (KDE 4)
Target Milestone: ---
Platform: unspecified
OS: Linux
Latest Commit:
Version Fixed In:
Attachments: patch against proko2

Description tropikhajma 2006-04-21 16:03:56 UTC
Version:            (using KDE 3.5.2)
Installed from:    Mandriva RPMs
OS:                Linux

When the From: field contains a name with an encoded comma, such as
From: =?us-ascii?Q?Surname=2C=20Name?= <surname.name@domain.com>

the encoded comma is interpreted as an address separator, and when one tries to reply to such an email, KMail sends it to two addresses (one of them, fortunately, correct).

I thought RFC 2822 treats only a bare comma as a separator, not a quoted one?
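
The failure mode can be reproduced outside KMail. The following is a minimal Python sketch, not KMail's code: it assumes a mailer that first decodes the RFC 2047 encoded-word and only afterwards splits the header value on commas. The helper names `decode_q_words` and `naive_recipients` are hypothetical, and the decoder handles only "Q" encoding.

```python
import re

# Matches a "Q"-encoded RFC 2047 encoded-word: =?charset?Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")


def decode_q_words(text):
    """Replace every Q encoded-word in text with its decoded form."""
    def _decode(match):
        charset, payload = match.group(1), match.group(2)
        payload = payload.replace("_", " ")  # "_" stands for a space in Q encoding
        # Turn each =XX escape back into the byte it represents.
        raw = re.sub(r"=([0-9A-Fa-f]{2})",
                     lambda m: chr(int(m.group(1), 16)), payload)
        return raw.encode("latin-1").decode(charset)
    return ENCODED_WORD.sub(_decode, text)


def naive_recipients(header_value):
    """Decode first, then split on commas (the problematic order)."""
    decoded = decode_q_words(header_value)
    return [part.strip() for part in decoded.split(",")]


header = "=?us-ascii?Q?Surname=2C=20Name?= <surname.name@domain.com>"
print(naive_recipients(header))
```

Decoding turns `=2C` back into a literal comma, so the split yields two entries, and only the second one carries the real address.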
Comment 1 Thiago Macieira 2006-04-24 09:26:51 UTC
Commas have to be quoted, yes. It looks like a bug in the mailer software, not in KMail.
Comment 2 tropikhajma 2006-04-24 21:49:27 UTC
I played with it a bit and found out that an email sent by KMail through SMTP with the "From" header
From: "aaa, bbb" <name@domain.com>
arrives with the "From" header
From: =?us-ascii?Q?aaa=2C=20bbb?= <name@domain.com>

so the missing quoting may be a bug in the SMTP program (sendmail)?

Furthermore, when trying to set up the "From" field (from within the GUI) without the double quotes but containing a comma, KMail does not use this value at all and the "From" header contains the default value instead. I would expect at least an error message here. Should I file a bug for this?
Comment 3 Thiago Macieira 2006-04-25 00:45:04 UTC
Check what the email saved in your sent-mail folder has. That's what KMail sent to your SMTP server. If that's different than what was received, then it was changed somewhere along the line.
Comment 4 tropikhajma 2006-04-27 00:15:24 UTC
I've checked it with Ethereal and it really seems to be the fault of the SMTP server or something behind it.
Comment 5 Bernhard E. Reiter 2006-04-28 18:25:35 UTC
I am reopening the bug, because I think the =?us-ascii?Q?Surname=2C=20Name?=
encoding already is a phrase, thus there cannot be further encoding around it.

This makes the behaviour a failure with major severity,
as it affects interoperability with other email clients that send
this correct encoding (and have no other choice with non-ASCII names).

Here is the reference for this interpretation:
rfc2822 (proposed standard)
  3.2.6. Miscellaneous tokens
  word            =       atom / quoted-string

  phrase          =       1*word / obs-phrase
  3.4. Address Specification
  mailbox         =       name-addr / addr-spec

  name-addr       =       [display-name] angle-addr 
  display-name    =       phrase  

rfc2047 (draft standard) 
  5. Use of encoded-words in message headers
  (3) As a replacement for a 'word' entity within a 'phrase', for example,
    one that precedes an address in a From, To, or Cc header.  The ABNF
    definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

As you can see from the syntax definition above,
a phrase can contain several words and encoded-words.
But only words can be quoted-strings with DQUOTE (") characters around them.
encoded-words MUST NOT be enclosed in DQUOTE characters.

Comment 6 Till Adam 2006-05-03 18:26:21 UTC
But item (3) in section 5 of RFC 2047 goes on:

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
    'phrase' MUST be separated from any adjacent 'word', 'text' or
    'special' by 'linear-white-space'.

That excludes ",", does it not?
Comment 7 Bernhard E. Reiter 2006-05-06 09:46:38 UTC
|     the set of characters that may be used in a "Q"-encoded
|     'encoded-word' is restricted to
| That excludes ",", does it not?

It does exclude "," in the encoded-word,
but not in the word to be encoded (= the word you get, when you decode).
Exclusion of "," means that you must encode this character
if it is in the string (to be encoded).
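
The distinction made in comment 7 can be illustrated with a small encoder. This is a hedged Python sketch, not code from KMail or KMime: the function name `q_encode_phrase` is hypothetical, and the sketch ignores the 75-character length limit on encoded-words. Every character outside the set that RFC 2047 section 5 (3) allows literally is written as an `=XX` escape, so a comma in the decoded name becomes `=2C` in the encoded-word.

```python
import string

# Characters that may appear literally in a Q encoded-word inside a phrase
# (RFC 2047 section 5, item 3). "=" and "_" are also permitted in the
# encoded-word, but only as the escape marker and the space substitute.
PHRASE_SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_phrase(text, charset="utf-8"):
    """Encode text as one Q encoded-word suitable for a display name."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in PHRASE_SAFE:
            out.append(ch)               # allowed as-is
        elif ch == " ":
            out.append("_")              # space may be written as "_"
        else:
            out.append("=%02X" % byte)   # everything else, including ","
    return "=?%s?Q?%s?=" % (charset, "".join(out))


print(q_encode_phrase("aaa, bbb", "us-ascii"))
print(q_encode_phrase("Klöcker, Ingo"))
```

The sketch writes the space as `_`, whereas the header quoted in this report uses `=20`; both forms decode to the same character.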
Comment 8 Ingo Klöcker 2006-05-09 15:34:01 UTC
This is definitely a bug in KMail, even though it's also IMO an abuse of RFC2047-encoding, which was invented for encoding non-ASCII characters and not for encoding commas. The latter is what quoted strings were invented for in RFC822.

The problem here is that KMail treats RFC2047-encoding more or less as a transport encoding. All (or at least most) operations are done on the decoded header values. This is obviously based on a wrong assumption, namely that the RFC2047 decoder transforms a valid email address into a valid email address (except for the fact that it may contain non-ASCII characters). The problem is that the RFC2047 decoder doesn't know anything about email address syntax.

A possible solution would be to extend normalizeAddressesAndDecodeIDNs() (or write a similar function) so that it accepts RFC2047-encoded addresses and, during normalization, makes sure the display name is correctly quoted whenever necessary. In fact, normalizedAddress() already adds quotes whenever necessary. So, in normalizeAddressesAndDecodeIDNs(), after KPIM::splitAddress( (*it).utf8(), displayName, addrSpec, comment ), the RFC2047 decoder has to be applied to displayName and comment. Of course, this means that we also have to pass the raw, possibly RFC2047-encoded, header value to normalizeAddressesAndDecodeIDNs().

(Yes, the correct solution would be to use KMime and its email address class-tree for this, but KMime's email address parser only accepts ASCII text (IIRC) and throws away any comments it encounters, so we can't use it atm.)
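
The order of operations proposed in comment 8 (split the raw address first, decode the display name afterwards, then re-quote) can be sketched in Python with the standard `email` package. This is only an illustration under stated assumptions, not the KMail implementation: `normalize_address` is a hypothetical name, it handles a single mailbox, and unlike KPIM::splitAddress() it does not return the comment separately.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def normalize_address(raw_header_value):
    """Parse the raw (still encoded) mailbox, then decode, then re-quote."""
    # 1. Split into display name and addr-spec while the comma is still
    #    hidden inside the encoded-word.
    raw_name, addr_spec = parseaddr(raw_header_value)
    # 2. Apply RFC 2047 decoding to the display name only.
    display_name = str(make_header(decode_header(raw_name)))
    # 3. formataddr() adds double quotes when the name contains specials
    #    such as a comma.
    return formataddr((display_name, addr_spec))


raw = "=?us-ascii?Q?Surname=2C=20Name?= <surname.name@domain.com>"
print(normalize_address(raw))
```

Because the comma only appears after the mailbox has been split, the result is a single address whose display name is a quoted-string.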
Comment 9 Till Adam 2006-05-10 19:50:50 UTC
Thanks for the comment Ingo, it matches what I had come up with myself, pretty much. Does the attached patch (against proko2, thus containing a few imports from libemailfunctions we didn't have before) look alright to you?
Comment 10 Till Adam 2006-05-10 19:52:25 UTC
Created attachment 16010
patch against proko2
Comment 11 Bernhard E. Reiter 2006-05-11 20:54:36 UTC
BTW I am pretty sure that this is no abuse:
if there is a comma and a non-ASCII char in the realname string,
you can only use an encoded-word from RFC2047, otherwise it would not
fit the grammar in rfc822 or rfc2822 anymore.
Comment 12 Bernhard E. Reiter 2006-07-31 19:44:13 UTC
Ping Ingo, did this patch go in?
Comment 13 Ingo Klöcker 2006-08-01 00:25:53 UTC
It's not that easy to apply to KDE 3.5 because KMMsgBase::decodeRFC2047String() can't simply be moved to libemailfunctions.
Comment 14 Thomas McGuire 2007-03-11 14:25:34 UTC
*** Bug 142810 has been marked as a duplicate of this bug. ***
Comment 15 Timo Weingärtner 2008-03-01 23:12:03 UTC
The note in RFC 2047, section 6.2, makes this clearer:
>NOTE: Decoding and display of encoded-words occurs *after* a
>   structured field body is parsed into tokens.  It is therefore
>   possible to hide 'special' characters in encoded-words which, when
>   displayed, will be indistinguishable from 'special' characters in the
>   surrounding text.  For this and other reasons, it is NOT generally
>   possible to translate a message header containing 'encoded-word's to
>   an unencoded form which can be parsed by an RFC 822 mail reader.
Comment 16 Thomas McGuire 2008-03-02 13:48:30 UTC
*** Bug 137984 has been marked as a duplicate of this bug. ***
Comment 17 Thomas McGuire 2008-03-02 13:50:50 UTC
*** Bug 145508 has been marked as a duplicate of this bug. ***
Comment 18 Thomas McGuire 2008-04-14 00:42:55 UTC
*** Bug 160357 has been marked as a duplicate of this bug. ***
Comment 19 Kevin Ottens 2008-04-23 17:24:21 UTC
SVN commit 800168 by ervin:

Apply the RFC2047 decoding inside of normalizeAddressesAndDecodeIDNs()
as advised by Ingo. Use the RFC2047 implementation of kmime for that
matter (yes, we have at least three implementations of this rfc in

That fixes 126025 in the enterprise branch (forward port on trunk to

CCBUG: 126025

 M  +16 -6     kmail/kmmessage.cpp  
 M  +1 -1      libemailfunctions/Makefile.am  
 M  +5 -1      libemailfunctions/email.cpp  
 M  +1 -1      libemailfunctions/tests/Makefile.am  
 M  +11 -0     libemailfunctions/tests/testemail.cpp  
 M  +1 -0      libkcal/Makefile.am  
 M  +5 -0      libkmime/kmime_util.cpp  
 M  +7 -0      libkmime/kmime_util.h  

WebSVN link: http://websvn.kde.org/?view=rev&revision=800168
Comment 20 Kevin Ottens 2008-04-23 18:03:41 UTC
SVN commit 800178 by ervin:

Apply the RFC2047 decoding inside of normalizeAddressesAndDecodeIdn() as
advised by Ingo. Use the RFC2047 implementation of kmime for that matter
(yes, we have at least three implementations of this rfc in kdepim).

Forwardport of r800168 for kdepimlibs.

CCBUG: 126025

 M  +6 -0      kmime/kmime_util.cpp  
 M  +8 -0      kmime/kmime_util.h  
 M  +2 -1      kpimutils/CMakeLists.txt  
 M  +5 -1      kpimutils/email.cpp  
 M  +11 -0     kpimutils/tests/testemail.cpp  

WebSVN link: http://websvn.kde.org/?view=rev&revision=800178
Comment 21 Kevin Ottens 2008-04-23 18:06:23 UTC
SVN commit 800182 by ervin:

Since the RFC2047 decoding is now done in normalizeAddressesAndDecodeIdn(),
use the raw headers for the relevant addresses related fields in KMMessage.

Forwardport of 800168 for kdepim.

That fixes 126025 in trunk.

BUG: 126025

 M  +16 -6     kmmessage.cpp  

WebSVN link: http://websvn.kde.org/?view=rev&revision=800182
Comment 22 Thomas McGuire 2008-07-14 17:33:29 UTC
*** Bug 166550 has been marked as a duplicate of this bug. ***