Summary: | Message preview pane character encoding issue (utf-8, unicode) | ||
---|---|---|---|
Product: | [Applications] kmail2 | Reporter: | Wouter Van Hemel <wouter-kde> |
Component: | crypto | Assignee: | Sandro Knauß <sknauss> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aheinecke, sknauss, t.glaser |
Priority: | NOR | ||
Version: | 4.14.2 | ||
Target Milestone: | --- | ||
Platform: | Debian unstable | ||
OS: | Linux | ||
Latest Commit: | http://commits.kde.org/messagelib/04334e2f8390b967fc5b1c4ecde8caacf4787238 | Version Fixed In: | 5.4.0 |
Sentry Crash Report: | |||
Attachments: | Testcase; An encrypted ISO-8859-15 text | ||
Description
Wouter Van Hemel
2010-08-16 14:20:10 UTC
Confirmed, although I primarily notice the issue with encrypted eMails. 「V̲iew → S̲et Encoding → A̲uto」 in the menu shows the broken behaviour. 「V̲iew → S̲et Encoding → Unicode ( UTF̲-8 )」 manually forces the correct one. See also: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754265 which also lists the three versions in which I already noticed this bug (4.11, 4.12 and 4.13 are affected, at least).

Created attachment 88293
Testcase
At the request of one of the Mozilla developers, I just made a testcase for this bug. I attached it here, as I can confirm it also triggers the bug in Kontact.
The passphrase for the PGP key I attached is 123123.
Thank you for taking the time to file a bug report. KMail2 was released in 2011, and the entire code base went through significant changes. We are currently in the process of porting to Qt5 and KF5. It is unlikely that these bugs are still valid in KMail2. We welcome you to try out KMail 2 with the KDE 4.14 release and give your feedback.

(In reply to Laurent Montel from comment #3)
> We welcome you to try out KMail 2 with the KDE 4.14 release and give your
> feedback.

RECONFIRMED: The bug still happens with kmail 4:4.14.2-2 (Debian unstable). Dude, what gives? It took me a minute to re-check that. Please reopen.

The problem is that inline PGP is not a standard at all, and it is not defined which charset to use when displaying the mail. For KMail we decided that the charset field of the mail indicates the charset of the decrypted message, because the encrypted part is always ASCII. This is what most email clients do. RFC 2440 only says:

"Charset", a description of the character set that the plaintext is in. Please note that OpenPGP defines text to be in UTF-8 by default. An implementation will get best results by translating into and out of UTF-8. However, there are many instances where this is easier said than done. Also, there are communities of users who have no need for UTF-8 because they are all happy with a character set like ISO Latin-5 or a Japanese character set. In such instances, an implementation MAY override the UTF-8 default by using this header key. An implementation MAY implement this key and any translations it cares to; an implementation MAY ignore it and assume all text is UTF-8.

-> The best would be if Alpine had used the Armor Header Key "Charset". To avoid breaking the existing behaviour, I would only switch to UTF-8 decoding by default if the surrounding charset is ASCII :) because every ASCII text is the same in UTF-8...
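A minimal sketch of the heuristic proposed above, written in Python rather than KMail's C++. The names `display_charset` and `decode_decrypted` are hypothetical; treating a missing or unrecognised charset label as us-ascii is an assumption (RFC 2045 makes us-ascii the default for text parts), and `errors="replace"` is an illustrative choice, not KMail's actual behaviour.

```python
import codecs


def display_charset(mime_charset):
    """Choose the charset for showing decrypted inline-PGP text (sketch).

    Hypothetical helper: a us-ascii label cannot describe non-ASCII
    plaintext, and UTF-8 reads every ASCII byte the same way, so an
    ASCII label is upgraded to UTF-8. Any other label is kept.
    """
    # RFC 2045: a text part without a charset parameter is us-ascii.
    declared = mime_charset or "us-ascii"
    try:
        # Normalise aliases such as "US-ASCII" or "ANSI_X3.4-1968".
        canonical = codecs.lookup(declared).name
    except LookupError:
        canonical = "ascii"  # assumption: treat unknown labels like a missing one
    if canonical == "ascii":
        return "utf-8"
    return canonical


def decode_decrypted(plaintext, mime_charset):
    """Decode the bytes GnuPG returned, replacing undecodable bytes."""
    return plaintext.decode(display_charset(mime_charset), errors="replace")
```

With this rule, a mail labelled us-ascii whose decrypted body is really UTF-8 (the case reported here) displays correctly, while mails that declare a genuine 8-bit charset keep the existing behaviour.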
(In reply to Sandro Knauß from comment #5)

PGP Inline is perfectly fine standardised: the display agent has to use the charset indicated by the PGP message, and discard any charset/encoding information of the surrounding message. It works like this:

Encode: secret message (charset/encoding A) → PGP → ASCII-armoured thing ⇒ MUA → MUA’s own encoding (charset B, possibly encoding Quoted-Printable) → RFC822
Decode: RFC822 → MUA decode (e.g. QP) → ASCII-armoured thing ⇒ PGP → secret message

There are double-stroked arrows here, which means someone can do this step manually. For example, the standard/initial way of doing Inline PGP was to edit the message in $EDITOR, then throw it through PGP, then paste the resulting .asc into the MUA editor. Conversely, for decoding: save the message (*after* MUA decoding! this is what some get wrong!) as .asc, then call PGP on it on the command line.

> -> The best would be if apine would have used the Armor Header Key "Charset".

Alpine has no concept of PGP. I write my messages by telling alpine to invoke an external editor (always, as I loathe pico) in which I type the message, then pipe it through e.g. “gpg --clearsign” or “gpg -seatr foo@bar.com”. PGP actually *does* have the “Charset” header in the ASCII armour. GnuPG just doesn’t write the header if it’s the default value, namely, UTF-8. (If I write the secret message in, say, latin1, and then tell GnuPG that it’s latin1, then the “Charset” header is there. But I use UTF-8 everywhere.)

> To not break the existing way, i would only switch to default utf8 decoding if the surrounding charset is ascii :) Because every ascii text is the same in utf8...

That is actually the correct fix for my scenario.
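The "MUA decode (e.g. QP)" step of the chain above can be sketched with Python's standard `email` package: undoing the content-transfer-encoding recovers the ASCII armour, and the MIME charset describes that armour, not the plaintext inside it. The message below is hypothetical and its armour is not a real, decryptable PGP message; the decryption step (⇒ PGP) is not shown.

```python
from email import message_from_bytes

# Hypothetical inline-PGP mail: the armour is plain ASCII, but the MUA
# chose quoted-printable, so every "=" in the armour became "=3D".
raw = (b'Content-Type: text/plain; charset="us-ascii"\n'
       b"Content-Transfer-Encoding: quoted-printable\n"
       b"\n"
       b"-----BEGIN PGP MESSAGE-----\n"
       b"\n"
       b"hQEMA1234abcd=3D=3D\n"
       b"=3DAb1C\n"
       b"-----END PGP MESSAGE-----\n")

msg = message_from_bytes(raw)

# Outer layer undone: "=3D" -> "=", giving the armour to hand to GnuPG.
armour = msg.get_payload(decode=True)

# Charset label of the MIME part, i.e. of the armour text.
mime_charset = msg.get_content_charset()
```

Here `mime_charset` is `"us-ascii"`, which is accurate for the armour; whether it also says anything about the decrypted plaintext is exactly what the thread is debating.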
The broader correct fix is to do this, for Inline PGP:
① decode the RFC822 message using the MIME content-type and content-transfer-encoding
② if it has a “Charset” header in the PGP ASCII armour, note that down for later
③ decode the MIME-decoded message through GnuPG
④ use the charset noted down earlier, or UTF-8 if none, for displaying the armoured part (i.e. the part within the green and/or blue boxes; for anything outside of them, if any, keep using the MIME charset information; this is important for mixed content!)

Thanks!

> PGP Inline is perfectly fine standardised: the display agent has to use the charset indicated by the PGP
> message, and discard any charset/encoding information of the surrounding message.

No it's not. Especially the Encoding handling is very problematic and not standardised. See: https://debian-administration.org/users/dkg/weblog/108 ( https://dkg.fifthhorseman.net/notes/inline-pgp-harmful/ )

Basically your Mail says that it's ASCII Encoded but then actually has UTF-8 encoding in the content after decryption. I would argue that this is not a KMail bug but that your Mail is broken. For proper encoding Handling you need to use PGP/MIME. One of the Advantages of PGP/MIME is proper encoding handling. KMail uses the Content-Type charset of the PGP Message, which would be correct.

GnuPG / GPGME itself does not do any reencoding it just decrypts the "bytes" of the message.

The Armor Header from RFC 2440 is afaik not used in practice. As changing the encoding can change the meaning, and the armor headers themselves are not signed / encrypted, this offers little advantage over the Content-Type. On top of that, you would have an even more fragile implementation, because you would have to handle mixed encodings in a message with multiple PGP message parts. And you would have to treat PGP clearsigned messages differently, ...
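Steps ② and ④ of the list above amount to reading the armor header block, which RFC 4880 defines as `Key: Value` lines between the `-----BEGIN PGP ...-----` line and the first blank line. A minimal parsing sketch follows; `armor_headers` and `armor_charset` are hypothetical names, matching keys case-insensitively is an assumption, and, as noted above, these headers are neither signed nor encrypted, so the result is only a hint. Decryption (step ③) and mixed content are not handled.

```python
# Default text charset when no "Charset" armor header is present.
DEFAULT_CHARSET = "utf-8"


def armor_headers(armored):
    """Return the armor headers of the first ASCII-armored block (sketch).

    Keys are lower-cased because this sketch matches them case-insensitively.
    """
    headers = {}
    in_block = False
    for line in armored.splitlines():
        if not in_block:
            # Skip everything up to and including the armor header line,
            # e.g. "-----BEGIN PGP MESSAGE-----".
            in_block = line.startswith("-----BEGIN PGP ")
            continue
        if not line.strip():
            break  # the blank line ends the armor headers
        key, sep, value = line.partition(": ")
        if not sep:
            break  # not a "Key: Value" line; stop rather than guess
        headers[key.strip().lower()] = value.strip()
    return headers


def armor_charset(armored):
    """The declared plaintext charset, or UTF-8 if none is declared."""
    return armor_headers(armored).get("charset", DEFAULT_CHARSET)
```

A clearsigned block (`-----BEGIN PGP SIGNED MESSAGE-----` followed by a `Hash:` header) parses the same way and falls back to UTF-8 unless it also carries a `Charset` header.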
As a "workaround" / to improve compatibility with broken MUAs I like Sandro's idea to treat PGP Messages as UTF-8 if the specified Charset is 7Bit ASCII. I think that would be a good solution to fix your bug. Although I would suggest to use a proper MUA with PGP/MIME support.

(In reply to Andre Heinecke from comment #7)
> > PGP Inline is perfectly fine standardised: the display agent has to use the charset indicated by the PGP
> > message, and discard any charset/encoding information of the surrounding message.
>
> No it's not. Especially the Encoding handling is very problematic and not
> standardised. See: https://debian-administration.org/users/dkg/weblog/108 (

It is, and especially the encoding is trivial. It’s just often misunderstood or implemented wrong. Citing someone who doesn’t fully understand it doesn’t help (I knew that posting).

Inline PGP is easy: the MIME-level encoding is valid for the “outer” part of the message; for example, if MIME says quoted-printable then those ‘=’ in the ASCII armour of the PGP message are encoded as “=3D”. The “inner” part of the message, i.e. the output of pgp/gpg decrypting it, is *completely* independent of the MIME message surrounding it, and for displaying it, *only* the rules that the command-line utilities use are valid; this means, that the OpenPGP-level encoding is used (which is always 8bit not quoted-printable or base64, and in absence of an explicit charset selection is UTF-8).

The reason for this is easy: Inline PGP works, basically (i.e. without explicit MUA support), by someone writing a plaintext file, throwing that through pgp or gpg, and copy/pasting that into their MUA’s composer. Anything an MUA does to integrate Inline PGP support *must* behave *exactly the same*.

> Basically your Mail says that it's ASCII Encoded but then actually has UTF-8
> encoding in the content after decryption.
> I would argue that this is not a

See above: “after decryption” when Inline PGP is used means you *have* to *forget* anything from the previous container. Yes, this is different from what PGP/MIME requires. Yes, both are right, for their respective scopes.

> KMail bug but that your Mail is broken. For proper encoding Handling you
> need to use PGP/MIME. One of the Advantages of PGP/MIME is proper encoding

This sounds half like a sales pitch, half like “KMail doesn’t handle encoding in Inline PGP correctly” – which is *exactly my point*.

> GnuPG / GPGME itself does not do any reencoding it just decrypts the "bytes"
> of the message.

It does *record* the charset of the message.

> As a "workaround" / to improve compatibility with broken MUA's I like
> Sandro's idea to treat PGP Messages as UTF-8 if the specified Charset is
> 7Bit ASCII. I think that would be a good solution to fix your bug.

That would help in the specific case, but still leave KMail a broken MUA claiming to support Inline PGP and not doing it correctly. However, as a first step, it’s okay; please do so. Actually, why haven’t you done so yet…

> Although I would suggest to use a proper MUA with PGP/MIME support.

No, PGP/MIME often breaks, interestingly enough, with encoding-related issues, and with mailing lists. Its interoperability is also limited to MUAs supporting it, whereas interoperability of Inline PGP is maximal.

(In reply to Thorsten Glaser from comment #8)
> (In reply to Andre Heinecke from comment #7)
> > > PGP Inline is perfectly fine standardised: the display agent has to use the charset indicated by the PGP
> > > message, and discard any charset/encoding information of the surrounding message.
> >
> > No it's not. Especially the Encoding handling is very problematic and not
> > standardised. See: https://debian-administration.org/users/dkg/weblog/108 (
>
> It is, and especially the encoding is trivial. It’s just often misunderstood
> or implemented wrong.
> Citing someone who doesn’t fully understand it doesn’t help (I knew that
> posting).

dkg and Andre know what they are talking about: search the internet for references and for what they do in the OpenPGP project. You will find a lot of references to them.

>
> Inline PGP is easy: the MIME-level encoding is valid for the “outer” part of
> the message; for
> example, if MIME says quoted-printable then those ‘=’ in the ASCII armour of
> the PGP message
> are encoded as “=3D”.
>

In your comment you often mix different encodings. In the mail context we have two:
- content-transfer-encoding: this is how the text (if it is not 7-bit ASCII) is modified to be 7-bit. It is quoted-printable, base64 or plain. It is beyond question that we first have to decode this before processing the content. This is the "=3D" -> "=" step.
- the encoding (charset) of the text, which is more problematic :) We have one field where we can set the encoding of the MIME part: the Content-Type header of a MIME part, with the charset setting: Content-Type: text/plain; charset="UTF-8"

The problem now is that you are arguing that GnuPG has a defined input/output charset, so that we should ignore the charset setting of the MIME part after we piped the content through GnuPG. But this is not true. GnuPG only parses a byte stream and does no charset handling at all. The only thing is that GnuPG suggests that you SHOULD use UTF-8, but it does not enforce this. It only works for you because Alpine is a command-line MUA that puts its output on your console, and your console uses UTF-8 encoding; if you switched to something else, you couldn't read the text successfully.

> The “inner” part of the message, i.e.
> the output of pgp/gpg decrypting it, is *completely* independent of the MIME message surrounding it, and for displaying it, *only* the rules that the command-line utilities use are valid; this means, that the OpenPGP-level encoding is used (which is always 8bit not quoted-printable or base64, and in absence of an explicit charset selection is UTF-8).

Well, the problem is that there is no "OpenPGP-level encoding". There is no API to ask GnuPG about the encoding (if there were one, Andre would know, because he is one of the authors of the GnuPG APIs :) ).

> The reason for this is easy: Inline PGP works, basically (i.e. without explicit MUA support), by someone writing a plaintext file, throwing that through pgp or gpg, and copy/pasting that into their MUA’s composer. Anything an MUA does to integrate Inline PGP support *must* behave *exactly the same*.

Try the experiment: change the charset of your Konsole, take a text document in a different encoding, encrypt it, and look at the output in your normal (UTF-8) console. You will see that it is broken. This all works for you because you have a consistent UTF-8 environment. But for mails we can't tell what the sender's encoding is; we can only guess.

> > GnuPG / GPGME itself does not do any reencoding it just decrypts the "bytes"
> > of the message.
>
> It does *record* the charset of the message.

But maybe we are all wrong and you are right: give me a link to the documentation, or a script/snippet showing how to detect the correct charset of the decrypted mail, and I'll fix this in KMail instantly.

Okay, here is my console test:

% LANG=C luit -encoding ISO-8859-15 gpg --encrypt -a -o test.enc
You did not specify a user ID. (you may use "-r")
Current recipients:
Enter the user ID. End with an empty line: 0x36FD5E35D1D8EFD2
gpg: 0x36FD5E35D1D8EFD2: There is no assurance this key belongs to the named user
pub 1024R/0x36FD5E35D1D8EFD2 2014-08-18 Test for Mozilla bug#1054187
Primary key fingerprint: 8D15 3316 76F4 6081 1A99 DB56 36FD 5E35 D1D8 EFD2
It is NOT certain that the key belongs to the person named in the user ID. If you *really* know what you are doing, you may answer the next question with yes.
Use this key anyway? (y/N) y
Current recipients:
1024R/0x36FD5E35D1D8EFD2 2014-08-18 "Test for Mozilla bug#1054187"
Enter the user ID. End with an empty line:
test äöü test
% LANG=C luit -encoding ISO-8859-15 gpg -d test.enc
You need a passphrase to unlock the secret key for
user: "Test for Mozilla bug#1054187"
1024-bit RSA key, ID 0x36FD5E35D1D8EFD2, created 2014-08-18
gpg: encrypted with 1024-bit RSA key, ID 0x36FD5E35D1D8EFD2, created 2014-08-18
"Test for Mozilla bug#1054187"
test äöü test

^^ yeah, that matches :D

% LANG=C gpg -d test.enc
You need a passphrase to unlock the secret key for
user: "Test for Mozilla bug#1054187"
1024-bit RSA key, ID 0x36FD5E35D1D8EFD2, created 2014-08-18
gpg: encrypted with 1024-bit RSA key, ID 0x36FD5E35D1D8EFD2, created 2014-08-18
"Test for Mozilla bug#1054187"
test test

^^ argh, this is not what I entered. And you see here that GnuPG on the command line has no handling for encoding; it just uses the default encoding of the console. The information that the output has to be interpreted as ISO-8859-15 is lost.

Created attachment 99676
An encrypted ISO-8859-15 text
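The effect seen in the console test can be reproduced without gpg, since (as the transcript shows) decryption just hands back the original bytes and leaves their interpretation to whoever displays them. This Python sketch encodes the typed text as ISO-8859-15 and reads the same bytes under two charset assumptions; nothing here calls GnuPG.

```python
# The text typed inside "luit -encoding ISO-8859-15".
plaintext = "test äöü test"

# The bytes gpg encrypted, and the bytes it returns on decryption.
raw = plaintext.encode("iso-8859-15")

# Read with the sender's charset, the bytes round-trip.
as_latin9 = raw.decode("iso-8859-15")

# Read as UTF-8 (like a UTF-8 console), each of the three non-ASCII bytes
# is invalid UTF-8 and becomes U+FFFD under errors="replace".
as_utf8 = raw.decode("utf-8", errors="replace")
```

Nothing in `raw` records which of the two readings is right; that information has to come from outside the bytes, which is the crux of the disagreement above.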
Just to make it clear: my console is also UTF-8 by default. luit is a program that translates from/to the specified encoding, so within the command everything behaves as if input and output were ISO-8859-15.

(In reply to Sandro Knauß from comment #9)
> Make the experiment - change the charset of you konsole/ and use a text
> document with a different encoding and encrypt it and look at the output in
> your normal console ( utf-8). You will see that this is broken. This all
> works for you because you have a consistent utf8 environment. But for mails

Possibly, but ISTR that OpenPGP still stores the encoding of the message, so I’d have a way to know what charset to pass to iconv(1) to be able to read it, and I’m not talking about the ASCII armour pseudo-header either. I’ll search for it when I have more time.

> > > GnuPG / GPGME itself does not do any reencoding it just decrypts the "bytes"
> > > of the message.
> >
> > It does *record* the charset of the message.
>
> But maybe all are wrong and you are right - give me the link to the
> documentation or a script/snippset, how It detect the correct charset of the
> decrypted mail i'll fix this instantly in kmail.

OK.

Btw. I've asked about armor headers as part of another issue regarding gpgme_data_identify, and the maintainer of GnuPG also says that they should not be used and are not used by GnuPG: https://bugs.gnupg.org/gnupg/issue2314

Git commit 04334e2f8390b967fc5b1c4ecde8caacf4787238 by Sandro Knauß.
Committed on 18/07/2016 at 07:49.
Pushed by knauss into branch 'Applications/16.08'.

Fix: Message with wrong charset

MUAs sometimes fail to set the correct character encoding. If they set us-ascii, we can help a little bit by setting it to utf-8. Because utf-8 is a superset of us-ascii, we do not break anything.
FIXED-IN: 5.4.0

A +34 -0 mimetreeparser/autotests/data/openpgp-inline-wrong-charset-encrypted.mbox
A +47 -0 mimetreeparser/autotests/data/openpgp-inline-wrong-charset-encrypted.mbox.html
A +4 -0 mimetreeparser/autotests/data/openpgp-inline-wrong-charset-encrypted.mbox.tree
M +8 -1 mimetreeparser/src/viewer/nodehelper.cpp

http://commits.kde.org/messagelib/04334e2f8390b967fc5b1c4ecde8caacf4787238
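The commit's argument, that relabelling us-ascii as utf-8 cannot break anything, is easy to check mechanically. The Python sketch below is not the commit's code (the actual change is in `nodehelper.cpp` and is not reproduced here), and the helper names are hypothetical. It checks that all 128 US-ASCII byte values decode identically under UTF-8, and that a UTF-8 "ä" is not valid US-ASCII, which is why such a label was wrong in the first place.

```python
def ascii_reads_the_same_as_utf8():
    """True if every US-ASCII byte decodes to the same text under UTF-8."""
    every_ascii_byte = bytes(range(128))  # 0x00 through 0x7F
    return every_ascii_byte.decode("ascii") == every_ascii_byte.decode("utf-8")


def is_us_ascii(data):
    """True if data is valid US-ASCII, i.e. a us-ascii label is honest."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True
```

Together these say that for honestly labelled mail the switch changes nothing, and it only alters the display of mail whose us-ascii label was already false.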