Bug 95074

Summary: HTML encoding isn't detected
Product: [Unmaintained] kmail Reporter: Noam Raphael <noamraph>
Component: general    Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED UNMAINTAINED    
Severity: normal CC: antonio.merker, bjoern, jens-bugs.kde.org, m.wege, oded, tuju
Priority: NOR    
Version: 1.6.2   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: A mail message with an HTML content that isn't decoded properly

Description Noam Raphael 2004-12-13 16:29:49 UTC
Version:           1.6.2 (using KDE 3.2.3,  (testing/unstable))
Compiler:          gcc version 3.3.3 (Debian 20040422)
OS:                Linux (i686) release 2.6.7-1-386

Hello,

I received an HTML message. Its encoding (Windows-1255) is specified in the HTML, and Konqueror opens it perfectly. However, when I view it in kmail with the encoding set to "auto", I see only boxes instead of letters. If I set the encoding manually to windows-1255, it is displayed correctly.
I assume this is because kmail adds a box with the subject, sender and recipient at the beginning.

I added a comment to bug 65862 about this, but got no response, so I am posting this as a new bug.

Thanks,
Noam Raphael
Comment 1 Thiago Macieira 2004-12-13 20:32:28 UTC
Can you attach the whole email to this bug report?
Comment 2 Stephan Kulow 2005-01-04 11:02:07 UTC
reopen if you have the mail
Comment 3 Noam Raphael 2005-01-08 18:46:20 UTC
Created attachment 8991 [details]
A mail message with an HTML content that isn't decoded properly

I'm sorry I didn't add the attachment previously - I probably didn't get a
notification because of problems with my e-mail provider.
Comment 4 Noam Raphael 2005-01-08 18:47:46 UTC
It still does the same thing...
Comment 5 Thiago Macieira 2005-01-08 20:51:03 UTC
The message headers are completely invalid. They specify:

Content-Type: text/html
Content-Transfer-Encoding: Quoted-Printable

without saying what the charset is.

However, the message's HTML header section contains:
<meta http-equiv=3D"content-type" content=3D"text/html; charset=3Dwindows-12=
55"=3E
(which decodes to a valid HTML line)
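To check that these two lines really decode to a valid HTML line, here is a minimal Python sketch using the standard `quopri` module (an illustration only, not KMail or khtml code). `=3D` encodes `=`, `=3E` encodes `>`, and the trailing `=` on the first line is a quoted-printable soft line break that joins `windows-12` and `55`.

```python
import quopri

# The two raw lines quoted above, exactly as they appear in the message body.
raw = (
    b'<meta http-equiv=3D"content-type" content=3D"text/html; charset=3Dwindows-12=\n'
    b'55"=3E\n'
)

# Undo the quoted-printable transfer encoding; the result is plain ASCII.
decoded = quopri.decodestring(raw).decode("ascii")
print(decoded, end="")
# → <meta http-equiv="content-type" content="text/html; charset=windows-1255">
```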

So, the question becomes: should khtml inside KMail honour this?
Comment 6 Noam Raphael 2005-01-14 10:13:38 UTC
I would say, "Why not?"

This is not implicitly deciding on the encoding; it is deciding on the encoding based on where it is specified. And another thing: when I attached this HTML file using kmail, the attachment headers were:

Content-Type: text/html;
  charset="us-ascii";
  name="htmlattach.html"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="htmlattach.html"

The charset is specified, but it is wrong! It didn't cause any problems, because kmail didn't show the HTML content, but my point is: sometimes the mail program doesn't know the encoding of the HTML it sends, and that is understandable.
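A declared charset that is plainly wrong can be caught in one direction. The Python sketch below uses a hypothetical helper `declared_charset_fits` and a made-up Hebrew payload (`cp1255` is Python's codec name for windows-1255): if the bytes fail to decode with the declared charset, the declaration is certainly wrong; if they do decode, that proves little, since some 8-bit charsets such as ISO-8859-1 accept every byte.

```python
def declared_charset_fits(payload: bytes, charset: str) -> bool:
    """Return True if payload decodes cleanly with the declared charset.

    A False result proves the declaration is wrong (or unknown);
    a True result does not prove it is right.
    """
    try:
        payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


# Hypothetical HTML payload: Hebrew text saved as windows-1255.
html = "<p>שלום</p>".encode("cp1255")

print(declared_charset_fits(html, "us-ascii"))  # False: bytes above 0x7F
print(declared_charset_fits(html, "cp1255"))    # True
```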

I think that, in general, the khtml in kmail should first try to determine the encoding from the HTML itself and, only if that fails, use the encoding specified in the MIME header.

Can this be done?
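The order proposed here (the HTML `<meta>` charset first, then the MIME header, then a default) can be sketched in Python. This is not khtml's or KMail's actual logic: the function names, the 4096-byte scan window, the regular expression and the `us-ascii` fallback (mirroring the usual MIME default) are assumptions for illustration. The input must already be transfer-decoded, i.e. with quoted-printable removed.

```python
import codecs
import re
from typing import Optional

# Matches both <meta charset="..."> and
# <meta http-equiv="content-type" content="text/html; charset=...">.
# A simplification: real HTML parsers use a more careful prescan.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def known_charset(name: Optional[str]) -> Optional[str]:
    """Return name if Python has a codec for it, else None."""
    if not name:
        return None
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def choose_charset(html: bytes, mime_charset: Optional[str],
                   fallback: str = "us-ascii") -> str:
    """Pick a charset: HTML <meta> first, then the MIME header, then fallback."""
    # Only scan the start of the document, where <meta> is expected.
    match = META_CHARSET.search(html[:4096])
    meta = known_charset(match.group(1).decode("ascii")) if match else None
    return meta or known_charset(mime_charset) or fallback


page = (b'<html><head><meta http-equiv="content-type" '
        b'content="text/html; charset=windows-1255"></head>')
print(choose_charset(page, None))              # windows-1255 (header had none)
print(choose_charset(page, "us-ascii"))        # windows-1255 (meta wins)
print(choose_charset(b"<p>hi</p>", "utf-8"))   # utf-8 (no meta, use header)
print(choose_charset(b"<p>hi</p>", None))      # us-ascii (nothing declared)
```

Ignoring an unknown `<meta>` value, rather than trusting it blindly, keeps a typo in the HTML from overriding a usable MIME header.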
Comment 7 Jens 2005-02-23 12:18:55 UTC
Hi,

I second this bug. It makes several HTML-based newsletters display incorrectly, because they do not specify a charset in the MIME headers, but they do specify one in the HTML code, using a META tag.

I would say: if there is a charset in the HTML code, it should override the one in the MIME header. Usually the application that *created* the HTML file knows better which character set it used than the application used to *attach* the HTML file.


Thank you!

Jens
Comment 8 Tommy Sandström 2005-04-28 13:57:08 UTC
I have the same problem, which is either related to this bug or a new one. I receive HTML mail that is multipart/alternative. Konqueror ignores the charset encoding entirely. The charset is specified both in the multipart header and in the HTML body. For mails without multipart bodies it seems to work fine, though. It seems that multipart headers are ignored. Below is a cut-down example of a mail with such a body. I had to take the source from Mozilla Thunderbird, since kmail did not show any headers at all for the multipart messages.

Content-Type: multipart/alternative; 
	boundary="----=_Part_1_33445663.1114686459510"

------=_Part_1_33445663.1114686459510
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The plain text content....
------=_Part_1_33445663.1114686459510
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


<?xml version=3D"1.0" encoding=3D"UTF-8"?>
<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml" xml:lang=3D"en" lang=3D"fi">
  <head>    =20
    <meta http-equiv=3D"Content-Language" content=3D"fi" />
    <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8=
" />         =20
<--------- The HTML Content ------------->
------=_Part_1_33445663.1114686459510--
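To illustrate that each part of such a message carries its own charset parameter, here is a Python sketch that parses a copy of the structure above with the standard `email` package (used purely as an illustration; KMail does not use it). The HTML body is abbreviated, and a `MIME-Version` header plus the blank line that must separate each part's headers from its body have been added so the message parses cleanly.

```python
import email

# Copy of the structure from the example above, HTML body abbreviated.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative;
\tboundary="----=_Part_1_33445663.1114686459510"

------=_Part_1_33445663.1114686459510
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The plain text content....
------=_Part_1_33445663.1114686459510
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8=
" /></head><body>...</body></html>
------=_Part_1_33445663.1114686459510--
"""

msg = email.message_from_string(RAW)

found = []
for part in msg.walk():
    if part.is_multipart():
        continue  # skip the multipart/alternative container itself
    charset = part.get_content_charset()  # lower-cased, or None if absent
    body = part.get_payload(decode=True)  # bytes with quoted-printable undone
    text = body.decode(charset or "us-ascii")
    found.append((part.get_content_type(), charset))
    print(part.get_content_type(), charset, text[:40])
```

Both leaf parts report `utf-8`, so the information a client needs is available per part, independent of the `<meta>` tag inside the HTML.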
Comment 9 Tommy Sandström 2005-04-28 14:06:30 UTC
Jens,

It is possible to set the charset in the headers of each part of a multipart body, but kmail ignores them as well, at least in version 1.7.2.

And a correction to the confusion in my earlier comment: I was talking about kmail, not konqueror :/

Tommy
Comment 10 Tommy Sandström 2005-04-28 14:08:43 UTC
*** This bug has been confirmed by popular vote. ***
Comment 11 Björn Ruberg 2009-12-20 16:25:43 UTC
Is this still valid?
Comment 12 Laurent Montel 2015-04-12 10:24:19 UTC
Thank you for taking the time to file a bug report.

KMail2 was released in 2011, and the entire code base went through significant changes. We are currently in the process of porting to Qt5 and KF5. It is unlikely that these bugs are still valid in KMail2.

We welcome you to try out KMail 2 with the KDE 4.14 release and give your feedback.