Bug 84702

Summary: feature: fallback functionality, if no character-set is set and some characters in the email cannot be displayed with default character-set
Product: [Applications] kmail
Reporter: S. Burmeister <sven.burmeister>
Component: mime
Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED
Severity: wishlist
CC: gal
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: openSUSE
OS: Linux
Latest Commit:
Version Fixed In:

Description S. Burmeister 2004-07-08 09:24:04 UTC
Version:            (using KDE 3.2.3)
Installed from:    SuSE RPMs
OS:                Linux

Some email clients do not specify which character set to use for the email. Hence, in my case, a lot of emails are displayed incorrectly, because I use utf-8 as the default character set. If there were a fallback feature, i.e. if some characters cannot be displayed with the default character set and no character set is defined in the email headers, fall back to another character set, e.g. windows-..., this would solve 99% of the display errors caused by undefined character sets in the email headers.
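
A minimal sketch of the requested behaviour, written in Python for illustration only. It is not KMail code; the function name, its parameters, and the choice of cp1252 as the fallback are assumptions, and unknown charset names are not handled.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str],
                default: str = "utf-8", fallback: str = "cp1252") -> str:
    """Decode a message body, falling back only when no charset was declared."""
    if declared:
        # A declared charset always wins; undecodable bytes become U+FFFD.
        return raw.decode(declared, errors="replace")
    try:
        # No declaration: decode strictly so that a mismatch is detected.
        return raw.decode(default)
    except UnicodeDecodeError:
        # The bytes are not valid in the default charset; use the fallback.
        return raw.decode(fallback, errors="replace")
```

With these assumptions, a body sent as windows-1252 without a charset header would be decoded via the fallback, while valid utf-8 bodies would still be decoded as utf-8.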
Comment 1 S. Burmeister 2004-10-21 20:24:18 UTC
This is still valid for KDE 3.3.1.

Either people do not get all mail right that does have a charset defined, or they get those emails wrong that are sent without charset header.

Where is the problem of having an option to fall back on a certain charset only if no charset is defined? This way people can leave view -> encoding on AUTO as well as get all those windows mails right.
Comment 2 Till Adam 2004-10-21 23:06:07 UTC
Sven, we do fall back, to the system-wide encoding configured by the user. I think that's the best we can do. After all, many people won't get most of their mail in central-european windows 1250. I think the only sensible fallback is the system encoding, which is utf8 in the case of SuSE. You can change that to something else, though, if you think that's more appropriate for your use case.
Comment 3 S. Burmeister 2004-10-21 23:16:05 UTC
How do you define best we can do?

The better of two solutions, at least from my point of view, is the one that solves more issues. As you can see in my case and in most others the current functionality does not solve more problems than a two level fall back functionality.

Encoding set to Auto -> if charset header exists: use charset header -> if user set "fall back charset" exists, use it -> if not fall back to charset set by the system.

This definitely solves more issues than the current way and is hence better. Although I am not too familiar with C++ I think that another if-statement, checking whether the user has set a "fall back charset" should not be too hard to implement.
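
The lookup order proposed above could be sketched roughly as follows. This is illustrative Python, not the KMail implementation; the function and parameter names are hypothetical.

```python
from typing import Optional


def pick_charset(header_charset: Optional[str],
                 user_fallback: Optional[str],
                 system_charset: str) -> str:
    """Return the charset to use when Encoding is set to Auto."""
    if header_charset:
        # Level 0: the mail says which charset it uses.
        return header_charset
    if user_fallback:
        # Level 1: the user configured a "fall back charset".
        return user_fallback
    # Level 2: nothing else is known, use the system charset.
    return system_charset
```

For example, pick_charset(None, "windows-1252", "utf-8") would select windows-1252, while a mail that declares iso-8859-1 would keep iso-8859-1.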

Can you point me to the place in the code where kmail checks which charset to use for the email?
Comment 4 Till Adam 2004-10-21 23:47:39 UTC
On Thursday 21 October 2004 23:16, S.Burmeister wrote:
> How do you define best we can do?

Best way to solve it we could come up with so far.

> The better of two solutions, at least from my point of view, is the one
> that solves more issues. As you can see in my case and in most others the
> current functionality does not solve more problems than a two level fall
> back functionality.

It was not clear to me that you were suggesting a two level fallback.

> Encoding set to Auto -> if charset header exists: use charset header -> if
> user set "fall back charset" exists, use it -> if not fall back to charset
> set by the system.

Currently we use the concept of an override codec, rather than a fallback 
codec. So unless the setting is auto, which means "use what the mail 
specifies", we override what the mail says with what the user explicitly 
selects. I wasn't around when that decision was made, so I can't tell you the 
exact reasoning for it. Maybe it makes more sense to change that to a 
fallback rather than an override.
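
To make the difference concrete, here is a small sketch contrasting the two concepts. It is illustrative Python under stated assumptions, not the code in kmreaderwin.cpp; the names AUTO, with_override and with_fallback are hypothetical.

```python
from typing import Optional

AUTO = "auto"


def with_override(setting: str, declared: Optional[str], system: str) -> str:
    """Override semantics: any non-Auto setting replaces what the mail says."""
    if setting != AUTO:
        return setting
    # Auto: use what the mail specifies, else the system encoding.
    return declared or system


def with_fallback(setting: str, declared: Optional[str], system: str) -> str:
    """Fallback semantics: the setting is consulted only if the mail is silent."""
    if declared:
        return declared
    if setting != AUTO:
        return setting
    return system
```

With the setting at windows-1252, a mail declaring utf-8 would be shown as windows-1252 under override semantics but as utf-8 under fallback semantics.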

> This definitely solves more issues than the current way and is hence
> better. Although I am not too familiar with C++ I think that another
> if-statement, checking whether the user has set a "fall back charset"
> should not be too hard to implement.
>
> Can you point me to the place in the code where kmail checks which
> charset to use for the email?

The code in question is in kdepim/kmail/kmreaderwin.cpp. The codec is 
requested in several places, but the relevant bit for your case is probably 
around line 1670 and following.

Comment 5 Till Adam 2004-10-21 23:51:24 UTC
Reopening.
Comment 6 S. Burmeister 2004-10-21 23:54:46 UTC
Well, if it's an override and not simple if-clauses I might not have enough knowledge to do anything about it myself, yet I will have a look at the code.
Thanks for the pointer and re-opening the bug.
Comment 7 Till Adam 2004-10-24 14:01:16 UTC
On Thursday 21 October 2004 23:54, S.Burmeister wrote:

> Well, if it's an override and not simple if-clauses I might not
> have enough knowledge to do anything about it myself, yet I will have a
> look at the code. Thanks for the pointer and re-opening the bug.

Sven, can you please send me an mbox of such a message to adam@kde.org (File 
-> save as), I have a patch for this I would like to test. It falls back to 
the first of the list of preferred charsets as specified in Configure KMail 
-> Composer -> Charsets. I think we can reasonably assume this to be the 
user's preferred encoding.

Comment 8 S. Burmeister 2004-10-24 15:06:11 UTC
> Sven, can you please send me an mbox of such a message to adam kde org
> (File -> save as), I have a patch for this I would like to test. It falls

Sure I can.

> back to the first of the list of preferred charsets as specified in
> Configure KMail -> Composer -> Charsets. I think we can reasonably assume
> this to be the user's preferred encoding.

But the user's favorite character encoding is not the favorite encoding of the 
author one gets the email from.
I like utf-8, thus utf-8 is first in my encoding list. Windows users hardly 
use utf-8, so the emails I get are not in the encoding I like, hence, if I 
understood you correctly, your patch would fail for me.

First I assumed that Windows always uses the same encoding, i.e. windows-1252, 
which I think is not the case?
However, the reason I thought this was the case is that every time 
I got an email that did not display correctly, the reason lay in the lack of 
an encoding-header and all I had to do was to switch to windows-1252. Hence I 
thought it would be very convenient for people like me to be able to tell 
kmail that it should switch automatically to e.g. windows-1252 in case there 
is no encoding-header, yet obey the header if it exists.
This way I get all those windows emails right, plus all emails that have an 
encoding-header, which amounts to almost 100% of correctly displayed emails, 
at least in my case.

The setting could be placed in the misc. section, next to a drop-down menu 
containing all available encodings. Maybe even default to windows-1252 if 
that really includes 90% of emails with no encoding-header.

With your patch all people who want to use the functionality would have to 
write their emails in windows-1252 (being top of the list), which I do not 
think is a good solution?

Thanks for considering the feature request!

Comment 9 Ferdinand Gassauer 2004-11-28 16:44:25 UTC
CVS from today

if encoding is set to "auto" this mail is not rendered correctly, I have to set the encoding manually

Content-Type: text/plain; charset="iso-8859-1"

cu
ferdinand

BTW it seems to me that the encoding system has changed, because konqueror also renders some characters incorrectly. This might also be because I switched from SuSE 9.1 to 9.2 and I run the CVS build in English, whereas some files and directories were created in German.
Comment 10 Ferdinand Gassauer 2004-11-28 17:48:50 UTC
Even KMail's own sent messages are not displayed correctly if encoding is set to "auto"
Comment 11 Ferdinand Gassauer 2004-11-30 22:31:36 UTC
same problem for multipart messages

X-Security: message sanitized on resp
	See http://www.impsec.org/email-tools/sanitizer-intro.html
	for details. $Revision: 1.147 $Date: 2004-10-02 11:16:26-07 
Content-Type: multipart/alternative;
  boundary="----=_NextPart_000_28A5_01C4D6FE.156D0500"
X-Mailer: Microsoft CDO for Windows 2000
Thread-Index: AcTW9bOm7lsxUz11QDKqZCa643rRBQ==
Content-Class: urn:content-classes:message
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441
X-OriginalArrivalTime: 30 Nov 2004 16:00:24.0694 (UTC) FILETIME=[B3E90160:01C4D6F5]
X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on 
	resp.w.instantina.at
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=5.0 tests=AWL,BAYES_00,
	FORGED_RCVD_HELO,HTML_80_90,HTML_BADTAG_00_10,HTML_MESSAGE,
	MIME_QP_LONG_LINE,NO_REAL_NAME,URI_REDIRECTOR autolearn=no 
	version=3.0.0
X-UID: 19126
X-Length: 47466

This is a multi-part message in MIME format.

------=_NextPart_000_28A5_01C4D6FE.156D0500
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Comment 12 Till Adam 2005-01-08 15:59:10 UTC
CVS commit by tilladam: 

Make it possible to specify a fallback encoding in the config dialog under
Appearance -> Message Window which is used if there is no override codec 
active (Encoding set to Auto) and the mail itself does not specify one
either. Since it is impossible to find a default for this which works
everywhere, the setting defaults to the locale, unless the locale is eucjp, in
which case the jis7 encoding is used. So if you are a SuSE 9.2 user, for
example this will default to utf8 but you can then set it to the encoding
the mails you get are usually in. iso-8859-15, for example, for Germany.
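
The default described above might look roughly like this. It is an illustrative Python sketch, not the committed C++ code; the function name and the normalisation of the locale name are assumptions, and the encoding names are returned as plain strings.

```python
def default_fallback_encoding(locale_encoding: str) -> str:
    """Return the default value of the fallback encoding setting."""
    # Compare case-insensitively and ignore "-" and "_" so that
    # "eucJP", "EUC-JP" and "euc_jp" are all treated alike (an assumption).
    normalized = locale_encoding.lower().replace("-", "").replace("_", "")
    if normalized == "eucjp":
        # Special case named in the commit message.
        return "jis7"
    # Otherwise the setting simply defaults to the locale encoding.
    return locale_encoding
```

The user can then replace this default with whatever encoding their incoming mail usually uses.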

Sven, thanks for reporting and insisting on this solution. It's probably
the right thing to do (TM).

BUG: 45279
BUG: 84702
BUG: 79639


  M +75 -1     configuredialog.cpp   1.506
  M +19 -0     configuredialog_p.h   1.104
  M +7 -0      kmail.kcfg   1.31
  M +5 -0      kmmessage.cpp   1.506



Comment 13 Ingo Klöcker 2005-05-16 15:11:59 UTC
*** Bug 79254 has been marked as a duplicate of this bug. ***