Bug 306401

Summary: Problem receiving ICQ messages containing non-ASCII letters
Product: [Frameworks and Libraries] telepathy Reporter: Alex Richardson <arichardson.kde>
Component: text-ui Assignee: Telepathy Bugs <kde-telepathy-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: asturm, mklapetek, roland.leissa
Priority: NOR    
Version: git-latest   
Target Milestone: Future   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In: 0.7.0

Description Alex Richardson 2012-09-07 14:48:33 UTC
Whenever I receive an ICQ message containing a German umlaut or ß, these letters are replaced with question marks and "(There was an error receiving this message.  Either you and xxxxxxxxx have different encodings selected, or xxxxxxxxx has a buggy client.)" is appended to the message.

The problem is probably that the message is sent in the default Windows code page, but there is no way to change the encoding my messages are sent in or the encoding assumed for received messages.
The KCM only offers UTF-8 as a choice of encoding.

Ideally this should be selectable per contact.

Reproducible: Always

Steps to Reproduce:
1. Receive message from Windows ICQ user containing umlauts
Actual Results:  
Umlauts are replaced with ?

Expected Results:  
Receive the correct letters
Comment 1 Alex Richardson 2012-10-16 18:05:01 UTC
I looked into the source of libpurple and found this:

else if (charset == AIM_CHARSET_LATIN_1) {
	if ((sourcebn != NULL) && oscar_util_valid_name_icq(sourcebn))
		charsetstr1 = purple_account_get_string(account, "encoding", OSCAR_DEFAULT_CUSTOM_ENCODING);
	else
		charsetstr1 = "ISO-8859-1";
	charsetstr2 = "UTF-8";
}

I.e. the message is a Latin-1 message (set in the protocol header). But it never tries ISO-8859-1 since it uses our custom encoding instead (which is UTF-8).
Seems like a bug in libpurple to me, it uses UTF-8 as a fallback encoding for a Latin-1 message and otherwise just the user encoding.

I am not sure what sourcebn is, I didn't dig that deep into their code, but at least on my system the first branch (with user encoding) is always taken.
I think it should always try ISO-8859-1, at least as a fallback instead of UTF-8, since this is the expected encoding (and is what the official ICQ client sends).
 
This can be worked around by setting "ISO-8859-1" as the encoding in the accounts KCM, but I will also report a bug to libpurple.
Comment 2 Martin Klapetek 2012-10-17 06:31:46 UTC
Thanks for digging into that! Good job :)

Can the official client have a different encoding set? Like UTF-8? Because as much as I'd like to change our default encoding to Latin1, I'm afraid this might break things for other users with different clients who use characters not present in Latin1 (which covers Western Europe, so it lacks, for example, Central European characters, not to mention other Unicode characters). UTF-8 should always be the safest bet.

Also it's almost unbelievable that in 2012 the official ICQ client (is there still such a thing?) is still not using UTF-8, but rather a limited Latin1 by default (that's why I personally dislike that protocol very much).

Can you link the libpurple bug report so we can track it? Thanks!
Comment 3 Alex Richardson 2012-10-17 07:41:52 UTC
The ICQ messages have a field in the header which specifies the encoding.
The official client uses Latin1 if it can (probably to use fewer bytes), but once you send characters which are not part of Latin1 it chooses UTF-16BE and sets the flag in the header (verified with Wireshark).
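
To make the observed sender behaviour concrete, here is a rough C sketch of the choice described above: Latin-1 while every character fits in one byte, UTF-16BE otherwise. This is only an illustration of what was seen on the wire, not code from the official client or from libpurple; the names `wire_charset` and `pick_wire_charset` are hypothetical, and the input is assumed to be already decoded into Unicode code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two encodings observed on the wire. */
enum wire_charset { WIRE_LATIN_1, WIRE_UTF16_BE };

/*
 * Pick the encoding a sender would flag in the message header:
 * Latin-1 (ISO-8859-1) covers exactly the code points 0x00..0xFF,
 * so a single character above that range forces UTF-16BE.
 */
static enum wire_charset pick_wire_charset(const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF)
            return WIRE_UTF16_BE;
    }
    return WIRE_LATIN_1;
}
```

With this rule a message such as "Grüße" stays Latin-1, while a single Cyrillic letter switches the whole message to UTF-16BE.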

You are right, it may be different in e.g. Russia, will have to see whether I can check this.

Trying UTF-8 first should mostly not be a problem, since at least for the ASCII characters it is the same as ISO-8859-1, and once the high bit is set, decoding as UTF-8 will probably fail. Then it could fall back to ISO-8859-1. However, as you can see in the source snippet above, the last two lines would have to be swapped for that to work. Therefore I think we should add ISO-8859-1 to the encoding combobox in the accounts KCM and make it the default to work around this issue in libpurple. If someone wants or needs to, they can still set it back to UTF-8.
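
The "try UTF-8 first, then fall back to ISO-8859-1" idea can be sketched in plain C as follows. This is a minimal, self-contained illustration and not libpurple code: the function names (`is_valid_utf8`, `latin1_to_utf8`, `decode_icq_text`) are hypothetical, and the sketch assumes the caller wants UTF-8 output in a caller-supplied buffer.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        unsigned long cp;
        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;                                /* invalid lead byte */
        if (len - i <= extra) return 0;               /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and out-of-range values. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Convert ISO-8859-1 to NUL-terminated UTF-8.
 * Returns bytes written (without the NUL), or (size_t)-1 if out is too small. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0) return (size_t)-1;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            if (o + 1 >= cap) return (size_t)-1;
            out[o++] = (char)in[i];
        } else {
            /* Every Latin-1 byte >= 0x80 becomes a two-byte UTF-8 sequence. */
            if (o + 2 >= cap) return (size_t)-1;
            out[o++] = (char)(0xC0 | (in[i] >> 6));
            out[o++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}

/* Try UTF-8 first; if the bytes are not valid UTF-8, treat them as ISO-8859-1. */
static size_t decode_icq_text(const unsigned char *in, size_t len,
                              char *out, size_t cap)
{
    if (is_valid_utf8(in, len)) {
        if (len >= cap) return (size_t)-1;
        memcpy(out, in, len);
        out[len] = '\0';
        return len;
    }
    return latin1_to_utf8(in, len, out, cap);
}
```

The ordering matters: almost any byte string is "valid" ISO-8859-1, so trying it first would never fail and UTF-8 would never be attempted. The strict UTF-8 check goes first for that reason.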
Comment 4 Andreas Sturmlechner 2012-11-25 18:52:26 UTC
Yeah, same here, while Pidgin is fine.

It would be nice to at least get that annoying error message out of the message - maybe hidden in some kind of warning/error icon or status bar.
Comment 5 Martin Klapetek 2013-04-16 14:33:56 UTC
*** Bug 318448 has been marked as a duplicate of this bug. ***
Comment 6 Daniel Vrátil 2013-04-17 10:48:30 UTC
Git commit c18f4523fb7fdbdabc6229132362e10ac22f2140 by Dan Vrátil.
Committed on 17/04/2013 at 12:47.
Pushed by dvratil into branch 'master'.

Add configuration for additional charsets in ICQ
FIXED-IN: 0.7.0
REVIEW: 110060

M  +82   -4    plugins/haze/icq-server-settings-widget.cpp
M  +3    -0    plugins/haze/icq-server-settings-widget.h
M  +0    -5    plugins/haze/icq-server-settings-widget.ui

http://commits.kde.org/telepathy-accounts-kcm/c18f4523fb7fdbdabc6229132362e10ac22f2140