Bug 121053

Summary: Outgoing messages sent as UTF-8, ignoring manually selected encoding
Product: [Unmaintained] kopete
Reporter: Bartosz Fabianowski <freebsd>
Component: ICQ and AIM Plugins
Assignee: Kopete Developers <kopete-bugs-null>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: FreeBSD Ports   
OS: FreeBSD   
Attachments: Patch which forces ICQ plugin to send channel 1 messages only.

Description Bartosz Fabianowski 2006-01-30 18:51:50 UTC
Version:           0.11.51 (using KDE KDE 3.5.0)
Installed from:    FreeBSD Ports
Compiler:          gcc version 3.4.4 [FreeBSD] 20050518 
OS:                FreeBSD

The encoding of outgoing ICQ messages recently broke in the dev-0.12 branch. SVN commit 497530, which was supposed to fix encoding related problems in OSCAR, seems to be at fault.

Since that commit, I have been forced to manually select an encoding for each ICQ contact. Otherwise, German umlauts get mangled and don't display correctly. However, the manually selected encoding seems to be applied to *incoming* messages only. *Outgoing* messages are always sent as UTF-8, regardless of which encoding has been manually chosen. Worse yet, the messages seem not to be marked as UTF-8, so the client on the other side cannot decode them.

A particular example of this behavior is a contact of mine using Trillian Pro 3.1. His client advertises UTF-8 capability, but everything I receive from it is encoded in ISO 8859-1. I have therefore manually set the encoding for this contact to 8859-1. Outgoing messages are definitely still sent as UTF-8, and after seeing what they look like on his screen, I believe that his client is interpreting them as ISO 8859-1; this is probably due to the lack of some flag indicating that this is a UTF-8 message.

So, there are two problems here:
1. Outgoing messages are sent as UTF-8 regardless of manually selected encoding
2. They apparently are not marked as UTF-8 messages, which leads to misinterpretation at the receiving side
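
The symptom described above (UTF-8 bytes read as ISO 8859-1 by the receiver) can be reproduced in a few lines. This is a minimal sketch in plain C++ and is not Kopete code; the function names `latin1ToUtf8`, `displayedByLatin1Receiver` and `demoUmlaut` are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Encode ISO 8859-1 bytes as UTF-8. Every Latin-1 byte value equals its
// Unicode code point, so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// What a receiver shows if it treats every incoming byte as one ISO 8859-1
// character. The result is returned as UTF-8 so it can be printed.
std::string displayedByLatin1Receiver(const std::string& wireBytes)
{
    return latin1ToUtf8(wireBytes);
}

// Usage sketch (hypothetical helper): follow "ü" (0xFC in ISO 8859-1).
void demoUmlaut()
{
    const std::string sent = latin1ToUtf8("\xFC");   // leaves as bytes C3 BC
    assert(sent == "\xC3\xBC");
    const std::string shown = displayedByLatin1Receiver(sent);
    assert(shown == "\xC3\x83\xC2\xBC");             // rendered as "Ã¼"
}
```

One umlaut turns into two unrelated characters on the receiving side, which matches the kind of corruption reported here when the UTF-8 marker is missing or ignored.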

If I should split those two issues into separate bugs, please let me know.

Before this bug was filed, the issue had been discussed briefly in bug 79574. Bug 92740 is similar, but it covers the reverse case, where a manually selected encoding is applied to *outgoing* messages *only*. The issue at hand seems to be genuinely new.
Comment 1 Bartosz Fabianowski 2006-02-01 02:21:25 UTC
May I ask why this was assigned to the component "History Plugin"? This happens while sending a message, not when storing it in the history.
Comment 2 Matt Rogers 2006-02-01 02:26:01 UTC
because i have poor mouse skills? :)
Comment 3 Oleg Girko 2006-02-01 03:21:08 UTC
Created attachment 14472
Patch which forces ICQ plugin to send channel 1 messages only.

Trillian advertises both the CAP_UTF8 and CAP_ICQSERVERRELAY capabilities, so Kopete sends channel 2 UTF-8 messages. Kopete marks these correctly as Unicode by placing a GUID after the colour codes, but Trillian does not recognise this GUID in channel 2 messages and interprets the message as non-Unicode. Other clients (ICQ5, Miranda) recognise channel 2 UTF-8 messages correctly, so the problem occurs with Trillian only.

This patch provides a workaround for the Trillian bug. It makes Kopete never send channel 2 messages and use channel 1 for both Unicode and non-Unicode messages.

The disadvantage of sending only channel 1 messages is the way Unicode is encoded. Channel 1 messages represent Unicode as UCS2, which is 16-bit, so Unicode characters with codes greater than 65535 cannot be sent. People from Eastern countries would be unhappy with this. Channel 2 messages use UTF-8, so they have no such disadvantage.
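
The trade-off between the two encodings can be illustrated with a small sketch. It is plain C++, independent of Kopete and Qt; the names `fitsInUcs2`, `encodeUtf8` and `demoSupplementaryPlane` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// True if the code point can be sent as a single 16-bit UCS-2 unit.
// The surrogate range D800..DFFF is excluded because it holds no characters.
bool fitsInUcs2(std::uint32_t cp)
{
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);  // outside the Unicode range otherwise
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage sketch: U+20000 is a CJK ideograph outside the 16-bit range.
void demoSupplementaryPlane()
{
    assert(fitsInUcs2(0x00FC));                        // "ü" fits
    assert(!fitsInUcs2(0x20000));                      // needs more than 16 bits
    assert(encodeUtf8(0x20000) == "\xF0\xA0\x80\x80"); // UTF-8 handles it
}
```

Code points up to 65535 (surrogates aside) fit in UCS2, while UTF-8 covers the whole Unicode range at the cost of variable-length sequences.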
Comment 4 Bartosz Fabianowski 2006-02-01 03:28:27 UTC
Since this is a Trillian bug, would it be possible to force channel 1 only when the client on the other side is recognized as Trillian? Kopete seems to be able to tell quite a few different clients apart; maybe it can recognize Trillian as well?
Comment 5 Chani 2006-02-01 03:33:31 UTC
and is someone going to tell the trillian developers to fix *their* code? :)
Comment 6 Thiago Macieira 2006-02-02 20:27:45 UTC
Codepoints in Unicode > U+FFFF are not used for normal, living languages. Unless you want to communicate in an old script or a dead language, you won't see this happen.

Besides, QString cannot represent them anyways, so it's pointless to argue this point.
Comment 7 Matt Rogers 2006-02-26 17:50:12 UTC
SVN commit 513828 by mattr:

workaround the trillian bug where they fail to correctly recognize type 2 
messages with UTF-8.

BUG: 121053



 M  +4 -23     icqcontact.cpp  


--- branches/kopete/0.12/kopete/protocols/oscar/icq/icqcontact.cpp #513827:513828
@@ -402,32 +402,13 @@
 
 	QTextCodec* codec = contactCodec();
 
-	int messageChannel;
+	int messageChannel = 0x01;
 	Oscar::Message::Encoding messageEncoding;
 
-	if ( !isOnline() )
-	{
-		messageChannel = 0x01;
-		messageEncoding = Oscar::Message::UserDefined;
-	}
-	else if ( !m_details.hasCap( CAP_UTF8 ) )
-	{
-		if ( m_details.hasCap( CAP_ICQSERVERRELAY ) )
-			messageChannel = 0x02;
-		else
-			messageChannel = 0x01;
-		messageEncoding = Oscar::Message::UserDefined;
-	}
-	else if ( m_details.hasCap( CAP_ICQSERVERRELAY ) )
-	{
-		messageChannel = 0x02;
-		messageEncoding = Oscar::Message::UTF8;
-	}
-	else
-	{
-		messageChannel = 0x01;
+	if ( isOnline() && m_details.hasCap( CAP_UTF8 ) )
 		messageEncoding = Oscar::Message::UCS2;
-	}
+	else
+		messageEncoding = Oscar::Message::UserDefined;
 
 	QString msgText( msg.plainBody() );
 	// TODO: More intelligent handling of message length.
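
To make the effect of the diff above easier to compare, the old and new selection logic can be written as two standalone functions. This is only a sketch: `Encoding`, `Caps`, `Choice`, `chooseBefore`, `chooseAfter` and `demoTrillianLikeContact` are hypothetical stand-ins for the Kopete types and code in icqcontact.cpp, and the two booleans replace the `m_details.hasCap( ... )` calls.

```cpp
#include <cassert>

// Hypothetical stand-in for Oscar::Message::Encoding.
enum class Encoding { UserDefined, UTF8, UCS2 };

// Hypothetical stand-in for the capability checks in the diff.
struct Caps {
    bool utf8;         // m_details.hasCap( CAP_UTF8 )
    bool serverRelay;  // m_details.hasCap( CAP_ICQSERVERRELAY )
};

struct Choice {
    int channel;
    Encoding encoding;
};

// Selection as in the removed lines (before commit 513828).
Choice chooseBefore(bool online, Caps caps)
{
    if (!online)
        return {0x01, Encoding::UserDefined};
    if (!caps.utf8)
        return {caps.serverRelay ? 0x02 : 0x01, Encoding::UserDefined};
    if (caps.serverRelay)
        return {0x02, Encoding::UTF8};
    return {0x01, Encoding::UCS2};
}

// Selection as in the added lines: the channel is always 0x01.
Choice chooseAfter(bool online, Caps caps)
{
    if (online && caps.utf8)
        return {0x01, Encoding::UCS2};
    return {0x01, Encoding::UserDefined};
}

// Usage sketch: an online contact advertising both capabilities,
// which is what Trillian is reported to do in this bug.
void demoTrillianLikeContact()
{
    const Caps both{true, true};
    assert(chooseBefore(true, both).channel == 0x02);
    assert(chooseBefore(true, both).encoding == Encoding::UTF8);
    assert(chooseAfter(true, both).channel == 0x01);
    assert(chooseAfter(true, both).encoding == Encoding::UCS2);
}
```

For such a contact the old logic picked channel 2 with UTF-8, while the new logic picks channel 1 with UCS2. Contacts that are offline or lack CAP_UTF8 keep the user-defined encoding in both versions, but now always on channel 1.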
Comment 8 mi+kde 2007-01-08 16:15:23 UTC
Hello!

With Kopete-0.11.1 I had problems communicating in Ukrainian with a Trillian-using friend (using AIM). He could see my Cyrillic characters, but I could not see his (they were shown to me as garbage -- NOT as question marks). He even upgraded to the latest Trillian-3.1 but nothing changed.

Then I upgraded to Kopete-0.12.3 and the problem we had is still here -- his Cyrillics still show up as garbage on my side.

In ADDITION, the Cyrillics I type, which were displayed correctly on his side when I was using 0.11.1, are now shown as question marks over there (suggesting improper UTF-8 decoding somewhere)...
Comment 9 mi+kde 2007-01-09 20:55:37 UTC
As a matter of fact, AIM communication with an earlier version of Kopete itself is broken too...

My other Cyrillic-using friend still has kdenetwork-3.3.0 installed. Talking to her in Ukrainian was not a problem before my upgrade to Kopete-0.12.3. Now I still see her Cyrillics fine, but she sees only question marks.
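
One common way for text to arrive as question marks, as opposed to garbage, is a lossy conversion into a single-byte charset that cannot hold the characters. Whether that is what happens in this case is not established in this report; the sketch below only illustrates the mechanism, and `toLatin1Lossy` and `demoCyrillic` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Convert UTF-32 code points to ISO 8859-1, substituting '?' for every
// character the target charset cannot represent.
std::string toLatin1Lossy(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
    return out;
}

// Usage sketch: Cyrillic has no ISO 8859-1 representation, so every
// letter of the Ukrainian word below collapses into '?'.
void demoCyrillic()
{
    const std::u32string greeting = U"\u041F\u0440\u0438\u0432\u0456\u0442";
    assert(toLatin1Lossy(greeting) == "??????");
    assert(toLatin1Lossy(U"Gr\u00FC") == "Gr\xFC");  // Latin-1 text survives
}
```

Garbage characters, by contrast, usually mean the bytes arrived intact but were decoded with the wrong charset, so the two symptoms in comments 8 and 9 may have different causes.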