Bug 73901

Summary: long nicknames are cut off
Product: [Unmaintained] kopete
Reporter: Favonia <h3226699>
Component: MSN Plugin
Assignee: Kopete Developers <kopete-bugs-null>
Status: RESOLVED FIXED
Severity: normal
CC: acelan, assemhassan, mrnan216, ogoffart, sromero
Priority: NOR    
Version: 0.7.3   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   

Description Favonia 2004-01-31 17:52:42 UTC
Version:           0.7.3 (using KDE 3.1.5)
Installed from:    Debian testing/unstable Packages
OS:          Linux

Loaded plugins:
MSN Messenger
Connection Status
Contact Notes
History

I have two MSN Passport accounts. I logged in to one of them with the official MSN client (on WinXP /_\) and to the other with Kopete, with each account on the other's contact list. Then I changed the nickname of the account in the official client to a long one. It looks fine in MSN, but back in Kopete the long nickname has been cut off.

Here's a sample:
這是一個超長的暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長的暱稱
(LANG: zh_TW, traditional Chinese :P)

And I caught something on standard error/output:

-----START----- (it's not the full output..)

Entity: line 6: parser error : Input is not proper UTF-8, indicate encoding !
暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長
                                                                               ^
Entity: line 6: error: Bytes: 0xE7 0x9A 0x25 0x38
暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長
                                                                               ^
Entity: line 3: parser error : Input is not proper UTF-8, indicate encoding !
暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長
                                                                               ^
Entity: line 3: error: Bytes: 0xE7 0x9A 0x25 0x38
暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長
                                                                               ^
kdecore (KAction): WARNING: KActionCollection::KActionCollection( QObject *parent, const char *name, KInstance *instance )

------END------

IMHO, it seems to be a buffer overflow problem...
Sorry that my English is poor, but I really wanted to report this bug.
Comment 1 Martijn Klingens 2004-01-31 18:02:46 UTC
Subject: Re: [Kopete-devel]  New: long nicknames are cut off

On Saturday 31 January 2004 17:52, Peter Lan wrote:
> Entity: line 6: parser error : Input is not proper UTF-8, indicate encoding
> ! 暱稱這是一個超長的暱稱這是一個超長的暱稱這是一個超長

and

> IMHO, it seems to be a buffer overflow problem...

No, it's an encoding problem.

What I don't understand is that it's in traditional Chinese encoding. I'm
pretty sure the MSN protocol has used UTF-8 (Unicode) for quite some time, so
why would your name not be in UTF-8?

Does the name look ok in the contact list?
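
For reference, the four bytes in the parser error can be checked by hand: 0xE7 announces a three-byte UTF-8 sequence, 0x9A is a valid continuation byte, but 0x25 ('%') is not. The sketch below is a simplified structural check, not the parser Kopete uses. The function names are made up for illustration, and it does not reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first byte that breaks UTF-8 structure (or the
// index where a continuation byte is missing), or -1 if the buffer is fine.
// Simplified: only lead/continuation bit patterns are checked.
long firstInvalidUtf8(const std::vector<unsigned char>& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = bytes[i];
        std::size_t extra;
        if (b < 0x80)                extra = 0;  // 0xxxxxxx: ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
        else return static_cast<long>(i);        // stray continuation or bad lead
        for (std::size_t k = 1; k <= extra; ++k) {
            // Every continuation byte must look like 10xxxxxx.
            if (i + k >= bytes.size() || (bytes[i + k] & 0xC0) != 0x80)
                return static_cast<long>(i + k);
        }
        i += extra + 1;
    }
    return -1;
}

// Usage: the four bytes reported in the parser error.
void checkLoggedBytes()
{
    const std::vector<unsigned char> logged = {0xE7, 0x9A, 0x25, 0x38};
    assert(firstInvalidUtf8(logged) == 2);  // 0x25 ('%') is not a continuation byte
}
```

So the input is UTF-8 that was damaged in the middle of a character, which fits an encoding problem rather than a different charset.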

Comment 2 Favonia 2004-01-31 19:01:21 UTC
Well, it's cut off there too (in the contact list and in the title of the chat window).

Maybe you could take a look at this screenshot:
http://www.csie.ntu.edu.tw/~k92201008/temp/screenshot/longnickname1.png
You can see that the end of the nickname looks strange.

The MSN protocol does indeed use UTF-8; I only meant to explain which language the nickname is in. (sorry)

I made another sample:
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWX

and got something similar:

----START----
Entity: line 6: parser error : Input is not proper UTF-8, indicate encoding !
QRSTUVWXYZABCDEFGHIJKLMNOP
...............................................................................^
Entity: line 6: error: Bytes: 0xEF 0xBC 0x25 0x42
QRSTUVWXYZABCDEFGHIJKLMNOP
...............................................................................^
---- END ----

'.' stands for a space (\x20).
I found that the space characters were missing last time, so the position of '^' was not correct.

Here's a screenshot for it:
http://www.csie.ntu.edu.tw/~k92201008/temp/screenshot/longnickname2.png

Thanks for your help :)
Comment 3 Martijn Klingens 2004-01-31 19:13:13 UTC
Subject: Re: [Kopete-devel]  long nicknames are cut off

On Saturday 31 January 2004 19:01, Favonia wrote:
> I make another sample
> ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM

What charset is this? KMail displays it perfectly fine as western letters, but
in fact they are *NOT* the same ASCII codes, even though I see 'A B C D
E ...'. Are there two different UTF-8 sequences to represent the western
alphabet???

When I do 'view source' your text reads as

----START----
Entity: line 6: parser error : Input is not proper UTF-8, indicate encoding !
QRSTUVWXYZABCDEFGHIJKLMNOP
...............................................................................^
Entity: line 6: error: Bytes: 0xEF 0xBC 0x25 0x42
QRSTUVWXYZABCDEFGHIJKLMNOP
...............................................................................^
---- END ----

Which makes it extra strange that it works on the debug output, since the 
Konsole is usually not utf-8.

Weird stuff :)

Comment 4 Favonia 2004-02-01 05:02:32 UTC
They're called "fullwidth Latin letters" in Unicode, and they are somewhat wider than the ordinary ones (for decoration?). I redirected the output, something like "kopete > output 2>&1", and forced the encoding to UTF-8, so there's no problem there. :)
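
To make the difference concrete: the fullwidth forms U+FF01..U+FF5E mirror printable ASCII 0x21..0x7E at a fixed offset of 0xFEE0, and each one takes three bytes in UTF-8 instead of one. A minimal sketch, with helper names made up for illustration and no validation of surrogates or out-of-range values:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map a printable ASCII character (0x21..0x7E) to its fullwidth code point.
std::uint32_t toFullwidth(char ascii)
{
    assert(ascii >= 0x21 && ascii <= 0x7E);
    return static_cast<std::uint32_t>(ascii) + 0xFEE0;
}

// Encode one Unicode code point as UTF-8.
std::string toUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Here toUtf8(toFullwidth('A')) gives the bytes EF BC A1, while plain 'A' stays the single byte 41. The leading EF BC matches the first two bytes in the error output above.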

I found another symptom of this bug; the screenshot is here:
http://www.csie.ntu.edu.tw/~k92201008/temp/screenshot/longnickname3.png

The chatview window doesn't show anything!
But if I make the nickname much shorter, it works fine.

http://www.csie.ntu.edu.tw/~k92201008/temp/screenshot/longnickname4.png
(new nickname: ABCDEFGHIJKLMNOPQRSTUVWXYZ)

It seems that the UTF-8 encoding is right as rain.

Thanks for helping. :)
Comment 5 Martijn Klingens 2004-02-01 11:56:56 UTC
Subject: Re: [Kopete-devel]  long nicknames are cut off

On Sunday 01 February 2004 05:02, Favonia wrote:
> I found another symptom about this bug, the screenshot is right here:
> http://www.csie.ntu.edu.tw/~k92201008/temp/screenshot/longnickname3.png
>
> The chatview window doesn't show anything!
> But if I make the nickname much shorter, it works fine.

Hmm, it's almost like it somehow chokes on long names.

Very weird. Does this only happen with MSN? Or is it a problem in Kopete's 
core, used by all protocols?

Comment 6 Favonia 2004-02-03 03:52:54 UTC
I've tried ICQ Pro and Yahoo! Messenger.

ICQ has constraints that keep me from reaching the limit, and it has encoding problems if I use non-ASCII characters.

Yahoo! Messenger's default nickname appears to (always) be the ID, and users can only change their friends' names locally in the contact list.

So there's no answer yet. :(
Sorry, but I don't know how to test the other messengers...

(well, how about a `fake_messenger` plugin for testing :P)
Comment 7 jstuart 2004-02-13 11:11:14 UTC
*** Bug 74920 has been marked as a duplicate of this bug. ***
Comment 8 jstuart 2004-02-13 11:12:03 UTC
*** Bug 75050 has been marked as a duplicate of this bug. ***
Comment 9 Olivier Goffart 2004-02-24 22:29:25 UTC
I think I know what's happening.

To transmit the message to the server, we need to URL-encode it. We use a KDE method that encodes all non-ASCII characters and special characters (like space or %).
But the official MSN client does not escape as many characters as Kopete does.

Anyway, if we send too long a command to the server, we are disconnected.

So if someone has "éééééééééééééééééééé" as a nickname, Kopete will reply to the server with %C3%A9%C3%A9%C3%A9..............  which is much longer. To avoid being disconnected, the command is truncated.

So that explains everything.


The solution: write our own URL encoder that only encodes the characters that need to be encoded.
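
As an illustration of how truncation could produce exactly the bytes from the error output (0xE7 0x9A 0x25 0x38), here is a small standalone sketch. It is not Kopete code: the function names are hypothetical, the cut position is chosen by hand, and the lenient decoder is an assumption about how an incomplete escape ends up being read as literal text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Percent-encode every byte that is not an ASCII letter or digit.
std::string encodeAll(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : utf8) {
        const bool plain = (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z') ||
                           (b >= 'a' && b <= 'z');
        if (plain) {
            out += static_cast<char>(b);
        } else {
            out += '%';
            out += hex[b >> 4];
            out += hex[b & 0x0F];
        }
    }
    return out;
}

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Lenient decoder: an incomplete or malformed escape is copied through as-is.
std::string decodeLenient(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size() &&
            hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

void truncationDemo()
{
    const std::string de = "\xE7\x9A\x84";      // U+7684, a character from the sample nickname
    const std::string wire = encodeAll(de);     // "%E7%9A%84": 3 bytes become 9
    const std::string cut = wire.substr(0, wire.size() - 1);  // cut inside the last escape
    assert(decodeLenient(cut) == std::string("\xE7\x9A%8"));  // bytes E7 9A 25 38
}
```

Three bytes of nickname become nine on the wire, and a cut that lands inside the last escape decodes to a broken UTF-8 sequence followed by a literal "%8".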

Comment 10 AceLan Kao 2004-03-03 03:31:38 UTC
It's a big problem for me, so I modified the code to cut long nicknames, and it works for me.
I hope a good solution will come along; until then, I'll use this method to keep Kopete working.

#code from cvs KDE_3_2_BRANCH 2004/03/02
#497 protocols/msn/msnsocket.cpp
QString MSNSocket::escape( const QString &str )
{
        return ( KURL::encode_string( str.left( 50), 106 ) );
}
Comment 11 Olivier Goffart 2004-03-03 18:18:58 UTC
*** Bug 76671 has been marked as a duplicate of this bug. ***
Comment 12 AceLan Kao 2004-05-19 16:48:16 UTC
Hello, I made a patch for this bug and put it at http://kde.linux.org.tw/~acelan/kopete/msnsocket.cpp.patch
Olivier Goffart said that "The solution:  code our own URL encoder that only encode char that need to be encoded". So I copied the encode function from Qt and cut it down to fit our needs. I've tested the patch for a few days and it works fine for me. The patch is made against the CVS source I just updated. It is a little ugly: I wasn't sure where to put the two new functions, so I declared them static. I need somebody to help me move them to the correct place. I really hope the Kopete team can accept this patch, because this bug is a very big problem for Chinese users.
Comment 13 AceLan Kao 2004-06-06 09:53:15 UTC
Hello, I made a patch for this bug, and I have now revised it to make it clearer. Would you please help me commit this patch? This bug is a big issue for Chinese users, and I think the patch should work well in other language environments too. Thank you very much.

http://kde.linux.org.tw/~acelan/kopete/msnsocket.cpp.patch
Comment 14 Olivier Goffart 2004-06-06 10:47:19 UTC
Thanks for the patch, I'll test it and commit it if it works.

Anyway, you still escape many characters. I think escaping only the space and % is enough; I'll try that.
Comment 15 AceLan Kao 2004-06-07 10:57:11 UTC
Hi, I have tested the escaped characters one by one; let me explain them.
'<' '>' : If you don't escape them, you can't log in to the MSN server.
' ' : If you don't escape the space, you'll be disconnected when someone's nick contains a space.
'\\' : The encodings of some Chinese characters end with '\'; if you don't escape it, some Chinese nicks turn into strange symbols.
'^' '&' '*' : These three characters need to be escaped; if a nickname contains them unescaped, they disappear.

That's why I escape these characters. You can test them to check whether my results are correct.
Thank you very much. ^^
Comment 16 Olivier Goffart 2004-06-07 11:27:23 UTC
I'm currently using your patch, but I only escape ' ' and %.

I'm logged in right now and all seems to be fine.
The official client only escapes % and space.

Oh, I think \t should be escaped too.

Can you confirm that you are forced to escape all these symbols?
Comment 17 Olivier Goffart 2004-06-07 11:47:02 UTC
In fact, if you can't connect, it's certainly because your password contains one of these symbols.
I also use MSNSocket::escape() to escape the password, which is sent in a URL,
so there we will have to use KURL::encode_string.


And I will encode all chars with a value <= 32.
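
To show why the two cases want different escaping, here is a rough standalone sketch over UTF-8 bytes. Both function names are hypothetical and neither is the Kopete or KURL implementation. One escapes only bytes <= 32 and '%' (enough for a protocol command), the other escapes everything except unreserved ASCII (needed when the value is pasted into a URL, where '&' would otherwise end the parameter).

```cpp
#include <cassert>
#include <string>

// Append one byte as a %XX escape.
static void appendEscaped(std::string& out, unsigned char b)
{
    static const char hex[] = "0123456789ABCDEF";
    out += '%';
    out += hex[b >> 4];
    out += hex[b & 0x0F];
}

// Hypothetical helper for protocol commands: escape only control characters,
// space (all bytes <= 32) and '%'. UTF-8 bytes >= 0x80 pass through unchanged.
std::string escapeForCommand(const std::string& utf8)
{
    std::string out;
    for (unsigned char b : utf8) {
        if (b <= 32 || b == '%') appendEscaped(out, b);
        else out += static_cast<char>(b);
    }
    return out;
}

// Hypothetical helper for URL query values: escape everything except
// ASCII letters, digits and "-_.~".
std::string escapeForUrl(const std::string& utf8)
{
    std::string out;
    for (unsigned char b : utf8) {
        const bool unreserved =
            (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
            (b >= '0' && b <= '9') || b == '-' || b == '_' || b == '.' || b == '~';
        if (unreserved) out += static_cast<char>(b);
        else appendEscaped(out, b);
    }
    return out;
}

void escapeDemo()
{
    const std::string password = "p&ss w%rd";  // made-up example value
    assert(escapeForCommand(password) == "p&ss%20w%25rd");  // '&' passes through
    assert(escapeForUrl(password) == "p%26ss%20w%25rd");    // '&' is escaped
}
```

With a nickname of twenty "é" (40 bytes in UTF-8), the command-style escaping keeps the length at 40, while the URL-style escaping grows it to 120.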
Comment 18 Olivier Goffart 2004-06-09 16:49:10 UTC
CVS commit by ogoffart: 

Better escaping of nickname

CCMAIL: 73901-done@bugs.kde.org

thanks to Chia-Lin Kao 


  M +2 -2      msnnotifysocket.cpp   1.139
  M +40 -6     msnsocket.cpp   1.87


--- kdenetwork/kopete/protocols/msn/msnnotifysocket.cpp  #1.138:1.139
@@ -591,5 +591,5 @@ void MSNNotifySocket::slotAuthJobDone ( 
                 QString authURL = "https://" + m_sid + "/ppsecure/post.srf?lc=" + rx.cap( 1 ) + "&id=" +
                         rx.cap( 2 ) + "&tw=" + rx.cap( 3 ) + "&cbid=" + rx.cap( 2 ) + "&da=passport.com&login=" +
-                        m_account->accountId() + "&domain=passport.com&passwd=";
+                        KURL::encode_string( m_account->accountId()) + "&domain=passport.com&passwd=";
 
                 kdDebug( 14140 ) << "MSNNotifySocket::slotAuthJobDone: " << authURL << "(*******)" << endl;
@@ -599,5 +599,5 @@ void MSNNotifySocket::slotAuthJobDone ( 
                 if(m_kv.isNull()) m_kv="";
 
-                authURL += escape( m_password );
+                authURL += KURL::encode_string( m_password ) ;
                 job = KIO::get( KURL( authURL ), false, false );
                 job->addMetaData("cookies", "manual");

--- kdenetwork/kopete/protocols/msn/msnsocket.cpp  #1.86:1.87
@@ -494,5 +493,40 @@ void MSNSocket::slotReadyWrite()
 QString MSNSocket::escape( const QString &str )
 {
-        return ( KURL::encode_string( str, 106 ) );
+        //return ( KURL::encode_string( str, 106 ) );
+        //It's not needed to encode everything. The official msn client only encode spaces and %
+        //If we encode more, the size can be longer than excepted.
+
+        int old_length= str.length();
+        QChar *new_segment = new QChar[ old_length * 3 + 1 ];
+        int new_length = 0;
+
+        for     ( int i = 0; i < old_length; i++ )
+        {
+                unsigned char character = str[i];
+                
+                 /*character == ' '  || character == '%' || character == '\t' 
+                        || characters == '\n' || character == '\r'*/
+                /*      || character == '<' || character == '>' ||  character == '\\'
+                        || character == '^' || character == '&' || character == '*'*/ 
+                
+                if( character <= 32 || character == '%' )
+                {
+                        new_segment[ new_length++ ] = '%';
+
+                        unsigned int c = character / 16;
+                        c += (c > 9) ? ('A' - 10) : '0';
+                        new_segment[ new_length++ ] = c;
+
+                        c = character % 16;
+                        c += (c > 9) ? ('A' - 10) : '0';
+                        new_segment[ new_length++ ] = c;
+                }
+                else
+                        new_segment[ new_length++ ] = str[i];
+        }
+
+        QString result = QString(new_segment, new_length);
+        delete [] new_segment;
+        return result;
 }
 


Comment 19 Olivier Goffart 2004-06-09 22:01:04 UTC
*** Bug 81461 has been marked as a duplicate of this bug. ***
Comment 20 AceLan Kao 2004-08-16 15:35:58 UTC
I know this bug has been marked as resolved, but there is still one thing that needs correcting. A Chinese character takes two bytes, and the value of each byte may be below or above 32. If the first byte of a Chinese character is below 32 and the second byte is above 32, the escape function encodes only the first byte, which garbles the character. Please accept this patch to correct this bug. Thank you very much.

--- kopete/protocols/msn/msnsocket.cpp.org      2004-08-16 21:39:33.000000000 +0800
+++ kopete/protocols/msn/msnsocket.cpp  2004-08-16 21:39:38.000000000 +0800
@@ -497,7 +497,7 @@
                /*      || character == '<' || character == '>' ||  character == '\\'
                        || character == '^' || character == '&' || character =='*'*/

-               if( character <= 32 || character == '%' )
+               if( character == 32 || character == '%' )
                {
                        new_segment[ new_length++ ] = '%';
Comment 21 Olivier Goffart 2004-08-16 19:08:09 UTC
But the patch is not correct.
What about all the characters other than 32 that should be escaped?

I don't remember the code exactly, but I think I used char*, and QChar should probably be used instead.

Anyway, I have no time to look at it now because of my exams.
Comment 22 AceLan Kao 2004-08-17 07:28:45 UTC
We really need to correct this, or Chinese characters will be garbled. As you know, the next release, 0.9.0, is due out 2004.08.18, so there is no time for more checking. I can only tell you about my experience: I have used the patch for 2 or 3 weeks and it works fine for me. If 0.9 doesn't contain that patch, I'll be blamed to death. And I believe escaping only space and '%' is enough. Please help us.
Comment 23 Olivier Goffart 2004-08-17 09:02:46 UTC
Can you simply try replacing the char with a QChar? (QChar holds a 16-bit Unicode character, unlike char.)

The problem is that characters like \n, \r, or MSN Plus color codes also need to be escaped.
Comment 24 Olivier Goffart 2004-08-27 21:21:09 UTC
CVS commit by ogoffart: 

fix bug 73901

i'll backport
CCMAIL: 73901-done@bugs.kde.org


  M +1 -6      msnsocket.cpp   1.97


--- kdenetwork/kopete/protocols/msn/msnsocket.cpp  #1.96:1.97
@@ -491,10 +491,5 @@ QString MSNSocket::escape( const QString
         for     ( int i = 0; i < old_length; i++ )
         {
-                unsigned char character = str[i];
-
-                 /*character == ' '  || character == '%' || character == '\t'
-                        || characters == '\n' || character == '\r'*/
-                /*      || character == '<' || character == '>' ||  character == '\\'
-                        || character == '^' || character == '&' || character == '*'*/
+                unsigned short character = str[i].unicode();
 
                 if( character <= 32 || character == '%' )
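
A standalone sketch of why the type change matters, using plain C++ instead of QChar. The assumption here is that narrowing keeps only the low byte, as a plain C++ cast does. The exact result of Qt's QChar-to-char conversion is not shown in this report, but either way the 8-bit value no longer identifies the character. Function names are made up for illustration.

```cpp
#include <cassert>

// Old check: narrow the 16-bit code unit to 8 bits before testing it.
// Assumption: the narrowing keeps only the low byte.
bool needsEscapeNarrowed(char16_t unit)
{
    const unsigned char character = static_cast<unsigned char>(unit);
    return character <= 32 || character == '%';
}

// New check: test the full 16-bit value.
bool needsEscapeWide(char16_t unit)
{
    const unsigned short character = static_cast<unsigned short>(unit);
    return character <= 32 || character == '%';
}

void narrowingDemo()
{
    const char16_t yi = 0x4E00;  // U+4E00, a character from the sample nickname
    assert(needsEscapeNarrowed(yi));   // low byte 0x00 looks like a control character
    assert(!needsEscapeWide(yi));      // the full value is far above 32
    assert(needsEscapeNarrowed(u' ') && needsEscapeWide(u' '));  // both agree on ASCII
}
```

With the 16-bit test, \t, \n, \r and space are still escaped, so the "<= 32" rule can stay without garbling Chinese characters.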


Comment 25 Matt Rogers 2004-08-27 21:47:42 UTC
this should be in KDE 3.3.1
Comment 26 Olivier Goffart 2004-08-27 22:02:55 UTC
On Friday 27 August 2004 21:47, Matt Rogers wrote:
> ------- this should be in KDE 3.3.1

I already backported it :-)