Bug 79574 - [icq] encoding translation not applied to the contact's information
Summary: [icq] encoding translation not applied to the contact's information
Status: RESOLVED FIXED
Alias: None
Product: kopete
Classification: Unmaintained
Component: ICQ and AIM Plugins (other bugs)
Version First Reported In: 0.8.1
Platform: unspecified FreeBSD
Importance: NOR normal
Target Milestone: ---
Assignee: Kopete Developers
URL:
Keywords:
Duplicates: 73715 95927 116808 120349
Depends on:
Blocks:
 
Reported: 2004-04-13 18:30 UTC by mi+kde
Modified: 2006-02-02 22:03 UTC
CC List: 7 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Description mi+kde 2004-04-13 18:30:30 UTC
Version:           0.8.1 (using KDE 3.2.1, compiled sources)
Compiler:          gcc version 3.3.3 [FreeBSD] 20031106
OS:          FreeBSD (i386) release 5.2-CURRENT

When I request the contact's information in the ICQ mode and it happens to be in an incompatible encoding, I see gibberish. I tried to specify the contact's encoding explicitly (Win-1251) and refetch, but that did not help.

My encoding is KOI8-U, which can represent all of the Win-1251 characters. Why is the translation not applied to the contact info?
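The translation the reporter asks for is a two-step conversion through Unicode: decode the bytes with the sender's charset, then encode them for the local one. Below is a minimal Python sketch. It assumes the profile text is Russian, which both Windows-1251 and KOI8-U cover. The helper name `transcode` is illustrative and is not Kopete code.

```python
def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Decode bytes with the sender's charset, then re-encode for the local one."""
    text = raw.decode(source)   # bytes -> Unicode, using the *sender's* encoding
    return text.encode(target)  # Unicode -> bytes in the local encoding


# Profile text as a Windows-1251 ICQ client would send it.
from_server = "Привет".encode("cp1251")

local = transcode(from_server, "cp1251", "koi8_u")
print(local.decode("koi8_u"))         # Привет
# Skipping the conversion and guessing Latin-1 instead yields gibberish:
print(from_server.decode("latin-1"))  # Ïðèâåò
```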

Thanks!
Comment 1 Matt Rogers 2004-04-19 05:59:32 UTC
I think I know what's causing this.
Comment 2 Matt Rogers 2004-04-19 06:13:58 UTC
on second thought, no, I don't
Comment 3 Stefan Gehn 2004-06-03 22:39:26 UTC
Until I know how to delay text decoding until the data arrives at OscarAccount, I cannot do much about this. Specifically: can I store UTF-8 or UTF-16 in a QCString?
If yes, then I can redesign the whole procedure so that user info gets decoded when it reaches the contact object. So far we decode it right after receiving the packet, and at that point we do not yet know which contact the packet refers to.
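On the QCString question: Qt 3 documents QCString as an abstraction of the classic zero-terminated C `char *` string.

- UTF-8 fits, because it never produces a 0x00 byte except for U+0000 itself.
- UTF-16 does not fit, because every ASCII character carries a 0x00 byte.

The Python sketch below mimics how a NUL-terminated string sees a buffer. `c_string_view` is a made-up helper for illustration only.

```python
def c_string_view(buf: bytes) -> bytes:
    """Return what a NUL-terminated string sees: everything before the first 0x00."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


name = "Kopete"
utf8 = name.encode("utf-8")       # no zero bytes at all
utf16 = name.encode("utf-16-be")  # each ASCII character is 0x00 followed by its code

print(c_string_view(utf8))   # b'Kopete'  (UTF-8 survives intact)
print(c_string_view(utf16))  # b''        (truncated at the very first byte)
```

Keeping the raw bytes in a length-tracked buffer avoids this problem for both encodings. Decoding would then wait until the owning contact, and therefore its encoding, is known.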
Comment 4 Stefan Gehn 2004-06-03 22:40:57 UTC
*** Bug 73715 has been marked as a duplicate of this bug. ***
Comment 5 mi+kde 2004-06-03 23:07:52 UTC
Hi, Stefan. I'm sorry, but I cannot understand your comment.

If the other user's information can be displayed _at all_, it can also be interpreted in the correct encoding.

If the server does not specify which encoding the other user used to enter his information, the software should give me the chance to choose.

If the encoding is specified by the server (or is known in advance to be UTF), then there is no problem at all, is there?

Thanks. Yours,

	-mi
Comment 6 Matt Rogers 2004-12-28 15:33:00 UTC
*** Bug 95927 has been marked as a duplicate of this bug. ***
Comment 7 Michal Čihař 2005-01-23 22:14:20 UTC
I can select an encoding on the user details page, so that encoding could be used for the user info too. Or am I missing something?
Comment 8 Matt Rogers 2005-01-26 03:46:47 UTC
That encoding only applies to messages from the user.
Comment 9 Kirill Belokurov 2005-01-27 15:53:05 UTC
I have made some tests and found that (for Russian users) the user information is usually sent in CP1251 encoding. The reason is that CP1251 is the Russian encoding of Windows (and ICQ). The corresponding code in Kopete treats this text as Latin-1 (as far as I can see from CVS), so no conversion is done.

Probably the best way would be to allow specifying one encoding (encoding-1) for the profile, stored somewhere, and another (encoding-2) for decoding messages from users.
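The proposal amounts to a lookup order for the profile codec. Below is a hypothetical Python sketch; the function and parameter names are not Kopete's. The order is:

1. an explicit profile encoding, if set;
2. otherwise the contact's per-contact message encoding;
3. otherwise the Latin-1 assumption described above.

```python
from typing import Optional


def profile_codec(profile_enc: Optional[str], message_enc: Optional[str]) -> str:
    """Pick the codec for a contact's profile fields (hypothetical fallback order)."""
    # 1. explicit profile setting, 2. the contact's message setting,
    # 3. the Latin-1 assumption the code currently makes.
    return profile_enc or message_enc or "latin-1"


nick = "Кирилл".encode("cp1251")  # as a Russian Windows client would store it
print(nick.decode(profile_codec(None, "cp1251")))  # Кирилл
```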

What is the current state of the encoding-specific code? Is it completely absent, or just not merged into main CVS? (The latest CVS build of Kopete doesn't even have a widget to select the encoding in the ICQ user information.)

Is this comment thread the best place to discuss this item?

Best regards.
Comment 10 Matt Rogers 2005-02-02 15:08:48 UTC
Currently, other than for messages, we just assume Latin-1 encoding for things like user info. I haven't had the chance to implement better support for encodings in user info yet.
Comment 11 Michal Čihař 2005-02-12 01:21:25 UTC
It's definitely NOT Latin-1; if it were, I couldn't get a Chinese (or something like that) symbol in the user name :-)

I got (let's see how Bugzilla handles it):
D釟a
Comment 12 Jan Ritzerfeld 2005-03-10 23:29:22 UTC
I had this problem with these (often Chinese) symbols instead of German umlauts, in both ICQ messages and user info, using Kopete 0.9.x until I compiled today's CVS.
There are still some encoding issues with "using alias from server", but I don't know whether they are caused by my old 0.9.x contactlist.xml with wrongly encoded values. I'll correct this file manually and report back if the problems persist.
Comment 13 Jan Ritzerfeld 2005-03-13 20:50:57 UTC
The ICQ plugin-data-field key="ssi_alias" reverts to the wrong encoding after I correct it manually in contactlist.xml. But I don't see this key actually being used anywhere?
Comment 14 Matt Rogers 2005-04-07 01:50:41 UTC
the ssi_alias key does get used. See ICQProtocol::deserializeContact
Comment 15 Jan Ritzerfeld 2005-04-07 17:13:26 UTC
IMHO ssi_alias isn't deserialized in ICQProtocol::deserializeContact. Oscar::SSI::checkTLVs gets the alias and OscarContact::serialize "saves" it.
I've got a contact's alias stored in UTF-8 on the server:
kopete (oscar - raw protocol): [Oscar::TLV Buffer::getTLV()] TLV data is [ffffffffffffffc2 ffffffffffffffb0 ffffffffffffffc2 ffffffffffffffba 20 xx xx xx  20 ffffffffffffffc2 ffffffffffffffba ffffffffffffffc2 ffffffffffffffb0 ]
...
kopete (oscar - raw protocol): [void Oscar::SSI::checkTLVs()] Got an alias 'Â°Âº xxx ÂºÂ°' for contact 'xxxxxxx'

These Âs shouldn't be there.
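The stray Â characters are what UTF-8 looks like when it is decoded as Latin-1. The characters ° and º are two-byte UTF-8 sequences starting with 0xC2, and 0xC2 is Â in Latin-1.

The log also prints each byte as a sign-extended `char`, so 0xc2 shows up as ffffffffffffffc2.

Below is a Python sketch using the first five logged values. The elided `xx` bytes are left out rather than guessed.

```python
# First five values from the TLV dump above, as printed (sign-extended chars).
logged = [0xFFFFFFFFFFFFFFC2, 0xFFFFFFFFFFFFFFB0,
          0xFFFFFFFFFFFFFFC2, 0xFFFFFFFFFFFFFFBA, 0x20]

# Masking with 0xFF recovers the byte that was actually on the wire.
raw = bytes(v & 0xFF for v in logged)  # b'\xc2\xb0\xc2\xba '

print(raw.decode("latin-1"))  # Â°Âº  (each 0xC2 lead byte becomes a stray Â)
print(raw.decode("utf-8"))    # °º   (the alias as it was stored)
```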
Comment 16 Shahar 2005-04-24 13:20:56 UTC
I'm having the same issues with Hebrew encodings. I'm not sure if it's any help, but SIM seems to do it all just fine. Maybe you can have a look at their code?
Comment 17 Pavel Shirov 2005-09-09 17:54:46 UTC
Is anybody going to fix this? It is still present in 0.10.3 (ICQ, Russian).
Comment 18 Michael Stather 2005-09-10 18:06:22 UTC
The same for me with German umlauts in user names.
GAIM displays them fine, but I'd like to use Kopete.
please fix this for the next version!!!!
Comment 19 Matt Rogers 2005-11-21 14:28:40 UTC
*** Bug 116808 has been marked as a duplicate of this bug. ***
Comment 20 Dmitry Suzdalev 2005-11-30 22:48:15 UTC
I have this bug!
And I don't want to have it :).
Adding my votes ;).
Thanks in advance for fixing it!
Comment 21 Dmitry Suzdalev 2005-12-01 23:41:24 UTC
Btw, this bug has more serious consequences.
Yesterday I opened Kopete (included in KDE 3.5.0) to try it out again.
Message encoding is fixed, yes (and thanks!).
But...
I had a group of contacts whose name was encoded in Windows CP1251 (the Russian ICQ encoding).
Kopete displayed it as gibberish. I tried to rename it...
The group name became readable in Russian, but it seems that Kopete wrote the UTF-encoded group name back to the ICQ server, and the server interpreted it as a CP1251 string. Or something like that.
Result...
After relaunching Kopete, all my contacts that were in that group were deleted from the server (I launched SIM to check this), the group was displayed as "???????", and Kopete asked me about a bunch of contacts that are missing on the server but saved in the local copy of the contact list (if I understood correctly) :).
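The mismatch Dmitry suspects can be reproduced in miniature. A Cyrillic name written as UTF-8 and read back by a client that assumes CP1251 turns into twice as many wrong characters.

In the Python sketch below, the group name is invented for the example. The last line only illustrates one common way text degrades into question marks: an encoder substituting `?` for characters it cannot represent. It does not claim that this is what the server did.

```python
name = "Друзья"  # hypothetical group name

written = name.encode("utf-8")      # what Kopete apparently wrote on rename
as_seen = written.decode("cp1251")  # how a client assuming CP1251 would read it

print(as_seen)                              # 12 characters of mojibake for a 6-letter name
print(as_seen.encode("cp1251") == written)  # True: the bytes survive, only the reading is wrong

# An encoder that cannot represent Cyrillic and substitutes '?' instead:
print(name.encode("latin-1", errors="replace"))  # b'??????'
```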

Now about the code.
Today I had time to look at the related SIM and Kopete code.
What I found is that SIM stores user info in an ICQUserInfo structure (filled in from the server), and that structure has an "Encoding" field.
The corresponding SIM class, ICQContact (IIRC; I'm not sure about the exact name, I'm writing from home), has functions toUnicode(str) and fromUnicode(str), which convert str to/from Unicode based on the contact's actual encoding as reported by the server.
I didn't find any code in Kopete that uses this field (Kopete has several ICQ*Info classes; I searched all of them).
Kopete's liboscar ignores the fact that the encoding of UserInfo strings may differ from contact to contact when sending/receiving messages over the ICQ protocol.
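The SIM design described above keeps one codec per contact and routes every conversion through it. Below is a hypothetical Python sketch of that idea. The class and method names echo the description but are not SIM's or Kopete's API, and the UINs are placeholders. In a real client `encoding` would come from the server-reported user info rather than a constructor argument.

```python
from dataclasses import dataclass


@dataclass
class ContactCodec:
    """Per-contact conversion helpers, in the spirit of toUnicode()/fromUnicode()."""
    uin: str
    encoding: str  # would be taken from the server-reported user info

    def to_unicode(self, raw: bytes) -> str:
        return raw.decode(self.encoding, errors="replace")

    def from_unicode(self, text: str) -> bytes:
        return text.encode(self.encoding, errors="replace")


# Two contacts whose clients use different legacy encodings.
ru = ContactCodec("111111", "cp1251")
de = ContactCodec("222222", "iso-8859-1")

print(ru.to_unicode("Дмитрий".encode("cp1251")))  # Дмитрий
print(de.to_unicode(b"M\xfcller"))                 # Müller
```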

The only general question I still have is: how is Kopete supposed to find out which encoding group names are written in? (For single contacts it's clear: the server reports it.)
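Group names carry no per-contact encoding, so any answer is a guess. One common heuristic works as follows:

- accept the bytes if they are valid UTF-8, since legacy 8-bit Cyrillic text rarely is;
- otherwise, fall back to a codec the user has chosen.

The Python sketch below shows that heuristic; it is not taken from Kopete. The default `cp1251` is only an example value for that user setting.

```python
def decode_group_name(raw: bytes, fallback: str = "cp1251") -> str:
    """Heuristic: strict UTF-8 first, then a configured legacy codec."""
    try:
        return raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


print(decode_group_name("Друзья".encode("utf-8")))   # Друзья
print(decode_group_name("Друзья".encode("cp1251")))  # Друзья (via the fallback)
```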

Sadly, right now I don't have more time to dig into this: work, work, work :).
And anyway, I know nothing about the ICQ protocol; I'm just a user :). All of the above was deduced by examining the code, so I may be mistaken.

Hope that helps someone somehow :).
And sorry for my English :).
Comment 22 Matt Rogers 2005-12-02 02:24:18 UTC
Dmitry, thank you so much for looking at this bug and analyzing it! It will be much easier for me to write a fix now than before. However, because I'm limited on time, it might take me a while to write a fix. If I do get one written, though, would you be willing to help me test it?
Comment 23 Dmitry Suzdalev 2005-12-05 21:34:30 UTC
Of course, Matt!
I'd like to see this bug fixed very much! :)
Feel free to contact me anytime.

I think after fixing all of the "encoding bugs" like this you'll get a huge number of new Kopete users :).
At least from Russia :).
Comment 24 Dmitry Suzdalev 2005-12-24 15:10:40 UTC
Hi! Just wondering about the progress on this bug :).
Matt, have you had any time to work on it?

TIA, Dmitry.
Comment 25 Matt Rogers 2006-01-13 02:08:29 UTC
SVN commit 497530 by mattr:

apply the encodings fix patch from Oleg Girko. Should fix nearly all the encoding problems
with OSCAR. This patch is for the 0.12 version of Kopete. A fix for the 0.11 series is forthcoming

Thanks for the patch!
CCMAIL: Oleg Girko <ol@infoserver.ru>
BUG: 109034
BUG: 112323
BUG: 79574



 M  +2 -2      aim/aimaccount.cpp  
 M  +22 -2     aim/aimcontact.cpp  
 M  +3 -1      icq/icqaccount.cpp  
 M  +61 -16    icq/icqcontact.cpp  
 M  +16 -9     icq/ui/icqsearchdialog.cpp  
 M  +32 -27    icq/ui/icquserinfowidget.cpp  
 M  +21 -10    liboscar/chatservicetask.cpp  
 M  +2 -0      liboscar/chatservicetask.h  
 M  +42 -7     liboscar/client.cpp  
 M  +10 -0     liboscar/client.h  
 M  +5 -5      liboscar/icquserinfo.cpp  
 M  +45 -45    liboscar/icquserinfo.h  
 M  +31 -35    liboscar/messagereceivertask.cpp  
 M  +1 -6      liboscar/offlinemessagestask.cpp  
 M  +118 -24   liboscar/oscarmessage.cpp  
 M  +28 -11    liboscar/oscarmessage.h  
 M  +96 -138   liboscar/sendmessagetask.cpp  
 M  +5 -4      liboscar/sendmessagetask.h  
 M  +5 -5      liboscar/usersearchtask.cpp  
 M  +43 -18    oscaraccount.cpp  
 M  +16 -0     oscaraccount.h  
 M  +9 -0      oscarcontact.cpp  
 M  +8 -1      oscarcontact.h  
Comment 26 Matt Rogers 2006-01-17 23:21:42 UTC
*** Bug 120349 has been marked as a duplicate of this bug. ***
Comment 27 S. Burmeister 2006-01-26 21:07:09 UTC
I am not 100% sure, but since my SVN update around the time this fix was committed, umlauts display incorrectly for people whose umlauts displayed correctly before.

I am using latest SVN from today.
Comment 28 Bartosz Fabianowski 2006-01-26 23:42:46 UTC
I can confirm what was said in comment #27. I am constantly tracking the dev-0.12 branch and the commit to "fix OSCAR encodings once and for all" broke them for me big time.

Umlauts I receive from any ICQ client (official, Miranda, Trillian, ...) get corrupted. The only workaround I have found so far is to manually force the encoding for that contact to "ISO-8859-1 Western". When I first open the encoding dialog for a contact, this is always the preselected default, but it is not what's actually applied. I have to change it to some other setting, hit "OK", then reopen the dialog, choose "ISO-8859-1 Western" again and hit "OK" for Kopete to remember this setting. Reminds me of Windows, where one often has to go through this routine to make a setting stick.

Unfortunately, while manually selecting "ISO-8859-1 Western" for the encoding fixes incoming umlauts, it doesn't fix outgoing ones. I have at least one contact where the other side is receiving nonsense characters from me. I tried changing the encoding to "UTF-8 Unicode", but that didn't help.

It's interesting to note that almost all my contacts are reported as UTF-8 capable in Kopete's tooltip, but still the UTF-8 setting does not work (and neither does any other one, really...).
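One plausible mechanism for the corrupted umlauts is a client that sends Latin-1 bytes while the receiver decodes them as UTF-8. This fits the report but is not confirmed in it. In Latin-1 each umlaut is a single byte that is not valid UTF-8 on its own, so a UTF-8 decoder has to substitute U+FFFD for it. A Python sketch:

```python
sent = "Grüße".encode("latin-1")  # b'Gr\xfc\xdfe', as a Latin-1 client emits it

print(sent.decode("utf-8", errors="replace"))  # two U+FFFD replacement characters
print(sent.decode("latin-1"))                  # Grüße (decoded with the right codec)
```

An advertised UTF-8 capability presumably only says what a client can accept, not what it actually sent. That would explain why contacts marked as UTF-8 capable still arrive garbled under the UTF-8 setting.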
Comment 29 Bartosz Fabianowski 2006-01-26 23:48:29 UTC
One more thing I forgot - I think it's also since the encoding patch went in that some messages (might be offline messages only, but I am not sure) have an additional non-printable character appended. It's encoded in Kopete's history log as something like &#0; - which would indicate that a trailing zero character that should end the string doesn't get stripped.
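If that guess is right, the fix is to drop the terminator before the text reaches the history. Below is a Python sketch; the helper name is hypothetical. It strips U+0000 *after* decoding. Stripping 0x00 bytes first would corrupt UCS-2 text, where a character such as Ā (U+0100, bytes 01 00) legitimately ends in a zero byte.

```python
def clean_message(raw: bytes, encoding: str) -> str:
    """Decode a message payload and drop trailing NUL terminators."""
    # Decode first: in UCS-2/UTF-16 a zero byte can belong to a real character.
    return raw.decode(encoding).rstrip("\x00")


print(repr(clean_message(b"Hallo\x00", "latin-1")))               # 'Hallo'
print(repr(clean_message("Ā".encode("utf-16-be"), "utf-16-be")))  # 'Ā'
```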
Comment 30 Bartosz Fabianowski 2006-01-30 02:47:37 UTC
I investigated a bit further. The contact who is receiving broken umlauts from me is using Trillian Pro 3.1. His client is advertising UTF-8 capability, but all (online) messages he sends to me are Latin-1 encoded. Therefore, I manually set the encoding for this contact to "ISO-8859-1 Western". Kopete correctly decodes his messages now.

But *outgoing* online messages get UTF-8 encoded. Even though I specifically set a custom encoding and I know Kopete registered that setting. This definitely seems like a bug to me. Trillian on the other side should still be able to cope with those messages if they were correctly marked as such. So it seems as though Kopete is sending UTF-8 instead of Latin-1 *and* failing to mark the messages as UTF-8.
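The underlying rule is that the payload and its charset marker must be chosen together. Below is a hypothetical Python sketch of that rule. The tag strings are illustrative, not liboscar's identifiers. UCS-2BE is used as the Unicode fallback on the assumption that this is the Unicode form classic OSCAR messages carry.

```python
from typing import Optional, Tuple


def encode_outgoing(text: str, custom_encoding: Optional[str]) -> Tuple[str, bytes]:
    """Return (charset_tag, payload); the tag always describes the payload bytes."""
    if text.isascii():
        return "ascii", text.encode("ascii")
    if custom_encoding:
        try:
            return "custom", text.encode(custom_encoding)  # respect the user's choice
        except UnicodeEncodeError:
            pass  # the chosen codec cannot represent this text; fall through
    return "ucs2be", text.encode("utf-16-be")


print(encode_outgoing("Grüße", "iso-8859-1"))  # ('custom', b'Gr\xfc\xdfe')
print(encode_outgoing("Grüße", None)[0])       # ucs2be
```

Both values come out of the same branch. A message therefore cannot leave with bytes in one encoding and a marker for another (or no marker at all), which is the mismatch described in this comment.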
Comment 31 Matt Rogers 2006-01-30 17:27:27 UTC
Could you file a new bug for the issue in the second paragraph? It's a
genuine problem that deserves its own report rather than getting lost in
the shuffle of this bug report.

Thanks. :)
Comment 32 S. Burmeister 2006-01-30 17:48:13 UTC
Yet the bug described in the second paragraph only arose after the patch for this bug was applied, so the patch caused a regression and should be reverted until it is fixed and no longer causes any side effects.
Comment 33 Bartosz Fabianowski 2006-01-30 18:53:21 UTC
I filed bug 121053 for the manually selected encoding being ignored in outgoing messages. Of course, this is a regression from commit 497530 but maybe rather than reverse the entire commit it's better to iron out its bugs one by one. After all, the commit certainly had its purpose and probably fixed some other issues.
Comment 34 Matt Rogers 2006-01-30 19:48:08 UTC
Sven: this bug is fixed. the contact's encoding is applied to his
user info now. Other bugs that arise as a result of the patch that was
committed to fix this issue (encoding not applied to contact info)
should be submitted as new bugs, since they don't relate to this issue.
Comment 35 Matt Rogers 2006-01-30 19:52:09 UTC
Bartosz: thanks for creating the new bug. The same revision that fixed
this bug was also supposed to fix two other bugs, so it is indeed better
to iron issues with the new code one by one rather than doing a flat
revert. I don't have statistics on me, but the commit was very large,
and fundamentally changed the way encodings are handled, so I have to
expect some new bugs. :)
Comment 36 Dmitry Suzdalev 2006-02-02 19:19:03 UTC
One thing remains though...
Encoding of contact list groups. It's still broken.
I created a group in sim and wrote its name in Russian letters.
Then I launched kopete (svn dev-0.12 version) and group name is displayed with nonsense characters.
Maybe I can try to fix this bug, but I just don't know where to look for the encoding in this case. For user info and messages it's clear: the encoding is taken from the contact's encoding dialog. But what about groups?
Btw, Matt, should I file a separate bug for this issue? Or maybe there is one already?
Comment 37 Matt Rogers 2006-02-02 22:03:54 UTC
Separate bug for the messed-up group names, please. :)