Summary: | [icq] encoding translation not applied to the contact's information | ||
---|---|---|---|
Product: | [Unmaintained] kopete | Reporter: | mi+kde |
Component: | ICQ and AIM Plugins | Assignee: | Kopete Developers <kopete-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | j, jindrich, michal, mss, passnet, sven.burmeister, tre1 |
Priority: | NOR | ||
Version: | 0.8.1 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | FreeBSD | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
mi+kde
2004-04-13 18:30:30 UTC
I think I know what's causing this. On second thought, no, I don't. Until I know how to delay text decoding until the data arrives at oscaraccount, I cannot do much about this. Specifically: can I store UTF-8 or UTF-16 in a QCString? If so, I can redesign the whole procedure so that user info gets decoded when it reaches the contact object. (So far we decode it right after receiving the packet, but at that point we don't know which contact the packet refers to.)

*** Bug 73715 has been marked as a duplicate of this bug. ***

Hi, Stefan. I'm sorry, but I cannot understand your comment. If the other user's information can be displayed _at all_, it can also be interpreted in the correct encoding. If the server does not specify which encoding the other user used to enter his information, the software should give me the chance to choose. If the encoding is specified by the server (or is known in advance to be UTF), then there is no problem at all, is there? Thanks. Yours, -mi

*** Bug 95927 has been marked as a duplicate of this bug. ***

I can select an encoding on the user details page, so that encoding could be used for the user info too. Or am I missing something?

That encoding is only applied to messages from the user.

I have run some tests and found that, for Russian users, the user information is usually sent in CP1251, because CP1251 is the Russian encoding of Windows (and ICQ). The corresponding code in Kopete treats this data as Latin-1 (as far as I can see from CVS); no conversion is done. Probably the best approach would be to allow one encoding to be specified for the profile (and stored somewhere) and a second one for decoding messages from the user. What is the current state of the encoding-specific code? Is it completely absent, or just not merged into the main CVS? (The latest CVS build of Kopete doesn't even have a widget to select an encoding in the ICQ user information.) Is this comment thread the best place to discuss this? Best regards.
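A short Python illustration (not Kopete code) of the CP1251 report above, and of why the Latin-1 assumption loses nothing: Latin-1 maps every byte to exactly one character, so text that was wrongly decoded as Latin-1 can be turned back into the original bytes and decoded again once the contact's real encoding is known. The nickname is made-up sample data.

```python
# A Russian nickname as a Windows ICQ client would store it: CP1251 bytes.
raw = "Привет".encode("cp1251")

# What you get if every user-info string is assumed to be Latin-1.
as_latin1 = raw.decode("latin-1")
print(as_latin1)  # Ïðèâåò

# Latin-1 is a 1:1 byte <-> code point mapping, so the raw bytes are
# recoverable and can be re-decoded with the right (per-contact) encoding.
recovered = as_latin1.encode("latin-1").decode("cp1251")
print(recovered)  # Привет
```

This is consistent with the point made above: if the bytes can be displayed at all, they can also be interpreted correctly, provided the right encoding is chosen per contact.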
Currently, other than for messages, we just assume Latin-1 encoding for things like user info. I haven't had the chance to implement better support for encodings in user info yet.

It's definitely NOT Latin-1; if it were, I couldn't get Chinese (or similar) symbols in a user name :-) I got (let's see how Bugzilla handles it): D釟a

I had this problem, with these (often Chinese) symbols appearing instead of German umlauts in both ICQ messages and user info, with Kopete 0.9.x until I compiled today's CVS. There are still some encoding issues with "using alias from server", but I don't know whether they are caused by my old 0.9.x contactlist.xml with wrongly encoded values. I'll correct this file manually and report if the problems persist.

The ICQ plugin-data-field key="ssi_alias" reverts to the wrong encoding after I correct it manually in contactlist.xml. But I don't see this key actually used anywhere?

The ssi_alias key does get used. See ICQProtocol::deserializeContact.

IMHO ssi_alias isn't deserialized in ICQProtocol::deserializeContact. Oscar::SSI::checkTLVs gets the alias and OscarContact::serialize "saves" it. I've got a contact's alias stored in UTF-8 on the server:

kopete (oscar - raw protocol): [Oscar::TLV Buffer::getTLV()] TLV data is [ffffffffffffffc2 ffffffffffffffb0 ffffffffffffffc2 ffffffffffffffba 20 xx xx xx 20 ffffffffffffffc2 ffffffffffffffba ffffffffffffffc2 ffffffffffffffb0 ]

...

kopete (oscar - raw protocol): [void Oscar::SSI::checkTLVs()] Got an alias 'Â°Âº xxx ÂºÂ°' for contact 'xxxxxxx'

These Âs shouldn't be there.

I'm having the same issues with Hebrew encodings. I'm not sure if it's any help, but SIM seems to handle it all just fine. Maybe you could have a look at their code?

Is anybody going to fix this? It's still present in 0.10.3 (ICQ, Russian).

Same for me with German umlauts in user names. GAIM displays them fine, but I'd like to use Kopete. Please fix this for the next version!

*** Bug 116808 has been marked as a duplicate of this bug. ***

I have this bug! And I don't want to have it :). Adding my votes ;). Thanks in advance for fixing it!

By the way, this bug has more serious consequences. Yesterday I opened Kopete (included in KDE 3.5.0) to try it out again. Message encoding is fixed, yes (and thanks!). But... I had a group of contacts whose name was encoded in Windows CP1251 (the Russian ICQ encoding). Kopete displayed it as gibberish. I tried to rename it... The group name became readable in Russian, but it seems that Kopete wrote the UTF-8-encoded group name back to the ICQ server, and the server interpreted it as a CP1251 string, or something like that. The result: after relaunching Kopete, all my contacts in that group had been deleted from the server (I launched SIM to check this), the group was displayed as "???????", and Kopete asked me about a bunch of accounts that are missing on the server but are saved in the local copy of the contact list (if I understood correctly) :).

Now about the code. Today I had time to look at the related SIM and Kopete code. What I found is that SIM stores user info in an ICQUserInfo structure (filled in by the server), and there's an "Encoding" field there. The corresponding SIM class ICQContact (IIRC; I'm not sure about the exact name, I'm writing from home) has functions toUnicode(str) and fromUnicode(str), which convert str to/from Unicode based on the contact's actual encoding as reported by the server. I didn't find any code that uses this field in Kopete (Kopete has several ICQ*Info classes; I searched all of them). Kopete's liboscar ignores the fact that the encoding of UserInfo strings may differ from contact to contact while sending/receiving messages using the ICQ protocol.

The only general question I have left unanswered is: how is Kopete supposed to find out which encoding group names are written in? (For single contacts it's clear: the server reports it.) Sadly, right now I don't have more time to dig into this (work, work, work :)). And anyway, I know nothing about the ICQ protocol; I'm just a user :).
All of the above was deduced by reading the code, so I may be mistaken. I hope it helps someone somehow :). And sorry for my English :).

Dmitry, thank you so much for looking at this bug and analyzing it! It will be much easier for me to write a fix now than before. However, because I'm limited on time, it might take me a while to write one. If I get one written, though, would you be willing to help me test it?

Of course, Matt! I'd very much like to see this bug fixed! :) Feel free to contact me anytime. I think that after fixing all of the "encoding bugs" like this you'll get a huge number of new Kopete users :). At least from Russia :).

Hi! Just wondering about the progress of this bug :). Matt, have you had any time to work on it? TIA, Dmitry.

SVN commit 497530 by mattr:

apply the encodings fix patch from Oleg Girko. Should fix nearly all the encoding problems with OSCAR. This patch is for the 0.12 version of Kopete. A fix for the 0.11 series is forthcoming

Thanks for the patch!

CCMAIL: Oleg Girko <ol@infoserver.ru>
BUG: 109034
BUG: 112323
BUG: 79574

M +2 -2 aim/aimaccount.cpp
M +22 -2 aim/aimcontact.cpp
M +3 -1 icq/icqaccount.cpp
M +61 -16 icq/icqcontact.cpp
M +16 -9 icq/ui/icqsearchdialog.cpp
M +32 -27 icq/ui/icquserinfowidget.cpp
M +21 -10 liboscar/chatservicetask.cpp
M +2 -0 liboscar/chatservicetask.h
M +42 -7 liboscar/client.cpp
M +10 -0 liboscar/client.h
M +5 -5 liboscar/icquserinfo.cpp
M +45 -45 liboscar/icquserinfo.h
M +31 -35 liboscar/messagereceivertask.cpp
M +1 -6 liboscar/offlinemessagestask.cpp
M +118 -24 liboscar/oscarmessage.cpp
M +28 -11 liboscar/oscarmessage.h
M +96 -138 liboscar/sendmessagetask.cpp
M +5 -4 liboscar/sendmessagetask.h
M +5 -5 liboscar/usersearchtask.cpp
M +43 -18 oscaraccount.cpp
M +16 -0 oscaraccount.h
M +9 -0 oscarcontact.cpp
M +8 -1 oscarcontact.h

*** Bug 120349 has been marked as a duplicate of this bug. ***

I am not 100% sure, but since my SVN update around the time of this fix's commit, umlauts display incorrectly for people for whom they displayed correctly before. I am using the latest SVN from today.

I can confirm what was said in comment #27. I am constantly tracking the dev-0.12 branch, and the commit to "fix OSCAR encodings once and for all" broke them for me big time. Umlauts I receive from any ICQ client (official, Miranda, Trillian, ...) get corrupted. The only workaround I have found so far is to manually force the encoding for that contact to "ISO-8859-1 Western". When I first open the encoding dialog for a contact, this is always the chosen default, but it is not what's actually applied. I have to change it to any other setting, hit "OK", then reopen the dialog, choose "ISO-8859-1 Western" again, and hit "OK" for Kopete to remember this setting. Reminds me of Windows, where one often has to go through this routine to make a setting stick.

Unfortunately, while manually selecting "ISO-8859-1 Western" fixes incoming umlauts, it doesn't fix outgoing ones. I have at least one contact where the other side receives nonsense characters from me. I tried changing the encoding to "UTF-8 Unicode", but that didn't help. Interestingly, almost all my contacts are reported as UTF-8 capable in Kopete's tooltip, but the UTF-8 setting still does not work (and neither does any other one, really...).

One more thing I forgot: I think it's also since the encoding patch went in that some messages (possibly only offline messages, but I am not sure) have an additional non-printable character appended. It's encoded in Kopete's history log as something like �, which would indicate that the trailing zero character that should end the string doesn't get stripped.

I investigated a bit further. The contact who receives broken umlauts from me is using Trillian Pro 3.1.
His client advertises UTF-8 capability, but all (online) messages he sends to me are Latin-1 encoded. Therefore, I manually set the encoding for this contact to "ISO-8859-1 Western", and Kopete now decodes his messages correctly. But *outgoing* online messages get UTF-8 encoded, even though I specifically set a custom encoding and I know Kopete registered that setting. This definitely seems like a bug to me. Trillian on the other side should still be able to cope with those messages if they were correctly marked as such. So it seems Kopete is sending UTF-8 instead of Latin-1 *and* failing to mark the messages as UTF-8.

Could you file a new bug for the issue in the second paragraph? It's a genuine problem that deserves its own report rather than getting lost in the shuffle of this one. Thanks. :)

Yet the bug described in the second paragraph only appeared after the patch for this bug was applied, so the patch caused a regression and should be reverted until it is fixed and no longer causes side effects.

I filed bug 121053 for the manually selected encoding being ignored in outgoing messages. Of course, this is a regression from commit 497530, but maybe rather than reverting the entire commit it's better to iron out its bugs one by one. After all, the commit certainly had its purpose and probably fixed some other issues.

Sven: this bug is fixed. The contact's encoding is applied to his user info now. Other bugs that arise from the patch committed to fix this issue (encoding not applied to contact info) should be submitted as new bugs, since they don't relate to this issue. Bartosz: thanks for creating the new bug. The same revision that fixed this bug was also supposed to fix two other bugs, so it is indeed better to iron out issues with the new code one by one rather than doing a flat revert.
I don't have statistics on hand, but the commit was very large and fundamentally changed the way encodings are handled, so I have to expect some new bugs. :)

One thing remains, though: the encoding of contact list groups. It's still broken. I created a group in SIM and wrote its name in Russian letters. Then I launched Kopete (SVN dev-0.12 version), and the group name is displayed as nonsense characters. Maybe I can try to fix this bug, but I just don't know where to look for the encoding in this case. For user info and messages it's clear: the encoding is taken from the contact's encoding dialog. But what about groups? By the way, Matt, should I file a separate bug for this issue? Or maybe there is one already?

A separate bug for the messed-up group names, please. :)
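Dmitry's description of SIM (a per-contact toUnicode()/fromUnicode() pair driven by the server-reported encoding), together with the later reports about a trailing zero character and about outgoing text not matching its charset marking (bug 121053), suggests keeping all conversion for a contact in one place. The Python sketch below only illustrates that idea under stated assumptions: `ContactCodec`, its methods, and the returned charset label are hypothetical names, not Kopete, liboscar, or SIM APIs, and the label is not an OSCAR protocol value. NUL stripping assumes a single-byte or UTF-8 payload.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ContactCodec:
    """Hypothetical per-contact converter, loosely modelled on SIM's toUnicode/fromUnicode."""

    encoding: str = "iso-8859-1"  # server-reported or manually selected encoding

    def to_unicode(self, raw: bytes) -> str:
        """Decode incoming bytes, dropping one C-style NUL terminator if present.

        Assumes a single-byte or UTF-8 payload; UTF-16 would need a 2-byte check.
        """
        if raw.endswith(b"\x00"):
            raw = raw[:-1]
        return raw.decode(self.encoding, errors="replace")

    def from_unicode(self, text: str) -> Tuple[bytes, str]:
        """Encode outgoing text and return the charset label that matches the bytes.

        Falls back to UTF-8 (and labels it as such) when the contact's encoding
        cannot represent the text, so payload and label never disagree.
        """
        try:
            return text.encode(self.encoding), self.encoding
        except UnicodeEncodeError:
            return text.encode("utf-8"), "utf-8"


# The server-side alias from the raw-protocol log, stored as UTF-8.
alias_bytes = "°º xxx º°".encode("utf-8")
print(ContactCodec("iso-8859-1").to_unicode(alias_bytes))  # Â°Âº xxx ÂºÂ°
print(ContactCodec("utf-8").to_unicode(alias_bytes))       # °º xxx º°

# A Russian contact whose client uses CP1251; the trailing NUL is stripped.
russian = ContactCodec("cp1251")
print(russian.to_unicode("Привет".encode("cp1251") + b"\x00"))  # Привет

# Outgoing text for a contact manually set to ISO-8859-1 (the Trillian case).
print(ContactCodec("iso-8859-1").from_unicode("Grüße"))  # (b'Gr\xfc\xdfe', 'iso-8859-1')
```

The first two lines of output reproduce the stray "Â" characters from the alias log: the bytes c2 b0 c2 ba are valid UTF-8 for "°º", but read one byte at a time as Latin-1 they become "Â°Âº". Falling back to UTF-8 in `from_unicode` is just one possible policy; the point of the sketch is that the label is derived from the encoding actually used rather than set separately.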