Bug 67727 - String encoding not detected properly
Summary: String encoding not detected properly
Status: RESOLVED FIXED
Alias: None
Product: kopete
Classification: Applications
Component: ICQ and AIM Plugins
Version: unspecified
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Kopete Developers
Duplicates: 66517 68157 69545 70639 71068
Reported: 2003-11-10 00:01 UTC by Florian Evers
Modified: 2004-12-12 19:09 UTC

Attachments
possible patch (599 bytes, patch)
2004-01-21 05:55 UTC, Matt Rogers

Description Florian Evers 2003-11-10 00:01:40 UTC
Version:           0.7.93 (using KDE 3.1.93)
Installed from:    SuSE RPMs
OS:          Linux

This bug is triggered when some of my friends' clients send me messages containing German umlauts. A simple message with just two characters ("üü") is enough. (That is two &uuml; entities in HTML, in case the characters get lost in this web form.)

I tested it with three friends, and in two cases it causes an error:

LICQ, Version from CVS (don't know the version-number)
-> No problems. I could read the "üü".

Linux: GnomeICU V 0.98.2
and
Windows: Miranda 0.3.1
-> Kopete has problems!


Message to xxxxxxxxx at 23:42:41
An internal Kopete error occurred while parsing a message:
XML document could not be parsed!

(I don't know why there is a "to friend", it should be "from friend" here... another bug?)

The shell-output:

florian@powerstation:~> kopete
florian@powerstation:~> QDict: Cannot insert null item
Entity: line 8: error: Input is not proper UTF-8, indicate encoding !
 <body bgcolor="#ffffff" color="#000000" ><![CDATA[üü]]></body>
                                                   ^
Entity: line 8: error: Bytes: 0xFC 0xFC 0x5D 0x5D
 <body bgcolor="#ffffff" color="#000000" ><![CDATA[üü]]></body>
                                                   ^

The carets are directly under the first "ü".
My system is a SuSE-Linux 8.2 with KDE3.2beta.
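
The four bytes named in the libxml error can be checked by hand. The following is a minimal Python sketch (Python is used purely for illustration; Kopete itself is C++/Qt) that interprets them as Latin-1 and shows that a strict UTF-8 decoder rejects them. The variable names are made up for this example.

```python
# Bytes reported by libxml: 0xFC 0xFC 0x5D 0x5D
reported = bytes([0xFC, 0xFC, 0x5D, 0x5D])

# Read as ISO 8859-1 (Latin-1), they are the two umlauts followed by the
# start of the CDATA terminator.
as_latin1 = reported.decode("latin-1")
print(as_latin1)  # üü]]

# A strict UTF-8 decoder rejects the very first byte.
try:
    reported.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print("not UTF-8:", err.reason, "at offset", err.start)
```

So the message body reached libxml as raw Latin-1 bytes, which matches the "Input is not proper UTF-8" error above.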
Comment 1 Florian Evers 2003-11-10 00:29:00 UTC
We (friend with miranda) tried more messages with umlauts:

The following umlaut-messages caused no problems (always only the two letters)
ää   - &auml;&auml;
ÄÄ   - &Auml;&Auml;
ÖÖ   - &Ouml;&Ouml;
ÜÜ   - &Uuml;&Uuml;
+ all single umlauts.

The following umlaut-messages triggered the problem:
üü   - &uuml;&uuml;
öö   - &ouml;&ouml;
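
For reference, a small Python sketch (illustrative only, not Kopete code) that lists the Latin-1 bytes of the six test messages and checks each against a strict UTF-8 decoder. It assumes the sending client really transmitted plain Latin-1 bytes, which this report does not confirm.

```python
pairs = ["ää", "ÄÄ", "ÖÖ", "ÜÜ", "üü", "öö"]

def is_strict_utf8(data: bytes) -> bool:
    """Return True only if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

results = {}
for text in pairs:
    raw = text.encode("latin-1")
    results[text] = is_strict_utf8(raw)
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(text, hex_bytes, "valid UTF-8" if results[text] else "invalid UTF-8")
```

Under that assumption none of the six pairs is valid UTF-8, so the difference between the working and the failing messages would have to come from how the decoder or the encoding heuristic treats particular byte values, not from some pairs happening to be valid UTF-8.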
Comment 2 Thiago Macieira 2003-11-10 02:42:39 UTC
I believe Stefan will come here and tell you this is not a bug. But anyways, let me tell you I've been seeing this happen quite frequently as well. However, this has only started happening when I changed my locale to pt_BR.UTF-8. I'm now running under a Latin1 locale and will report in a few days if the problem has disappeared.

And now as to why this isn't a bug: some ICQ messages don't come with a tag indicating which encoding was used to send a message. By watching the ICQ kdDebug outputs, it seems that those messages that fail are exactly those that don't come with an encoding tag.

When Kopete receives such a message, it tries to decode using the user's locale (which is appropriate for such locales as Russian, Hebrew, Chinese, Japanese, etc.) and is also appropriate when you're talking to a buddy in the same locale. However, what is happening is that you're (I am) in UTF-8 locale while your buddy isn't. He's sending you Latin-1 messages and you're trying to decode as UTF-8. It won't work.

I hope this is a correct assessment.
Comment 3 Jason Keirstead 2003-11-10 02:53:59 UTC
You can set an ICQ contact's encoding in their contact properties.

Not sure if this will help.
Comment 4 Florian Evers 2003-11-10 14:53:52 UTC
Hi!

@Thiago Macieira:
OK, this sounds logical. But when I look at my locale:

florian@powerstation:~> locale
LANG=de_DE@euro
LC_CTYPE="de_DE@euro"
LC_NUMERIC="de_DE@euro"
LC_TIME="de_DE@euro"
LC_COLLATE=POSIX
LC_MONETARY="de_DE@euro"
LC_MESSAGES="de_DE@euro"
LC_PAPER="de_DE@euro"
LC_NAME="de_DE@euro"
LC_ADDRESS="de_DE@euro"
LC_TELEPHONE="de_DE@euro"
LC_MEASUREMENT="de_DE@euro"
LC_IDENTIFICATION="de_DE@euro"
LC_ALL=

All I can see is "de_DE@euro", and this is IMHO equal to ISO8859-15 (Latin-1 with euro-sign).
My question now is why does Kopete use UTF-8 in my configuration? Or do you mean a different place with settings for localisation instead of "locale"? ;-)

@Jason Keirstead:
Thanks, that worked. I switched those "problematic users" *g* to ISO8859-15 and now I am able to receive those umlaut messages. But it's quite an impractical solution, because "the normal user" like me will not be able to solve this without some hints. Any workarounds possible? :-)

I am wondering a little bit why some of the umlauts were recognized correctly and the others were not.
Comment 5 Jason Keirstead 2003-11-10 15:03:53 UTC
Subject: Re: [Kopete-devel]  Double umlauts trigger internal error: XML document could not be parsed!

On November 10, 2003 9:53 am, Florian Evers wrote:
> All I can see is "de_DE@euro", and this is IMHO equal to ISO8859-15
> (Latin-1 with euro-sign). My question now is why does Kopete use UTF-8 in
> my configuration? Or do you mean a different place with settings for
> localisation instead of "locale"? ;-)

The locale of the incoming message is not determined by you, it is determined 
by the sending client.

> @Jason Keirstead:
> Thanks, that worked. I switched those "problematic users" *g* to ISO8859-15
> and it now I am able to receive those umlauts-messages. But it's a quite
> unpractical solution, because "the normal user" like me will not be able to
> solve this without some hints. Any workarounds possible? :-)

Sure, tell your friends to stop using a broken client.

The problem is the person's client is not sending the message encoding along, 
so Kopete has to either guess the encoding (by using your local one), or use 
the encoding specified in the contact config pages.
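
The order of precedence described here can be sketched in a few lines of Python. This is only an illustration of the idea (message tag first, then the per-contact setting, then the local encoding); `pick_encoding` and its parameters are hypothetical names, not Kopete's API.

```python
import codecs
from typing import Optional

def pick_encoding(message_tag: Optional[str],
                  contact_setting: Optional[str],
                  locale_encoding: str) -> str:
    """Return the first available encoding: message tag, contact setting, locale."""
    for candidate in (message_tag, contact_setting, locale_encoding):
        if candidate:
            codecs.lookup(candidate)  # raises LookupError for unknown names
            return candidate
    raise ValueError("no encoding available")

raw = "üü".encode("iso8859-15")  # what an untagged Latin-9 client would send

# Untagged message, no per-contact setting, receiver uses a UTF-8 locale:
guess = pick_encoding(None, None, "utf-8")
# Same message after the contact has been set to ISO 8859-15:
fixed = pick_encoding(None, "iso8859-15", "utf-8")

print(guess, "->", raw.decode(guess, errors="replace"))
print(fixed, "->", raw.decode(fixed))
```

With the per-contact setting in place the guess is never needed, which is why the workaround from comment 3 helps.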

At least that is how I understand it. Stefan can correct me if I'm wrong.

Since this fixed the problem I highly suspect this is  a non-bug, but I'll 
leave that up to Stefan as well.

Comment 6 Stefan Gehn 2003-11-10 19:03:31 UTC
Without trying my own way of guessing encodings I cannot fix this (and I doubt I can make it any better than the qt folks).
I try some qt ways for decoding strings in case the incoming message has no encoding set, if that fails it's up to the users to fix this (i.e. set a per-contact encoding).
If both sides use a proper client that does utf-8 (miranda, licq and trillian don't, sim, mirabilis icq, micq and kopete do) you should have no problems even with weird stuff like the euro symbol ;)
Comment 7 Thiago Macieira 2003-11-10 21:51:37 UTC
Stefan: can't you revert to Latin-1 if there is no encoding marker and UTF-8 decoding fails? It's better to see a message with odd characters than to see "Internal Kopete Error".

You may want to Local8Bit decode first, but it may fail as well (my case).
Comment 8 Martijn Klingens 2003-11-10 22:02:36 UTC
Subject: Re: [Kopete-devel]  Double umlauts trigger internal error: XML document could not be parsed!

On Monday 10 November 2003 21:51, Thiago Macieira wrote:
> Stefan: can't you revert to Latin-1 if there is no encoding marker and
> UTF-8 decoding fails?

How can Stefan find that out???

> It's better to see a message with odd characters than to see "Internal
> Kopete Error". 

Well, that's libkopete, not the ICQ plugin, and it only reports this because 
libxml chokes on its input.

Also, isn't it QString::fromXxx's responsibility to at least make sure the 
resulting QString is valid utf8? Not correctly displayed, ok, but not utf8???

Comment 9 Thiago Macieira 2003-11-11 00:48:03 UTC
Ok, correct me if I'm wrong here: XML input must be UTF-8, therefore the Kopete plugins must first convert from whatever encoding the message comes in to Unicode (QString's internal representation), then generate an UTF-8 representation that can be fed to libxml. Is that correct?

So the plugins use QTextCodec to decode the 8-bit encodings into proper Unicode. If a decoding fails, another codec is tried. The failure can be detected if the round-trip (8-bit to Unicode to 8-bit) fails to produce the original string.

That way, we can detect if a decoding fails and fall-back to, for instance, user configuration, locale, then Latin1 (Latin1 can't fail).
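
A minimal sketch of that round-trip check and fallback chain, written in Python instead of QTextCodec so that it can be run standalone. The function names are invented for this example, and the candidate list stands in for "user configuration, locale".

```python
from typing import Iterable, Tuple

def round_trips(raw: bytes, encoding: str) -> bool:
    """True if decoding and re-encoding reproduces the original bytes."""
    text = raw.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == raw

def decode_with_fallback(raw: bytes, candidates: Iterable[str]) -> Tuple[str, str]:
    """Try each candidate in order; Latin-1 is the last resort."""
    for encoding in candidates:
        if round_trips(raw, encoding):
            return raw.decode(encoding), encoding
    # Latin-1 maps every byte to a code point, so this decode cannot fail.
    return raw.decode("latin-1"), "latin-1"

raw = b"esta\xe7\xe3o."  # Latin-1 bytes for "estação."
text, used = decode_with_fallback(raw, ["utf-8"])
print(used, text)
```

The same call with well-formed UTF-8 input stops at the first candidate and never reaches the Latin-1 fallback.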

However, since TT introduced a fix for broken UTF-8 filenames into Qt, they've changed their UTF-8 en/decoding procedures so that invalid sequences can be regenerated from Unicode data (they're using UTF-16 surrogates into the Plane-1 user range).

That means that, while we fix the problem of filenames being able to be loaded, invalid UTF-8 sequences can be generated from valid Unicode data. This could be the reason libxml is choking.
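
Python has a comparable mechanism, the `surrogateescape` error handler, which makes the effect easy to reproduce. This is an analogy under the assumption that Qt's behaviour is as described above; it is not Qt's implementation.

```python
raw = b"\xfc\xfc"  # Latin-1 "üü", not valid UTF-8

# Lenient decode: each undecodable byte becomes a lone surrogate (U+DCxx).
text = raw.decode("utf-8", errors="surrogateescape")

# The round-trip check passes ...
regenerated = text.encode("utf-8", errors="surrogateescape")
print(regenerated == raw)

# ... but the regenerated bytes are still not well-formed UTF-8.
try:
    regenerated.decode("utf-8")
    well_formed = True
except UnicodeDecodeError:
    well_formed = False
print("well-formed UTF-8:", well_formed)
```

With a decoder like this, a round-trip test can no longer tell a successful decode from a failed one, and the invalid bytes travel on to the XML parser.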
Comment 10 Martijn Klingens 2003-11-11 09:45:23 UTC
Subject: Re: [Kopete-devel]  Double umlauts trigger internal error: XML document could not be parsed!

On Tuesday 11 November 2003 00:48, Thiago Macieira wrote:
> Ok, correct me if I'm wrong here: XML input must be UTF-8, therefore the
> Kopete plugins must first convert from whatever encoding the message comes
> in to Unicode (QString's internal representation), then generate an UTF-8
> representation that can be fed to libxml. Is that correct?

Yes.

> So the plugins use QTextCodec to decode the 8-bit encodings into proper
> Unicode. If a decoding fails, another codec is tried. The failure is can be
> detected if the round-trip (8-bit to unicode to 8-bit) fails to produce the
> original string.

Either that or they use QString::fromXxxx. I know for sure MSN uses the 
latter, but as MSN is an all-utf8 protocol so it is not really of importance 
here. I don't know what Oscar uses.

> That means that, while we fix the problem of filenames being able to be
> loaded, invalid UTF-8 sequences can be generated from valid Unicode data.
> This could be the reason libxml is choking.

Ugh... Is there also a way to fix this?

If all else fails on the QString side of things we could remove illegal 
characters from the .utf8 data, but that's a pain to do.

Comment 11 Stefan Gehn 2003-11-11 10:03:01 UTC
Instead of this useless talk somebody should finally reproduce this with 
#define CHARSET_DEBUG 1
uncommented in oscardebug.h and post a log. As I said, I'm not going to do anything else than using QTextCodec, if that one is broken or misinterprets data then it's not my fault.
Comment 12 Thiago Macieira 2003-11-11 23:46:01 UTC
Here you go:

kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for US-ASCII=90, message length=92
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for UTF-8=-1, message length=92
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Couldn't find suitable encoding for incoming message, encoding using local system-encoding, TODO: sane fallback?
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Decoding using codec 'UTF-8'
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, OscarMessage&, unsigned cha<br/>Aqui tudo bem. Terminado por hoje na esta�o. Estou indo para o hotel.
kopete (oscar): [void OscarAccount::slotReceivedMessage(const QString&, OscarMessage&, OscarSocket::OscarMessageType)] account='1967141', type=0, sender='137740436'
kopete (oscar/aim): [void OscarContact::receivedIM(KopeteMessage&)] called
Entity: line 14: error: Input is not proper UTF-8, indicate encoding !
<br/>Aqui tudo bem. Terminado por hoje na esta�o. Estou indo para o hotel.]]></
                                              ^
Entity: line 14: error: Bytes: 0xE7 0xE3 0x6F 0x2E
<br/>Aqui tudo bem. Terminado por hoje na esta�o. Estou indo para o hotel.]]></

Note that it tries UTF-8 twice: once for the UTF-8 in code and once for the locale. As I had said, sometimes the locale decoding can fail. There should be another fallback to Latin1, which can never fail.
Comment 13 Jason Keirstead 2003-11-12 04:34:03 UTC
Anything can "fail". The XML sent to the XSL transform function can't have control characters in it. Anything producing said control characters, including a latin1 encoded string with invalid latin1 in it, will generate a parse error.

Sometimes even valid latin1 will fail, for example, raw IRC with color codes would produce a parse error because it uses low range ASCII control characters like 0x0? to denote its codes.
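
To illustrate the point about control characters, here is a small Python sketch that removes everything outside the XML 1.0 `Char` production before parsing. The helper names and the sample message (a 0x03 colour code as used by IRC clients) are assumptions made for this example; this is not how libkopete handles it.

```python
import re
import xml.etree.ElementTree as ET

# Matches any character that is NOT allowed by the XML 1.0 "Char" production.
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_illegal_xml_chars(text: str) -> str:
    """Drop characters that an XML 1.0 parser must reject."""
    return _ILLEGAL_XML.sub("", text)

def parses(body: str) -> bool:
    """Check whether the text can be embedded in a <body> element."""
    try:
        ET.fromstring("<body>" + body + "</body>")
        return True
    except ET.ParseError:
        return False

message = "\x03" + "4red text"  # control character 0x03 followed by a colour number
print(parses(message))
print(parses(strip_illegal_xml_chars(message)))
```

Stripping changes the message content, so it only trades a parse error for a slightly altered message.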
Comment 14 Stefan Gehn 2003-11-12 15:53:59 UTC
> As I had said, sometimes the locale decoding can fail. There should be
> another fallback to Latin1, which can never fail. 

US-ASCII in case of QT _is_ Latin1, if you use the US-ASCII MIB and ask QT for a codec you get an iso-8859-1 codec which is (almost) the same as latin1.

I doubt I can do anything about such an incoming string; tell the other side to use a proper client, or find out the other side's local encoding and set that for that KopeteContact.
Comment 15 Olivier Goffart 2003-11-12 16:38:19 UTC
*** Bug 66517 has been marked as a duplicate of this bug. ***
Comment 16 Oswald Buddenhagen 2003-11-18 01:58:12 UTC
i know that encoding detection magic exists in konq (or khtml) - that code could be simply copied or generalized and added to KStringHandler (where i already added isUtf8() recently).
Comment 17 Thiago Macieira 2003-11-20 04:44:19 UTC
Ok, a new follow up on the subject. I got this debugging output while receiving a message from an ICQ user that got replaced by the "XML error" message:

kopete (oscar): [void OscarSocket::parseSimpleIM(Buffer&, const UserInfo&)] RECV TYPE-1 IM from '60406059'
kopete (oscar): [void OscarSocket::parseSimpleIM(Buffer&, const UserInfo&)] TLV(2)
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for US-ASCII=7, message length=15
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for UTF-8=-1, message length=15
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Couldn't find suitable encoding for incoming message, encoding using local system-encoding, TODO: sane fallback?
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Decoding using codec 'UTF-8'
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, OscarMessage&, unsigned char, unsigned char)] Got a normal message: ��������
kopete (oscar): [void OscarAccount::slotReceivedMessage(const QString&, OscarMessage&, OscarSocket::OscarMessageType)] account='1967141', type=0, sender='60406059'
kopete (oscar/aim): [void OscarContact::receivedIM(KopeteMessage&)] called
Entity: line 14: error: Input is not proper UTF-8, indicate encoding !
 <body dir="ltr" ><![CDATA[��������]></body>

and:
kopete (oscar): [void OscarSocket::parseSimpleIM(Buffer&, const UserInfo&)] RECV TYPE-1 IM from '60406059'
kopete (oscar): [void OscarSocket::parseSimpleIM(Buffer&, const UserInfo&)] TLV(2)
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for US-ASCII=76, message length=78
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for UTF-8=-1, message length=78
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Couldn't find suitable encoding for incoming message, encoding using local system-encoding, TODO: sane fallback?
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Decoding using codec 'UTF-8'
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, OscarMessage&, unsigned char, unsigned char)] Got a normal message: Outro bug..<br/>Aqui quando eu fa� alt+tab, tua janela volta no topo do hist�ico
kopete (oscar): [void OscarAccount::slotReceivedMessage(const QString&, OscarMessage&, OscarSocket::OscarMessageType)] account='1967141', type=0, sender='60406059'
kopete (oscar/aim): [void OscarContact::receivedIM(KopeteMessage&)] called
Entity: line 14: error: Input is not proper UTF-8, indicate encoding !
 <body dir="ltr" ><![CDATA[Outro bug..<br/>Aqui quando eu fa� alt+tab, tua jane

The interesting turn of events: the other side is using Kopete (recent HEAD). I then started a rebuild myself, but I couldn't test with him again.

I tested with ICQ-official 2002a and the decoding went fine ("FORCED UTF-8" messages).
Comment 18 Thiago Macieira 2003-11-20 04:51:22 UTC
Talking to an ICQ 2001b client, receiving a message with more than one non-ASCII character:
kopete (oscar): [void OscarSocket::parseIM(Buffer&)] IM received on channel 2 from '137740436'
kopete (oscar): [void OscarSocket::parseIM(Buffer&)] The first TLV is of type 5
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] RECV TYPE-2 message
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] msgType=1, msgFlags=0, status=0, priority=33
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] RECV TYPE-2 IM, normal/auto message
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] fg color=(0, 0, 0, 0)
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] bg color=(255, 255, 255, 0)
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for US-ASCII=45, message length=49
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for UTF-8=-1, message length=49
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Couldn't find suitable encoding for incoming message, encoding using local system-encoding, TODO: sane fallback?
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Decoding using codec 'UTF-8'
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, OscarMessage&, unsigned char, unsigned char)] Got a normal message: n� h�campe�s �altura (til, agudo, til, crase)

However, when receiving only one non-ASCII char:
kopete (oscar): [void OscarSocket::parseIM(Buffer&)] IM received on channel 2 from '137740436'
kopete (oscar): [void OscarSocket::parseIM(Buffer&)] The first TLV is of type 5
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] RECV TYPE-2 message
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] msgType=1, msgFlags=0, status=0, priority=33
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] RECV TYPE-2 IM, normal/auto message
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] fg color=(0, 0, 0, 0)
kopete (oscar): [void OscarSocket::parseAdvanceMessage(Buffer&, UserInfo&, Buffer&)] bg color=(255, 255, 255, 0)
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] result for US-ASCII=4, message length=5
kopete (oscar): [const QString OscarSocket::ServerToQString(const char*, OscarContact*, bool)] Decoding using codec 'ISO 8859-1'
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, OscarMessage&, unsigned char, unsigned char)] Got a normal message: único
kopete (oscar): [void OscarAccount::slotReceivedMessage(const QString&, OscarMessage&, OscarSocket::OscarMessageType)] account='1967141', type=0, sender='137740436'

with:
kopete (oscar): [const DWORD OscarSocket::parseCapabilities(Buffer&)] CAPS: AIM_CAPS_ICQSERVERRELAY AIM_CAPS_RTFMSGS AIM_CAPS_IS_2001 AIM_CAPS_ISICQ
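
The "result for US-ASCII" values in the logs above are consistent with a simple rule: the score equals the number of plain ASCII bytes, and Latin-1 is chosen directly only when at most one byte is non-ASCII. The Python sketch below merely checks that reading against the five logged cases. Both functions are hypothetical, and the actual logic of `ServerToQString` is not shown in this report.

```python
def ascii_score(raw: bytes) -> int:
    """Count the bytes below 0x80 (assumed meaning of the logged score)."""
    return sum(1 for b in raw if b < 0x80)

def accepts_latin1(score: int, length: int) -> bool:
    """Hypothetical rule: tolerate at most one non-ASCII byte."""
    return score >= length - 1

# (score, message length, was ISO 8859-1 chosen directly?) as logged above
observed = [
    (90, 92, False),
    (7, 15, False),
    (76, 78, False),
    (45, 49, False),
    (4, 5, True),
]

for score, length, chosen in observed:
    print(score, length, accepts_latin1(score, length) == chosen)

print(ascii_score("único".encode("latin-1")))
```

If that reading is right, it would also explain why single umlauts were reported as unproblematic in comment 1.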
Comment 19 Stefan Gehn 2003-11-20 09:33:00 UTC
> kopete (oscar): [const DWORD OscarSocket::parseCapabilities(Buffer&)] CAPS:
> AIM_CAPS_ICQSERVERRELAY AIM_CAPS_RTFMSGS AIM_CAPS_IS_2001 AIM_CAPS_ISICQ 

Looks like something's wrong with the UTF-cap as all Mirabilis clients also send that cap.
Comment 20 Stefan Gehn 2003-11-25 10:14:20 UTC
Actually I cannot reproduce this here anymore, probably caused by my removal of testing for utf8. QT 3.2.x is so badly broken in that respect, I'm not going to try guessing encodings anymore.
I'll see how much I can backport from this as my local changes also include per-account encodings (which I cannot commit due to string freeze).
Comment 21 Stefan Gehn 2003-11-25 10:16:08 UTC
*** Bug 68157 has been marked as a duplicate of this bug. ***
Comment 22 Jason Keirstead 2003-12-02 05:43:51 UTC

*** This bug has been marked as a duplicate of 57129 ***
Comment 23 Jason Keirstead 2003-12-02 05:44:29 UTC
Oops, resolved the wrong bug! Re-opening...
Comment 24 Geoff Nenn 2003-12-19 12:40:12 UTC
I have been getting similar problems, too, when using the Yahoo! plugin. I am using the Kopete beta 0.7.93 version from SuSE 8.2 RPMs with KDE 3.1.1 and gcc 3.3.20030226, and kopete generally works fine. However, the problem I have which generates the XML parsing error seems to be related to font colour and means the incoming text is not visible. 

My (outgoing) text colour was/still is set to the default black, but now strangely appears multi-coloured after a couple of chats with someone who had their text set to "multi-coloured". They have now reset the font colour to black, too. However, this has had no effect at my end, where the font colour is still multi-coloured. I frequently get the following (FYI, my locale=en-US, they are on XP English):

>>SNIP
kopete: got IM
kopete: [QColor YahooAccount::getMsgColor(const QString&)] msg is 000035mOh Dear, Oh Dear, Oh Dear,
I have just read about even more security glitches. Dear Old Uncle Bill!
kopete: Custom color is #000035
kopete: [void ChatView::placeMembersList(KDockWidget::DockPosition)] Members list policy 0, visible false
kopete: [void ChatView::placeMembersList(KDockWidget::DockPosition)] Members list policy 0, visible false
Entity: line 8: error: CData section not finished
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
Entity: line 8: error: detected an error in element content
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
Entity: line 8: error: Premature end of data in tag body
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
Entity: line 8: error: detected an error in element content
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
Entity: line 8: error: Premature end of data in tag message
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
Entity: line 8: error: Extra content at the end of the document
 <body color="#000035" ><![CDATA[000035mOh Dear, Oh Dear, Oh Dear,
                                 ^
kopete: [void KopeteSystemTray::addBalloon()] [Null pointer]:true:true:true
kopete: [void KopeteSystemTray::addBalloon()] Orig msg text=000035mOh Dear, Oh Dear, Oh Dear,
I have just read about even more security glitches. Dear Old Uncle Bill!
kopete: [void KopeteSystemTray::addBalloon()] New msg text=000035mOh Dear, Oh Dear, Oh ...
kopete: [void KopeteSystemTray::startBlink(const QMovie&)] starting movie.
kopete: [void YahooSession::slotReadReady()] Socket FD: 11
kopete: KopeteEvent::apply
kopete: KopeteEvent::~KopeteEvent
kopete: KopeteEvent::apply
kopete: [void KopeteSystemTray::stopBlink()] stopping movie.
kopete: [void YahooContact::slotSendMessage(KopeteMessage&)]
kopete: Sending message: can't see anything.. .hold on..
kopete: [void YahooSession::slotReadReady()] Socket FD: 11
Comment 25 Geoff Nenn 2003-12-19 13:09:25 UTC
Addendum: This problem at least from my perspective seems to be confined to the Yahoo! plugin. The MSN plugin seems to handle font colours better, though I still have the multi-coloured outgoing text problem, and does not display XML parsing errors... 

MSN Plugin OUTgoing msg:

kopete: MSNSocket::slotReadyWrite: Sending command: MSG 142 A 273
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
User-Agent: Kopete/0.7.93
X-MMS-IM-Format: FN=MS%20Serif; EF=; CO=0000dd; CS=0; PF=0
I know, listen, I will leave you to it, ok?

...

MSN Plugin INcoming msg:

kopete: MSNSocket:slotDataReceived: MSG ***@hotmail.com *** 157
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-MMS-IM-Format: FN=Times%20New%20Roman; EF=I; CO=0; CS=0; PF=12

okey dokey .. take care - bye
kopete: MSNSocket::slotReadyWrite: Sending command: MSG 143 U 95
Comment 26 Matt Rogers 2003-12-19 18:34:45 UTC
*** Bug 70639 has been marked as a duplicate of this bug. ***
Comment 27 Matt Rogers 2004-01-16 02:49:57 UTC
*** Bug 69545 has been marked as a duplicate of this bug. ***
Comment 28 Stefan Gehn 2004-01-18 14:14:02 UTC
*** Bug 71068 has been marked as a duplicate of this bug. ***
Comment 29 Stefan Gehn 2004-01-18 14:15:23 UTC
Changing topic to something more general; this affects both conversations and any string sent to or received from the server (i.e. userinfo, nicknames, etc.).
Comment 30 Matt Rogers 2004-01-21 05:55:58 UTC
Created attachment 4268 [details]
possible patch

Please test this patch. I don't know if it makes things better or not, but it
basically makes it all default to US-ASCII for everything.
Comment 31 Stefan Gehn 2004-01-21 14:43:06 UTC
This patch is nonsense, why do you test for (length < 0)??? That's length for the incoming message.
Comment 32 Stefan Gehn 2004-01-21 17:24:27 UTC
Since there is _no_ safe way to guess a text's encoding, I'll close this one now. Also, this bug is starting to get cluttered with unrelated comments about other protocols and unrelated things like colors or fonts (open new bug reports for the right protocol if there's really a bug).

For problems with incoming text in ICQ there's the encoding selection in ICQ-userinfo dialog. A per-account selection for the fallback-encoding will be added soon as well. Other than that I see little to improve the situation (look at sim-icq or licq, they also have an encoding selection).
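
Why guessing is unsafe can be shown with the four bytes from the libxml error in comment 12. In this Python sketch (an illustration only; the list of encodings is an arbitrary choice) every 8-bit codec decodes them without any error, and each one yields different text.

```python
raw = bytes([0xE7, 0xE3, 0x6F, 0x2E])  # bytes from the libxml error in comment 12

decodings = {}
for encoding in ("latin-1", "cp1252", "cp1251", "koi8-r"):
    decodings[encoding] = raw.decode(encoding)  # none of these raises
    print(f"{encoding:8} -> {decodings[encoding]}")
```

All four results are "valid" as far as the codecs are concerned, so only knowledge of the sender (a tag in the message or a per-contact or per-account setting) can pick the right one.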

Eventually Kopete should be a bit more verbose if displaying a message failed (or point to a to-be-written chapter in our to-be-written help *g*).
Comment 33 Norberto Bensa 2004-01-21 19:19:58 UTC
Subject: Re:  String encoding not detected properly

Stefan Gehn wrote:
> For problems with incoming text in ICQ there's the encoding selection in
> ICQ-userinfo dialog. A per-account selection for the fallback-encoding will
> be added soon as well.

What about a global fallback-encoding option?

Also, the encoding per-user (in 0.8rc1) doesn't remember its setting 
(iso8859-1 in my case.)

Regards,
Norberto

Comment 34 Stefan Gehn 2004-01-21 19:41:07 UTC
Please DON'T CC/BCC me, it just adds confusion and doubles my mail amount.
Pasting what I wrote Norberto in PM:

> What about a global fallback-encoding option?

Impossible because the character encoding is done by every protocol, there's 
no kopete-wide system for it because protocols differ way too much in this 
respect. Per-account is not that hard, you only have to change it once per 
account, an action that you don't have to repeat every few days, no? :)

I know that it's annoying to set it for every contact, that's why I will add 
the per-account option.

> Also, the encoding per-user (in 0.8rc1) doesn't remember its setting
> (iso8859-1 in my case.)

Hmm, it's remembered for me, try saving windows codepage 1252 because that's 
actually the only setting I ever needed and tested :/

If you have continuing problems with saving the setting just bug me again and 
I'll see if I can send you a patch with debugging output so we can track it 
down.
Comment 35 Norberto Bensa 2004-01-22 01:14:00 UTC
Subject: Re:  String encoding not detected properly

Stefan Gehn wrote:
> Please DON'T CC/BCC me, it just adds confusion and doubles my mail amount.

Oops, sorry. I've learned the lesson :-)

> Pasting what I wrote Norberto in PM:
> > What about a global fallback-encoding option?
>
> Impossible because the character encoding is done by every protocol,

I didn't mean for "all" the protocols, only a global fallback encoding for 
ICQ (I don't use AIM so I don't know if this "bug" affects Oscar too)

> > Also, the encoding per-user (in 0.8rc1) doesn't remember its setting
> > (iso8859-1 in my case.)
>
> Hmm, it's remembered for me, try saving windows codepage 1252 because
> that's actually the only setting I ever needed and tested :/

OK, I've tried 1252 (Windows 1252 Western) and it does remember now.

Many Thanks,
Norberto

Comment 36 Matt Rogers 2004-01-28 22:54:49 UTC
*** Bug 73715 has been marked as a duplicate of this bug. ***
Comment 37 Christopher Martin 2004-04-28 15:29:56 UTC
This has been reported to Debian as Bug #246310. This bug should be re-opened pending some sort of ultimate resolution, even if that takes a while, since it affects many many people. At the very least having it open would make it easier for people to find this information.

Thanks,
Christopher Martin
Comment 38 Matt Rogers 2004-04-28 15:37:12 UTC
see bug 75497

Comment 39 Christopher Martin 2004-04-28 16:16:40 UTC
Ah, thanks. The information in this bug report has not been completely duplicated in 75497, though, so maybe I'll just link to it from there.
Comment 40 Kay Patzwald 2004-12-12 19:09:03 UTC
Has this bug been fixed? I use Kopete 9.2 and have had this problem across many versions of Kopete. I chose the encoding per user, but it has no effect.