Bug 225747

Summary: kopete does not encode utf xml properly when sending to jabber (xmpp) server
Product: [Unmaintained] kopete Reporter: Nikoli <nikoli>
Component: Jabber Plugin Assignee: Kopete Developers <kopete-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: charles, pali.rohar
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit: Version Fixed In: 16.08.1
Sentry Crash Report:

Description Nikoli 2010-02-06 18:08:05 UTC
Version:           qt 4.5.3 kde 4.3.3 kopete 0.80.2 (using KDE 4.3.3)
OS:                Linux
Installed from:    Gentoo Packages

Try to send '
Comment 1 Nikoli 2010-02-06 18:11:17 UTC
Try to send 'GUTISK' (215, Gothic, 'GUTISK'; bugs.kde.org filters out the original string) to any JID over XMPP (Jabber). The server will disconnect you because the UTF text is not encoded correctly.

For the test I copied and pasted text from http://meta.wikimedia.org/wiki/List_of_Wikipedias.

Psi, qutIM and QIP do not have this problem, but Vacuum does.
Comment 2 Roman Jarosz 2010-02-06 18:33:22 UTC
I cannot reproduce this on KDE SC 4.4. I tried copying the whole "Languages:" box ... What exactly did you send, and which Jabber server do you use?
Comment 3 Nikoli 2010-02-06 18:46:12 UTC
I opened http://meta.wikimedia.org/wiki/List_of_Wikipedias in Firefox, pressed Ctrl+A and Ctrl+Insert, then opened kopete and pressed Shift+Insert. I think the problem is only with gutisk: http://dpaste.com/155471/ http://paste2.org/p/653016

I tried my own ejabberd server, jabber.org and jabber.ru. All of them reject a message with this text from kopete.
Comment 4 Nikoli 2010-03-16 22:52:35 UTC
Tested with latest Qt and KDE - same problem. Versions: Qt 4.6.2, KDE 4.4.1, kopete 1.0.0

jabber.ru disconnected the client. XML log:

<message type="chat" to="nikoli@nikoli.msk.ru" id="122">
<body>&#xdf32;&#xdf3f;&#xdf44;&#xdf39;&#xdf43;&#xdf3a;</body>
<x xmlns="jabber:x:event">
<offline/>
<composing/>
<delivered/>
<displayed/>
</x>
<active xmlns="http://jabber.org/protocol/chatstates"/>
</message>

<stream:error>
<xml-not-well-formed xmlns="urn:ietf:params:xml:ns:xmpp-streams"/>
</stream:error>
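
For reference, all six character references in the <body> above fall in the UTF-16 low-surrogate range (0xDC00-0xDFFF). The Python sketch below assumes the intended text was the Gothic word "gutisk" (code points U+10332, U+1033F, U+10344, U+10339, U+10343, U+1033A; this is an assumption, since the tracker filtered the original string). Under that assumption, the logged values are exactly the low halves of those letters' surrogate pairs. The helper name to_surrogates is made up for illustration.

```python
def to_surrogates(code_point):
    """Split a code point above 0xFFFF into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF have surrogate pairs")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# Values copied from the <body> element in the log above.
logged = [0xDF32, 0xDF3F, 0xDF44, 0xDF39, 0xDF43, 0xDF3A]

# ASSUMPTION: the original text was the Gothic letters spelling "gutisk".
assumed = [0x10332, 0x1033F, 0x10344, 0x10339, 0x10343, 0x1033A]

# Low surrogate of each assumed letter, for comparison with the log.
low_halves = [to_surrogates(cp)[1] for cp in assumed]

# True when every logged value lies in the low-surrogate range.
all_low_surrogates = all(0xDC00 <= v <= 0xDFFF for v in logged)
```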
Comment 5 Pali Rohár 2016-08-14 09:37:37 UTC
Confirmed. Unicode characters in XML must be written as full code points, not as UTF-16 surrogate pairs. Surrogate code points are invalid in XML, so the server is right to disconnect you.

See: http://www.w3.org/TR/REC-xml/#charsets

Character Range

[2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
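
The Char production above can be turned into a quick check. This is a minimal Python sketch, not code from kopete or libiris; the names is_xml_char and invalid_hex_refs are hypothetical. It only looks at hexadecimal character references of the form &#x...; and reports the ones whose value falls outside the allowed ranges.

```python
import re

def is_xml_char(cp):
    """Return True if cp matches the XML 1.0 Char production [2]."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Matches hexadecimal character references such as &#xdf32;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")

def invalid_hex_refs(xml_text):
    """List the hex character references in xml_text that are not valid XML chars."""
    return [m.group(0) for m in HEX_REF.finditer(xml_text)
            if not is_xml_char(int(m.group(1), 16))]

# Surrogate halves, as in the log from Comment 4, are rejected.
bad = invalid_hex_refs("<body>&#xdf32;&#xdf3f;</body>")

# Full code points above 0xFFFF are accepted.
good = invalid_hex_refs("<body>&#x10332;&#x1033f;</body>")
```

Here bad contains both references from the first string, while good is an empty list.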
Comment 6 Pali Rohár 2016-08-14 22:26:13 UTC
*** Bug 314272 has been marked as a duplicate of this bug. ***
Comment 7 Pali Rohár 2016-08-14 22:27:31 UTC
This is a bug in QtXml. My workaround for libiris: https://github.com/psi-im/iris/pull/44
Comment 8 Pali Rohár 2016-08-15 22:18:27 UTC
Git commit f23d6ccc7a7f542059c3956d64d912a34584723e by Pali Rohár.
Committed on 15/08/2016 at 15:58.
Pushed by pali into branch 'Applications/16.08'.

jabber: Workaround bug in QtXML: Fix xmlToString when QDomElement contains Unicode characters above 0xFFFF

Upstream:
https://github.com/psi-im/iris/commit/8612bc340421087cf0ebfd426661ff22f7351270

See also discussion:
https://github.com/psi-im/iris/pull/44
https://github.com/psi-im/iris/pull/43
https://github.com/psi-im/iris/issues/42
https://github.com/psi-im/iris/issues/13
https://bugreports.qt.io/browse/QTBUG-25291
Related: bug 314272
FIXED-IN: 16.08.1

A  +19   -0    protocols/jabber/libiris/patches/01_qtxml_unicode.patch
M  +8    -0    protocols/jabber/libiris/src/xmpp/xmpp-core/xmlprotocol.cpp

http://commits.kde.org/kopete/f23d6ccc7a7f542059c3956d64d912a34584723e
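
To illustrate the kind of repair involved, here is a Python sketch of one possible approach: post-processing serialized XML so that two adjacent character references forming a surrogate pair are merged into a single reference to the full code point. This is NOT the actual patch (see the commit links above for that); the function name merge_surrogate_refs and the sample input are hypothetical.

```python
import re

# A high-surrogate reference (D800-DBFF) immediately followed by
# a low-surrogate reference (DC00-DFFF).
SURROGATE_PAIR_REF = re.compile(
    r"&#x([dD][89abAB][0-9a-fA-F]{2});&#x([dD][c-fC-F][0-9a-fA-F]{2});"
)

def merge_surrogate_refs(xml_text):
    """Replace each surrogate-pair reference with one full code point reference."""
    def combine(match):
        high = int(match.group(1), 16)
        low = int(match.group(2), 16)
        # Standard UTF-16 decoding of a surrogate pair.
        cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        return "&#x%x;" % cp
    return SURROGATE_PAIR_REF.sub(combine, xml_text)

# Hypothetical input: U+10332 serialized as two surrogate references.
fixed = merge_surrogate_refs("<body>&#xd800;&#xdf32;</body>")
```

With this input, fixed is "<body>&#x10332;</body>", which satisfies the Char production quoted in Comment 5. Text without surrogate references passes through unchanged.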
Comment 9 Pali Rohár 2016-08-23 21:46:20 UTC
*** Bug 314272 has been marked as a duplicate of this bug. ***