130234 – UTF-8 encoding not used for XMLHttpRequest

Status:	RESOLVED FIXED

Alias:	None

Product:	konqueror
Classification:	Applications
Component:	khtml (other bugs)
Version First Reported In:	unspecified
Platform:	Ubuntu Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Konqueror Bugs

Reported:	2006-07-04 05:32 UTC by Adam Peller
Modified:	2007-09-29 22:47 UTC
CC List:	1 user

Attachments
Proposed patch (520 bytes, patch) 2006-10-09 16:15 UTC, Apollon Oikonomopoulos

Description Adam Peller 2006-07-04 05:32:08 UTC

Version:            (using KDE 3.4.3)
Installed from:    Ubuntu Packages
OS:                Linux

KHTML's XHR does not seem to use UTF-8 decoding by default, as the other browsers do and as specified by the W3C working draft here: http://www.w3.org/TR/2006/WD-XMLHttpRequest-20060405/

This can be seen in the test case here:
http://archive.dojotoolkit.org/nightly/tests/i18n/test_strings.html

Most of the non-ASCII examples there show strings decoded using the wrong encoding, e.g. zh-cn and zh-tw at the bottom of the page. (Some other examples, such as Korean (ko), were sent as JavaScript \uxxxx escape codes and therefore render correctly.)

In this test, Dojo uses XHR to retrieve a resource that is encoded in UTF-8, and the server specifies no encoding. The other major browsers assume UTF-8 in this case.

The files loaded via XHR can be found under:

http://archive.dojotoolkit.org/nightly/tests/i18n/nls/*/salutations.js

Comment 1 Apollon Oikonomopoulos 2006-10-09 16:15:32 UTC

Created attachment 18068
Proposed patch

I confirm the same behaviour. Unfortunately, for speakers of languages with
non-Latin alphabets this is a real problem, since AJAX replies are currently
decoded as ISO-8859-1 by default and end up completely garbled on screen.

UTF-8 and BOM-marked UTF-16 are in general the standard encodings for XML
documents when no explicit encoding specification is present. In addition,
the W3C XML 1.0 specification (http://www.w3.org/TR/xml/#charencoding)
requires that UTF-16 encoded XML documents always begin with a BOM, whereas
UTF-8 documents may optionally have one. IMHO XMLHttpRequest should default to
UTF-8, since this is the behaviour most web applications expect. Since
khtml::Decoder::decode always looks for a BOM at the beginning of the stream,
setting the default encoding of XMLHttpRequest replies to UTF-8 means that
UTF-8, UTF-8 with a BOM and UTF-16 with a BOM will all be decoded correctly.
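
As a rough illustration of the BOM-then-default logic described above, here is a minimal standalone C++ sketch. It is not the attached patch and not KHTML code; `pickEncoding` is a hypothetical name, and the function only reports an encoding label rather than decoding anything.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of KHTML): choose an encoding label from a
// leading byte order mark, falling back to UTF-8 when no BOM is present.
std::string pickEncoding(const std::string& bytes)
{
    const auto byteAt = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };

    // UTF-8 BOM: EF BB BF
    if (bytes.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return "UTF-8";
    // UTF-16 big-endian BOM: FE FF
    if (bytes.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return "UTF-16BE";
    // UTF-16 little-endian BOM: FF FE
    if (bytes.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return "UTF-16LE";

    // No BOM: assume UTF-8, which is the default proposed in this report.
    return "UTF-8";
}
```

With this ordering, a reply without any BOM is treated the same as one carrying a UTF-8 BOM, while BOM-marked UTF-16 is still recognised.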

I'm not familiar with the internals of KDE, but the following patch fixes the
issue for me. Still, I'm not sure whether the Decoder::DefaultEncoding
constant is the right one to use or whether something else should be used instead.

Cheers,
Apollon

Comment 2 George T 2006-10-10 17:16:41 UTC

*** This bug has been confirmed by popular vote. ***

Comment 3 Adam Peller 2006-10-10 19:34:38 UTC

Also, please note that content other than XML may be passed over XHR.  In Dojo's case, we pass JS which we eval, so putting a BOM at the top is not an option.  We did something far uglier for a workaround...

"it seems like would be able to get away with: /* <?xml version="1.0" encoding="UTF-8" ?> */ in the top of your translation files"

This appears to work as a side effect of the parser sniffing for encoding declarations.
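
To show why an XML declaration hidden inside a JavaScript comment could still be picked up, here is a minimal C++ sketch of a prefix sniffer. This is an assumption about the general technique, not KHTML's actual sniffing code; `sniffDeclaredEncoding` and the 256-byte window are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sniffer (not KHTML's implementation): search the start of the
// data for an encoding="..." pseudo-attribute and return its value, or an
// empty string if none is found.
std::string sniffDeclaredEncoding(const std::string& data)
{
    // Only inspect a short prefix; the window size is an arbitrary assumption.
    const std::string head = data.substr(0, 256);
    const std::string key = "encoding=\"";

    const std::size_t start = head.find(key);
    if (start == std::string::npos)
        return std::string();

    const std::size_t valueStart = start + key.size();
    const std::size_t valueEnd = head.find('"', valueStart);
    if (valueEnd == std::string::npos)
        return std::string();

    return head.substr(valueStart, valueEnd - valueStart);
}
```

A sniffer of this kind does not care whether the declaration sits inside `/* ... */`, so the workaround would be found even though the payload is JavaScript and not XML.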

Comment 4 Daniel Hahler 2007-03-22 22:51:23 UTC

Can the patch get reviewed and approved for 3.5.7?

Comment 5 Igor 2007-09-18 19:31:07 UTC

I've just tried Kubuntu 7.04 and this bug seems to be fixed there, but it is still present on my Arch Linux installation.

Comment 6 Dawit Alemayehu 2007-09-29 22:47:52 UTC

r718830 | adawit | 2007-09-29 16:20:38 -0400 (Sat, 29 Sep 2007) | 5 lines

* Default to "UTF-8" per section 2 of the draft W3C "The XMLHttpRequest Object" specification. Fixes BR# 130234

BUG:130234

Index: xmlhttprequest.cpp
===================================================================
--- xmlhttprequest.cpp  (revision 657077)
+++ xmlhttprequest.cpp  (revision 718830)
@@ -674,7 +674,8 @@
     if (!encoding.isNull())
       decoder->setEncoding(encoding.latin1(), Decoder::EncodingFromHTTPHeader);
     else {
-      // FIXME: Inherit the default encoding from the parent document?
+      // Per section 2 of W3C working draft spec, fall back to "UTF-8".
+      decoder->setEncoding("UTF-8", Decoder::DefaultEncoding);
     }
   }
   if (len == 0)
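
The committed change amounts to "use the charset from the HTTP header if there is one, otherwise fall back to UTF-8". A standalone C++ sketch of that decision, independent of Qt and KHTML, might look like the following; `responseEncoding` and its simplified Content-Type parsing are hypothetical and not taken from xmlhttprequest.cpp.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: return the charset parameter of a Content-Type value,
// or "UTF-8" when the header carries no charset.
std::string responseEncoding(const std::string& contentType)
{
    // Match the parameter name case-insensitively on a lowercased copy.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return "UTF-8"; // no charset given: default to UTF-8

    const std::size_t begin = pos + key.size();
    const std::size_t end = contentType.find_first_of("; \t", begin);
    std::string value = contentType.substr(
        begin, end == std::string::npos ? std::string::npos : end - begin);

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);

    return value.empty() ? "UTF-8" : value;
}
```

For example, `responseEncoding("text/javascript")` yields the UTF-8 default, which corresponds to the Dojo case from the original description where the server sends no encoding.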