Version: KDE 3.4.3
Installed from: Ubuntu Packages
OS: Linux

KHTML's XHR does not seem to use UTF-8 decoding by default, as the other major browsers do and as specified by the W3C working draft (http://www.w3.org/TR/2006/WD-XMLHttpRequest-20060405/).

This can be seen in the test case at http://archive.dojotoolkit.org/nightly/tests/i18n/test_strings.html, where Dojo uses XHR to retrieve resources which are encoded in UTF-8 and for which the server specifies no encoding. Most of the non-ASCII examples on that page show the strings decoded using the wrong encoding, e.g. zh-cn and zh-tw at the bottom of the page. (Some other examples, like Korean (ko), were sent as JavaScript \uxxxx escape codes and therefore render just fine.) The other major browsers assume UTF-8 encoding in this case.

The files loaded via XHR can be found under http://archive.dojotoolkit.org/nightly/tests/i18n/nls/*/salutations.js
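For reference, a minimal sketch of the kind of request involved. The path and the use of alert() are illustrative only, not taken from the actual Dojo test page; the assumption is a file stored as raw UTF-8 that the server serves without a charset parameter.

var xhr = new XMLHttpRequest();
// Hypothetical path in the style of the nls/*/salutations.js bundles above.
xhr.open("GET", "nls/zh-cn/salutations.js", true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4 && xhr.status === 200) {
    // Firefox and Opera decode the body as UTF-8 here; KHTML falls back to
    // Latin-1, so the non-ASCII strings come out garbled.
    alert(xhr.responseText);
  }
};
xhr.send(null);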
Created attachment 18068 [details]
Proposed patch

I confirm the same behaviour. Unfortunately, for speakers of languages with non-Latin alphabets this is a real problem, since AJAX replies are now by default rendered using ISO 8859-1 and end up completely garbled on screen.

UTF-8, or UTF-16 marked by a BOM, are in general the standard encodings for XML documents when no explicit encoding declaration is present. Apart from that, the W3C specification of XML 1.0 (http://www.w3.org/TR/xml/#charencoding) mandates that UTF-16 encoded XML documents always be marked with a BOM, whereas UTF-8 documents may optionally have one.

IMHO the default should be UTF-8, since this is the behaviour most web applications expect. Since khtml::Decoder::decode always looks for a BOM at the beginning of the stream, setting the default encoding of XMLHttpRequest replies to UTF-8 guarantees that it will work with UTF-8, UTF-8 with BOM, and UTF-16 with BOM.

I'm not familiar with the internals of KDE, but the following patch fixes the issue for me. I'm still not sure about the use of the Decoder::DefaultEncoding constant or whether something else should be used instead.

Cheers,
Apollon
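To illustrate the reasoning, here is a sketch only (not khtml::Decoder code): with BOM sniffing done first, a UTF-8 default covers all three cases listed above. The `bytes` argument is assumed to hold the first octets of the reply body.

function pickEncoding(bytes) {
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "UTF-16BE";  // UTF-16 BOM, big endian
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return "UTF-16LE";  // UTF-16 BOM, little endian
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return "UTF-8";  // optional UTF-8 BOM
  return "UTF-8";  // no BOM, no HTTP charset: the proposed UTF-8 default instead of ISO 8859-1
}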
*** This bug has been confirmed by popular vote. ***
Also, please note that content other than XML may be passed over XHR. In Dojo's case, we pass JS which we eval, so putting a BOM at the top is not an option. We did something far uglier as a workaround... "it seems like you would be able to get away with putting /* <?xml version="1.0" encoding="UTF-8" ?> */ at the top of your translation files", which appears to work as a side effect of the parser sniffing for encoding headers.
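For the record, a bundle using that workaround might look like the following. The contents are hypothetical, not the actual Dojo file.

/* <?xml version="1.0" encoding="UTF-8" ?> */
// The XML-style comment on the first line is ignored when the file is eval'd,
// but the charset sniffer picks up the encoding="UTF-8" declaration before
// the non-ASCII strings below are decoded.
({
  greeting: "你好",
  farewell: "再见"
})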
Can the patch get reviewed and approved for 3.5.7?
I've just tried Kubuntu 7.04 and the bug seems to be fixed there, while on my Arch Linux system it is not.
r718830 | adawit | 2007-09-29 16:20:38 -0400 (Sat, 29 Sep 2007) | 5 lines

* Default to "UTF-8" per section 2 of the draft W3C "The XMLHttpRequest Object"
  specification. Fixes BR# 130234

BUG:130234

Index: xmlhttprequest.cpp
===================================================================
--- xmlhttprequest.cpp	(revision 657077)
+++ xmlhttprequest.cpp	(revision 718830)
@@ -674,7 +674,8 @@
     if (!encoding.isNull())
       decoder->setEncoding(encoding.latin1(), Decoder::EncodingFromHTTPHeader);
     else {
-      // FIXME: Inherit the default encoding from the parent document?
+      // Per section 2 of W3C working draft spec, fall back to "UTF-8".
+      decoder->setEncoding("UTF-8", Decoder::DefaultEncoding);
     }
   }
   if (len == 0)