Version: (using KDE 3.4.3)
Installed from: Ubuntu Packages
KHTML's XHR does not seem to decode responses as UTF-8 by default, as the other browsers do (and as specified by the W3C working draft here: http://www.w3.org/TR/2006/WD-XMLHttpRequest-20060405/).
This can be seen in the test case here:
where Dojo uses XHR to retrieve a resource that is encoded in UTF-8 (the server specifies no encoding). The other major browsers assume UTF-8 in this case.
The files loaded via XHR can be found under
Created attachment 18068
I can confirm the same behaviour. Unfortunately, for speakers of languages with
non-Latin alphabets this is a real problem, since AJAX replies are now
decoded as ISO-8859-1 by default and end up completely garbled on screen.
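To illustrate the garbling, here is a minimal C++ sketch of what happens when UTF-8 bytes are treated as ISO-8859-1. The helper name latin1ToUtf8 is made up for this example and is not part of KHTML; it only mimics the effect of the wrong default so the result can be printed on a UTF-8 terminal.

```cpp
#include <string>

// Hypothetical helper (not KHTML code): interpret every input byte as one
// ISO-8859-1 character and re-encode the result as UTF-8.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of the Greek letter alpha (0xCE 0xB1) yields two separate Latin-1 characters instead of one letter, which is the kind of output described above.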
UTF-8 or UTF-16 marked by BOM are in general the standard encodings for XML
documents, when no explicit encoding specification is present. Apart from that,
the W3C specification of XML 1.0 (http://www.w3.org/TR/xml/#charencoding)
mandates that UTF-16 encoded XML documents always be marked with a BOM, whereas
UTF-8 documents may optionally have one. IMHO XHR should default to UTF-8, since this
is the behaviour most web applications expect. Since
khtml::Decoder::decode always looks for a BOM at the beginning of the stream,
setting the default encoding of XMLHttpRequest replies to UTF-8 guarantees that
it will always work with UTF-8, UTF-8 w/ BOM and UTF-16 w/ BOM.
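The reasoning above can be sketched roughly as follows. This is a standalone illustration and not the actual khtml::Decoder code: the function sniffEncoding, its fallback parameter and the returned encoding names are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not KHTML's API): choose an encoding for a response
// body that arrived without an explicit charset. A BOM wins if present,
// otherwise the supplied default is used.
std::string sniffEncoding(const std::string& body,
                          const std::string& fallback = "UTF-8")
{
    const auto byteAt = [&body](std::size_t i) {
        return static_cast<unsigned char>(body[i]);
    };

    // UTF-8 BOM: EF BB BF
    if (body.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return "UTF-8";
    // UTF-16 big-endian BOM: FE FF
    if (body.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return "UTF-16BE";
    // UTF-16 little-endian BOM: FF FE
    if (body.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return "UTF-16LE";
    // No BOM found: fall back to the default.
    return fallback;
}
```

With "UTF-8" as the fallback, plain UTF-8, UTF-8 with a BOM and UTF-16 with a BOM all resolve to a usable encoding. With an ISO-8859-1 fallback, only the BOM-marked cases would.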
I'm not familiar with the internals of KDE, but the following patch fixes the
issue for me. Still, I'm not sure whether the Decoder::DefaultEncoding
constant is the right one to use or whether something else should be used instead.
*** This bug has been confirmed by popular vote. ***
Also, please note that content other than XML may be passed over XHR. In Dojo's case, we pass JS which we eval, so putting a BOM at the top is not an option. We did something far uglier for a workaround...
"it seems like would be able to get away with: /* <?xml version="1.0" encoding="UTF-8" ?> */ in the top of your translation files"
Which appears to work as a side effect of the parser sniffing for encoding headers.
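For anyone wondering why that comment trick could work, here is a rough C++ sketch of declaration sniffing. It is only a guess at the general technique, not KHTML's actual parser logic; the function name sniffDeclaredEncoding and the 256-byte window are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not KHTML code): scan the first `window` bytes for
// encoding="..." or encoding='...' and return the quoted value, or an empty
// string if no such declaration is found.
std::string sniffDeclaredEncoding(const std::string& body,
                                  std::size_t window = 256)
{
    const std::string head = body.substr(0, window);
    const std::string key = "encoding=";

    const std::size_t pos = head.find(key);
    if (pos == std::string::npos)
        return "";

    // The character right after the key must be an opening quote.
    const std::size_t quotePos = pos + key.size();
    if (quotePos >= head.size())
        return "";
    const char quote = head[quotePos];
    if (quote != '"' && quote != '\'')
        return "";

    // The value runs up to the matching closing quote.
    const std::size_t end = head.find(quote, quotePos + 1);
    if (end == std::string::npos)
        return "";
    return head.substr(quotePos + 1, end - quotePos - 1);
}
```

A sniffer like this does not care that the declaration sits inside a JavaScript comment, which would explain the side effect described above.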
Can the patch get reviewed and approved for 3.5.7?
I've just tried Kubuntu 7.04 and this bug seems to be fixed there, but it is still present on my Arch Linux install.
r718830 | adawit | 2007-09-29 16:20:38 -0400 (Sat, 29 Sep 2007) | 5 lines
* Default to "UTF-8" per section 2 of the draft W3C "The XMLHttpRequest Object" specification. Fixes BR# 130234
--- xmlhttprequest.cpp (revision 657077)
+++ xmlhttprequest.cpp (revision 718830)
@@ -674,7 +674,8 @@
- // FIXME: Inherit the default encoding from the parent document?
+ // Per section 2 of W3C working draft spec, fall back to "UTF-8".
+ decoder->setEncoding("UTF-8", Decoder::DefaultEncoding);
if (len == 0)