(*** This bug was imported into bugs.kde.org ***)

Package: khtml
Version: 3.0.1 (CVS >= 20020327) (using KDE 3.0.0)
Severity: normal
Installed from: Compiled From Sources
Compiler: Not Specified
OS: Linux
OS/Compiler notes: Not Specified

KHTML doesn't seem to acknowledge the encoding-attribute of the <?xml?> tag. The following document (also available at http://www.stud.ifi.uio.no/~mortehu/utf-8.html) demonstrates this.

(The following is encoded in ISO-8859-1)

<?xml version="1.0" encoding="utf-8"?>
<html>
  <head>
    <title>UTF-8 test</title>
  </head>
  <body>
    ø should appear ø
  </body>
</html>

(Submitted via bugs.kde.org)
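For illustration, the behaviour the report expects can be sketched as follows: read the encoding pseudo-attribute from the XML declaration and use it to decode the rest of the document. This is only a minimal sketch and not KHTML code. The names `sniff_xml_encoding` and `decode_document` are hypothetical, the fallback to UTF-8 is an assumption, and byte-order marks, HTTP headers and `<meta>` tags are deliberately ignored.

```python
import re

# Hypothetical helper, not taken from KHTML: matches an XML declaration at the
# very start of the byte stream and captures the optional encoding name.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def sniff_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding in lower case, or `default` if none is declared."""
    match = _XML_DECL.match(data)
    if match and match.group(1):
        return match.group(1).decode("ascii").lower()
    return default  # assumption: fall back to UTF-8 when nothing is declared


def decode_document(data: bytes) -> str:
    """Decode the document using the encoding named in its XML declaration."""
    return data.decode(sniff_xml_encoding(data))


# The test document from the report, here actually stored as UTF-8 bytes.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<html><body>ø should appear ø</body></html>"
).encode("utf-8")

print(sniff_xml_encoding(sample))
print(decode_document(sample))
```

A real decoder would also have to weigh this declaration against other sources of encoding information, so the sketch shows only the one step the report says is missing.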
*** Bug 43428 has been marked as a duplicate of this bug. ***
*** Bug 19870 has been marked as a duplicate of this bug. ***
The page http://www.stud.ifi.uio.no/~mortehu/utf-8.html is gone. Can somebody retrieve it and add it as an attachment to this bug?
Sorry, I just re-uploaded it.
This bug is still present in 3.0.5, or at least in the Red Hat package kdebase-3.0.5a, specifically kdebase-3.0.5a-0.73.2:6.i386.rpm. As described, if the encoding is set to "UTF-8", Konqueror displays characters outside the "ASCII range" as either ISO-8859-1 or ISO-8859-15 (probably using my locale settings). Mozilla-based browsers (such as Mozilla 1.3.x and 1.4) don't exhibit this behaviour. (Seems that this bug has reached its voting limit or I'd vote for it as well.)
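The symptom described above, UTF-8 text shown as if it were ISO-8859-1 or ISO-8859-15, can be reproduced outside the browser. The snippet below is only an illustration of the misdecoding, not of anything Konqueror does internally; the helper name `misread_utf8` is made up for this example.

```python
def misread_utf8(text: str, wrong_encoding: str) -> str:
    """Encode `text` as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "ø should appear ø"

# "ø" is the two bytes 0xC3 0xB8 in UTF-8; each byte becomes its own
# character when the stream is read as a single-byte Latin charset.
as_latin1 = misread_utf8(original, "iso-8859-1")
as_latin9 = misread_utf8(original, "iso-8859-15")

print(as_latin1)  # Ã¸ should appear Ã¸
print(as_latin9)  # Ãž should appear Ãž

# The damage is reversible as long as the bytes themselves are intact.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
print(repaired == original)  # True
```

Which of the two garbled forms appears depends on the single-byte charset the browser falls back to, which fits the guess that the locale settings are being used.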
KDE 3.0 isn't developed anymore. But this bug is still present on KDE CVS HEAD.
Here's a practical example where this bug is very obvious in Konqueror:

http://www.catb.org/~esr/jargon/

Here are some specific pages where this bug appears:

http://www.catb.org/~esr/jargon/html/speech-style.html
http://www.catb.org/~esr/jargon/html/inarticulations.html
http://www.catb.org/~esr/jargon/html/p-convention.html (very annoying)
gcc version 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r3, propolice)

I've noticed that some XML pages can be viewed correctly by manually switching the encoding to UTF-8 in Konqueror (a quick workaround until this bug is fixed). I had been trying to view roll call votes from the US Congress (http://clerk.house.gov/evs/2003/index.asp) and that led me here. It uses an XSL page that is in UTF-8, so switching manually in Konqueror apparently doesn't work. Glad to see someone has narrowed it down, at least.

These look like possible duplicates: Bug 42683, Bug 77933.
The clearest and most persistent example of the XML parsing problem is the http://market.yandex.ru/ site. While it displays normally in Mozilla, Konqueror reports "unexpected end of data. Some information may be lost" (or something like that; this is my translation from Russian). Konqueror 3.2.x displayed the page despite that warning but failed to process the search form on the page. Konqueror 3.3.0 fails to display the page at all unless the encoding is set manually to cp1251.
CVS commit by carewolf:

Merge encoding detection improvements from WebCore

BUG: 42683

  M +7 -0 ChangeLog 1.351
  M +16 -8 khtml_part.cpp 1.1058
  M +1 -1 ecma/xmlhttprequest.cpp 1.10
  M +179 -34 misc/decoder.cpp 1.74
  M +11 -2 misc/decoder.h 1.21
*** Bug 70877 has been marked as a duplicate of this bug. ***