Summary: | failure to detect XML document encoding | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | mortehu |
Component: | khtml xml | Assignee: | Konqueror Developers <konq-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | code, kde_bugs, max.moritz.sievers, neil, nicolasg, oneugene |
Priority: | NOR | ||
Version: | 3.0.1 | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Description
mortehu
2002-05-16 11:58:42 UTC
*** Bug 43428 has been marked as a duplicate of this bug. ***

*** Bug 19870 has been marked as a duplicate of this bug. ***

The page http://www.stud.ifi.uio.no/~mortehu/utf-8.html is gone. Can somebody retrieve it and add it as an attachment to this bug?

Sorry, I just re-uploaded it.

This bug is still present in 3.0.5, or at least in the Red Hat package kdebase-3.0.5a, specifically kdebase-3.0.5a-0.73.2:6.i386.rpm. As described, if the encoding is set to "UTF-8", Konqueror displays characters outside the ASCII range as either ISO-8859-1 or ISO-8859-15 (probably using my locale settings). Mozilla-based browsers (such as Mozilla 1.3.x and 1.4) don't exhibit this behaviour. (It seems that this bug has reached its voting limit, or I'd vote for it as well.)

KDE 3.0 isn't developed anymore, but this bug is still present in KDE CVS HEAD.

Here's a practical example where this bug is very obvious in Konqueror:
http://www.catb.org/~esr/jargon/

Here are some specific pages where this bug appears:
http://www.catb.org/~esr/jargon/html/speech-style.html
http://www.catb.org/~esr/jargon/html/inarticulations.html
http://www.catb.org/~esr/jargon/html/p-convention.html (very annoying)

gcc version 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r3, propolice)

I've noticed that some XML pages can be viewed correctly by switching manually to UTF-8 in Konq (a quick fix until this bug is fixed).

I had been trying to view roll call votes from the US Congress (http://clerk.house.gov/evs/2003/index.asp) and that led me here. It uses an XSL page that is in UTF-8, so switching manually in Konqueror apparently won't work. Glad to see someone has narrowed it down, at least.

These look like possible duplicates: Bug 42683, Bug 77933.

The clearest and most persistent example of the XML parsing problem is the http://market.yandex.ru/ site. While it displays normally in Mozilla, Konqueror reports "unexpected end of data. Some information may be lost" (or something like that; this is my translation from Russian). Konqueror 3.2.x displayed the page regardless of that warning but failed to process the search form on the page. Konqueror 3.3.0 fails to display the page at all unless the encoding is set manually to cp1251.

CVS commit by carewolf:

Merge encoding detection improvements from WebCore
BUG: 42683

M +7 -0 ChangeLog 1.351
M +16 -8 khtml_part.cpp 1.1058
M +1 -1 ecma/xmlhttprequest.cpp 1.10
M +179 -34 misc/decoder.cpp 1.74
M +11 -2 misc/decoder.h 1.21

*** Bug 70877 has been marked as a duplicate of this bug. ***
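
For anyone following along, here is a minimal Python sketch of the detection order the XML 1.0 specification describes: a byte order mark first, then the encoding pseudo-attribute of the XML declaration, then UTF-8 as the default. This is not the code from misc/decoder.cpp. The function name sniff_xml_encoding and the 256-byte look-ahead window are made up for illustration, and the sketch ignores HTTP charset headers and UTF-16 documents without a BOM.

```python
import codecs
import re

# UTF-32 BOMs are listed before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration that
# starts at the very first byte of an ASCII-compatible document.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of an XML document from its leading bytes.

    Hypothetical helper for illustration only.
    """
    # 1. A byte order mark, if present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise look for <?xml ... encoding="..."?> near the start.
    match = _XML_DECL.match(data[:256])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. With neither a BOM nor a declaration, XML defaults to UTF-8.
    return default


doc = '<?xml version="1.0" encoding="UTF-8"?><p>\u00e9</p>'.encode("utf-8")
detected = sniff_xml_encoding(doc)
correct = doc.decode(detected)      # decoded as declared
garbled = doc.decode("iso-8859-1")  # decoded with a Latin-1 locale guess
```

Decoding the same bytes as ISO-8859-1 instead of the declared UTF-8 turns each non-ASCII character into two Latin-1 characters, which matches the symptom reported above.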