Version: (using KDE KDE 3.0.99)
Installed from: Compiled From Sources
Compiler: GCC 3.2.1
OS: Linux
Conforming XML and XHTML user-agents must stop rendering malformed documents. Konqueror does. Whilst most (all?) browsers that handle XHTML treat it as tag soup when served as text/html, I don't know of any other XHTML user-agents that do this for XHTML documents served as application/xhtml+xml. Mozilla and Opera certainly don't. This bug may also apply to other XML mime-types, such as text/xml; I haven't checked whether Konqueror breaks in these circumstances as well.
References:
"Violations of well-formedness constraints are fatal errors" - http://www.w3.org/TR/REC-xml#dt-wfc
"Once a fatal error is detected, however, the processor must not continue normal processing" - http://www.w3.org/TR/REC-xml#dt-fatal
Also, I think this should apply when opening files with an extension of .xhtml.
Created attachment 706 [details] Malformed XHTML document Serve this as application/xhtml+xml (the proper XHTML mime-type) to Mozilla and Opera, and they will both refuse to render it properly, flagging the error. Konqueror doesn't follow the XML specification and renders the document.
Sorry, just noticed a typo in the first line. "Konqueror does" should read "Konqueror does not".
still buggy in HEAD (~kde 3.2a2)
Just one question: does this affect any real life web pages that rely on invalid content not displayed?
Stephan, the point of XHTML is XML well-formedness in the first place. Are you really talking about invalid documents, or about malformed ones? Afaik / imho, as long as the document is delivered with an XML-based content type, e.g. application/xhtml+xml:
* XHTML user agents must not display malformed documents. XML-based programs are expected not to continue processing a document once they have found that it is not well-formed.
* XHTML user agents must do all they can to display well-formed but invalid documents. Just because the user agent thinks a document is invalid doesn't mean it really is, because there are several ways to validate a document:
  - XML validation against a DTD
  - XML Schema validation against a Schema
  - Relax NG validation against a Relax NG Schema
A document could, for example, be Schema-valid but have no document type declaration (and thus be XML-invalid). The user agent is still required to display it. The best way is to use an internal element / attribute processing library triggered by the namespace.
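The malformed/invalid distinction above can be illustrated with a rough C++ sketch. It is not KHTML code and not a real XML parser: it only checks that start and end tags nest properly, which is one of several well-formedness constraints, and it says nothing about validity. The name `tagsBalanced` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if every start tag is closed by a matching
// end tag in the right order. Deliberately simplified: it does not understand
// comments or CDATA sections containing '>', nor '>' inside attribute values.
bool tagsBalanced(const std::string& doc) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = doc.find('<', pos)) != std::string::npos) {
        const std::size_t end = doc.find('>', pos);
        if (end == std::string::npos) return false;  // unterminated tag
        const std::string tag = doc.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;  // declarations, doctype
        const bool closing = tag[0] == '/';
        const bool selfClosing = tag[tag.size() - 1] == '/';
        // The element name runs from after an optional '/' to whitespace or '/'.
        const std::size_t start = closing ? 1 : 0;
        const std::size_t nameEnd = tag.find_first_of(" \t\r\n/", start);
        const std::string name =
            nameEnd == std::string::npos ? tag.substr(start)
                                         : tag.substr(start, nameEnd - start);
        if (name.empty()) return false;
        if (closing) {
            if (open.empty() || open.back() != name) return false;
            open.pop_back();
        } else if (!selfClosing) {
            open.push_back(name);
        }
    }
    return open.empty();
}
```

A fragment such as `<p><b>text</p></b>` fails this check and would be a fatal error for an XML processor, whereas a properly nested document that merely uses an element its DTD does not allow passes it; that second case is "invalid", not "malformed".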
You may have misunderstood: I'm not questioning your bug report. I'm just wondering whether you reported it for academic reasons or because it causes you an actual problem, because we would like to fix first the bugs that create problems in people's daily browsing.
Stephan ;-) it's more academic than something affecting people in daily browsing :-) (Afair it's not my bug report at all.) But still, there's a problem. Try http://www.hujer.com/start.de.xhtml and look at the umlauts. (Normal users won't see this because current versions of Konqueror in circulation don't prefer application/xhtml+xml over text/html.) The charset has been correctly declared thrice:
- The HTTP header is sent correctly: Content-Type: application/xhtml+xml;charset=utf-8
- The XML declaration correctly declares utf-8
- The meta element correctly declares application/xhtml+xml;charset=utf-8
(- Even my locale is set to UTF-8: de_DE.UTF-8)
Still, Konqueror (at least my version, Konqueror 3.1.1 for KDE 3.1.1) fails to detect UTF-8 and instead renders the page as iso-8859-1 (Ã¤ instead of ä, i.e. A-tilde plus currency sign instead of a-umlaut). I may be wrong, but I think this problem is closely related to Konqueror's handling of XHTML in general. Afair I reported that already separately.
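The symptom described above (an a-umlaut shown as A-tilde followed by a currency sign) is what appears when UTF-8 bytes are decoded as ISO-8859-1. The following C++ sketch reproduces that arithmetic; it is an illustration only, not Konqueror's decoding code, and `latin1ToUtf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: treats every input byte as one ISO-8859-1 character
// (code points U+0000..U+00FF) and re-encodes the result as UTF-8.
// Feeding it bytes that were already UTF-8 reproduces the mis-rendering.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // two-byte sequence
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

The UTF-8 encoding of U+00E4 (a-umlaut) is the two bytes C3 A4. Read as ISO-8859-1 those are U+00C3 (A-tilde) and U+00A4 (currency sign), so `latin1ToUtf8("\xC3\xA4")` yields the four bytes C3 83 C2 A4.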
Stephen, I'm the reporter of the bug. I agree that at the moment, it's pretty much academic due to the low use of application/xhtml+xml and the fact that Konqueror doesn't send that content-type in its accept header. However, it seems to be a very simple fix (just process application/xhtml+xml like any other unknown content-type, which is what Konqueror originally did) and it would be good to get this change in before a major release. The relevant code is literally just:
if ( mimeType == "text/html" || mimeType == "text/xml" || mimeType == "application/xhtml+xml" )
So at a quick glance it's just a case of changing that line and removing the mime-type .desktop stuff so that Konqueror doesn't try to handle these types of files. Konqueror is also the only browser that I know of that gets this wrong. If a large enough market share of browsers gets this wrong, it will leave the door open for clueless authors/authoring tools to write malformed documents that "work" on Konqueror and similarly broken browsers, but exclude people using browsers that get it right, like Mozilla and Opera. Christian, IIRC, XHTML served as application/xhtml+xml has different character encoding rules than text/html. Looking at your site though, Konqueror might be getting confused by the 'qs' parameter you are supplying with the content-type - RFC 3236 (the definition of application/xhtml+xml) does not define this parameter.
Jim, afaik Konqueror must not get confused by the qs parameter. RFC 3236 doesn't define this parameter. RFC 2616 doesn't define this parameter either. (It only defines q, and only for Accept*:) Still, both rely on RFC 2046, which clearly says: "MIME implementations must also ignore any parameters whose names they do not recognize." (RFC 2046, Introduction, third paragraph, last sentence) Of course it would be better if Apache didn't include the qs parameter, which is only specified for content negotiation, but this is out of my hands. Yet I think because of RFC 2046 I can expect a user agent not to get confused by a qs parameter.
Hi Stephen, I work at HP's LaserJet division on our embedded web server's content. I can tell you that Konq's "odd" xhtml support was one of the reasons (although certainly not the only reason) why we're still sending our valid xhtml 1.0 content as tag soup text/html. There were two major problems. The first is that Konq doesn't tell me it supports application/xhtml+xml, and the second is that scripts inside CDATA sections don't work (bug 61101). Oddly enough, I believe Safari does stop processing application/xhtml+xml documents when it finds invalid markup. If I had a vote, I would vote that Konq follow Opera's lead in this area. When it finds markup in application/xhtml+xml that is not valid, it goes back to tag soup mode and does its best to render the page. However, it also puts a very visible marker on the page letting people know that the page has bad markup. I thought that was a nice touch. My ultimate goal for our content is to send application/xhtml+xml to browsers that support it and text/html to those that don't. The biggest benefit I see to this is that our developers will know as soon as they render a page that they did something wrong, rather than later when/if they remember to validate their pages with a separate validator. As it is now, there isn't any indication and most mistakes are not caught until our testing team runs the xhtml validation test suite - a process which happens well after most development is finished. So, in a nutshell, this bug was real-life for us...
David
P.S.: RFC 2045 says so as well: "MIME implementations must ignore any parameters whose names they do not recognize." (5. Content-Type Header Field, fifth paragraph, last sentence)
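A minimal sketch of what "ignore any parameters whose names they do not recognize" means for a Content-Type parser, written in plain C++ rather than with KDE classes. `ContentType`, `trimLower` and `parseContentType` are hypothetical names, and the parser is simplified: it does not handle a ';' inside a quoted parameter value.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

struct ContentType {
    std::string mediaType;  // lowercased, e.g. "application/xhtml+xml"
    std::string charset;    // lowercased; empty if the header gave none
};

// Strips surrounding spaces/tabs and lowercases the rest.
static std::string trimLower(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t");
    std::string r = s.substr(b, e - b + 1);
    std::transform(r.begin(), r.end(), r.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return r;
}

// Hypothetical parser: keeps the media type and the charset parameter and
// silently skips every parameter it does not recognize (such as "qs").
ContentType parseContentType(const std::string& header) {
    ContentType ct;
    std::size_t pos = header.find(';');
    ct.mediaType = trimLower(header.substr(0, pos));
    while (pos != std::string::npos) {
        const std::size_t next = header.find(';', pos + 1);
        const std::string param =
            next == std::string::npos ? header.substr(pos + 1)
                                      : header.substr(pos + 1, next - pos - 1);
        const std::size_t eq = param.find('=');
        if (eq != std::string::npos) {
            const std::string name = trimLower(param.substr(0, eq));
            std::string value = trimLower(param.substr(eq + 1));
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
                value = value.substr(1, value.size() - 2);  // drop the quotes
            if (name == "charset") ct.charset = value;
            // Unrecognized parameter names fall through and are ignored.
        }
        pos = next;
    }
    return ct;
}
```

With a header value like `application/xhtml+xml; charset=utf-8; qs=0.999` (the qs value here is made up for illustration), the result is the media type `application/xhtml+xml` with charset `utf-8`; the qs parameter has no effect.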
David, imho when an application/xhtml+xml document is invalid, a user agent mustn't fall back to tag soup, because the document still must be well-formed. Unknown elements or attributes should simply be ignored. I can expect a usual XHTML user agent to render invalid xhtml documents as long as they correctly declare the xhtml namespace and are well-formed. I can also expect a usual XHTML user agent to complain about malformedness whenever the document was sent with an XML-based mime-type like text/xml, application/xml, application/xhtml+xml, image/svg+xml etc. That's very different from tag soup.
OK, I see your point. I wasn't really thinking of unknown elements. I was thinking rather of inline elements that are not wrapped in a block element, for example. That's actually a pretty common error in xhtml. For example:
<p> stuff </p> <input type="button" value="foo" /> <p> more stuff </p>
In situations like this, I like Opera's behavior. It basically renders the page with the button where the programmer expected it to be, but then puts a big warning message up. I don't know if Opera still treats the markup at that point as xhtml or as tag soup, but the behavior is nice.
David: The example can be valid.
- it might be content of <div/>
- it might be content of <body/> or <form/> and the document type is XHTML 1.0 Transitional, where the content model of <body/> and <form/> is %Flow;
- (most important) the default content model of <body/> and <form/> might be overridden in a custom XHTML subtype to allow %Flow; instead of %Block;
It's only invalid in XHTML 1.0 Strict, XHTML 1.1 and XHTML Basic. Because the content model of elements may change in XHTML subtypes, and even new elements can be added in XHTML subtypes, I'd expect a user agent to interpret elements by name in namespace, with the structure not being coupled too tightly, and to treat unknown elements similarly to <span/>, applying a stylesheet if available.
Christian
Sorry, I am not sure I understand the argument. Konqueror is still a HTML browser (see for example bug #42683) So it does *not* support application/html+xml (It might work with it but it does not support it.) Therefore Konqueror can also not reject bad XHTML document, as it does not work on XHTML. Also I fail to see why browsers should reject things. Most HTML user agents should try to display/process whatever is given to them, as they are not validators. Rejecting a document will not be understood by an average user. This is a usability problem. (It is similar to the time when a Group 3 fax on an analogue phone line could not contact an ISDN fax because it was announced by ISDN as "analogue voice" and not as "Group 3 fax". Experts understood why, normal users did not.) Of course, well-formed documents are much, much better. (I too wonder sometimes what HTML code is floating around the Internet.) Have a nice day!
> Konqueror is still a HTML browser
Correct.
> So it does *not* support application/html+xml
A browser can be both an HTML user-agent and an XHTML user-agent. Mozilla and Opera are examples.
> Therefore Konqueror can also not reject bad XHTML document, as it does not work on XHTML.
If I understand you correctly, you are saying that Konqueror should not reject bad XHTML documents because Konqueror is not an XHTML user-agent. It is true that Konqueror is not a conforming XHTML user-agent. That is why it should not attempt to render documents that are served as application/xhtml+xml - it should give the option of saving or opening in another application (e.g. Mozilla). The well-formedness requirement is a fundamental part of XML, and until Konqueror can get the basics right, it shouldn't step in and screw up when the possibility of opening the document in a real XHTML user-agent is there.
> Also I fail to see why browsers should reject things.
Look at the utter crud text/html has become. When HTML was first designed, Postel's Law [1] must have seemed like a sensible approach. However, web authors didn't hold up their end of the bargain, and the web has suffered as a result. I guess XML's requirement to stop processing malformed documents is a reaction to that - and the more browsers that "forgive" XML errors in the way that Konqueror does, the more likely it is that web authors will write crud.
> Rejecting a document will not be understood by an average user.
Average users, at the moment, are not reading application/xhtml+xml documents. Average users will not do so until the majority of browsers support it. Given the current user-base of Internet Explorer, it'll probably be close to a decade before support for text/html (which Konqueror supports just fine) wanes.
[1] http://essaysfromexodus.scripting.com/postelsLaw
The issue is not just academic; it is an inconvenience for those writing web pages that are served as application/xhtml+xml (personal experience). If you quickly write a web page (which may contain minor errors) and check it in Konqueror, you hope that it will (despite the minor errors) essentially look the same as in other browsers such as Mozilla. This is not the case here. Note that IE does not accept invalid application/xhtml+xml either (of course, it does not accept valid application/xhtml+xml either :-)
>Also I fail to see why browsers should reject things. Most HTML user agents should try to display/process whatever is given to them, as they are not validators.
>Rejecting a document will not be understood by an average user. This is a usability problem.
I disagree. Almost all applications that display information/documents (image viewers, PDF viewers, word processors) die with an error message ("corrupt file") if you try to open an invalid file with them.
It is utterly essential to the successful evolution of the web that no attempts are made to interpret malformed XML documents. Mis-interpretation and arbitrary interpretation of standards by vendors is the number one reason that we're in the mess we're in, and I would hope over-worked konqueror developers would appreciate that. Should not the doctype take priority over the content-type field when determining how to present an HTML or XML document?
> Should not the doctype take priority over the content-type field when determining how to present an HTML or XML document?
Absolutely not, the Content-Type header should always take priority.
"If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource." -- Section 7.2.1 of RFC 2616 (HTTP 1.1)
Now as it happens, some media types actually cover a number of different file formats. For instance, text/xml. If it's one of those types, you can make a case for sniffing doctypes. In the special case of text/html and Appendix C XHTML, I know of no web browser that treats it as XHTML, so it may be worth following everybody else's lead in that case.
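The rule quoted from RFC 2616 can be sketched as a small decision function. This is an illustration in plain C++, not Konqueror's logic; `effectiveMediaType` is a hypothetical name and the extension table is an assumption chosen for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper following RFC 2616 section 7.2.1: a media type given in
// the Content-Type header always wins; guessing from the URI's name extension
// is only attempted when the header gives none.
std::string effectiveMediaType(const std::string& contentTypeHeader,
                               const std::string& urlPath) {
    if (!contentTypeHeader.empty())
        return contentTypeHeader;  // no doctype or extension sniffing

    // No media type given: guess from the extension (illustrative table).
    const std::size_t dot = urlPath.rfind('.');
    const std::string ext =
        dot == std::string::npos ? "" : urlPath.substr(dot + 1);
    if (ext == "xhtml" || ext == "xht") return "application/xhtml+xml";
    if (ext == "html" || ext == "htm") return "text/html";
    if (ext == "xml") return "text/xml";

    // RFC 2616 says a still-unknown type should be treated as this one.
    return "application/octet-stream";
}
```

Under this rule a file named page.xhtml that is served with `Content-Type: text/html` is handled as text/html; the extension only matters when no header is present, for example for a local file.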
An example where HTML parsing doesn't work, but parsing XHTML as XML does:
...
<head>
<title>title</title>
<script type="text/javascript" src="alert.js" />
</head>
...
With .xhtml: http://yansanmo.no-ip.org:8080/test/xhtml/xhtml_script_seul.xhtml (application/xhtml+xml) [doesn't work right now]
With .xml: http://yansanmo.no-ip.org:8080/test/xhtml/xhtml_script_seul.xml (text/xml) [runs the script and displays the alert]
Here is my opinion on the matter: of course khtml should abort the processing of non-well-formed XML documents; /invalid/ XHTML documents are another matter. Nor do I see this as an academic matter, since it's one of the foundations of XML -- see #19 from Jon Dowland for a good explanation. Most web-technology development happens in the XML area (SVG, XSLT, etc.), and rejecting non-well-formed documents is necessary to create substance and order to build upon. Cheers, Frans
Created attachment 11231 [details] Implied elements testcase Neither Firefox 1.04 nor Opera 8.0 see a <tbody> element to apply the style to. Konqueror sees a <tbody> element because it parses XHTML as HTML.
Created attachment 11232 [details] Style content model testcase Neither Firefox 1.04 nor Opera 8.0 see any stylesheet rules because they are commented out. Konqueror sees the stylesheet rules because it improperly treats <style> and <script> elements as if they had CDATA content, as the HTML 4.01 specification states, rather than #PCDATA content as the XHTML 1.0 specification states.
Created attachment 11235 [details] Case sensitivity testcase Neither Firefox 1.04 nor Opera 8.0 apply P#test CSS rules to p#test elements. Konqueror applies P#test CSS rules to p#test elements because it follows HTML rules, which are case-insensitive, as opposed to XHTML rules, which are case-sensitive.
Created attachment 11236 [details] Canvas background testcase CSS 2.1 treats the root element background as a special case in HTML, but explicitly states that this special case doesn't apply to XHTML. http://www.w3.org/TR/CSS21/colors.html#q2 Firefox 1.04 does not apply the testcase, but Konqueror 3.4.0 and Opera 8.0 do. Opera have acknowledged this to be a bug in Opera, so presumably they will be fixing it at some point. http://lists.w3.org/Archives/Public/www-style/2005Feb/0039 (BTW: these testcases are all failing in Konqueror 3.4.0).
The first "style on tbody" fails and the last of "style on root" fails. The rest works in SVN HEAD.
Created attachment 11237 [details] XML language testcase Konqueror 3.4.0 doesn't understand it when documents specify their language(s) using xml:lang.
Created attachment 11239 [details] Dom nodeName testcase Element type names accessed through the DOM for HTML documents need to be given in uppercase. Element type names accessed through the DOM for XHTML documents need to have their case preserved. Konqueror 3.4.0 returns element type names in uppercase, applying the HTML rule, rather than preserving case as per XHTML rules. Opera 8.0 and Firefox 1.04 pass the testcase.
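The nodeName rule this testcase exercises can be stated as a short C++ sketch. It is not KHTML's DOM code; `nodeName` here is a hypothetical free function and the `isHtmlDocument` flag stands in for however an implementation tracks the document type.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: element names are reported in uppercase for HTML
// documents and with their original case preserved for XML/XHTML documents.
std::string nodeName(const std::string& tagName, bool isHtmlDocument) {
    if (!isHtmlDocument)
        return tagName;  // XHTML served as XML: case is preserved
    std::string upper = tagName;
    for (std::size_t i = 0; i < upper.size(); ++i)
        upper[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(upper[i])));
    return upper;
}
```

For a `<p>` element this gives "P" in a text/html document and "p" in an application/xhtml+xml document, which is the difference the testcase checks.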
Created attachment 11240 [details] Supporting external stylesheet for xml-stylesheet.xhtml Konqueror doesn't apply stylesheets linked to documents with <?xml-stylesheet>. Opera 8.0 and Firefox 1.04 do.
Created attachment 11241 [details] xml-stylesheet testcase Konqueror doesn't apply stylesheets linked to documents with <?xml-stylesheet>. Opera 8.0 and Firefox 1.04 do.
Created attachment 11337 [details] xml-stylesheet testcase Fixed to reference the online stylesheet properly.
SVN commit 422527 by carewolf:
Parse XHTML as XML
CCBUG: 52665
 M  +1 -5  khtml_part.cpp
--- trunk/KDE/kdelibs/khtml/khtml_part.cpp #422526:422527
@@ -1879,15 +1879,11 @@
   m_url = url;
-  bool servedAsXHTML = args.serviceType == "application/xhtml+xml";
   bool servedAsXML = KMimeType::mimeType(args.serviceType)->is( "text/xml" );
-  // ### not sure if XHTML documents served as text/xml should use DocumentImpl or HTMLDocumentImpl
-  if ( servedAsXML && !servedAsXHTML ) { // any XML derivative, except XHTML
+  if ( servedAsXML ) { // any XML derivative, including XHTML
     d->m_doc = DOMImplementationImpl::instance()->createDocument( d->m_view );
   } else {
     d->m_doc = DOMImplementationImpl::instance()->createHTMLDocument( d->m_view );
-    // HTML or XHTML? (#86446)
-    static_cast<HTMLDocumentImpl *>(d->m_doc)->setHTMLRequested( !servedAsXHTML );
   }
 #ifndef KHTML_NO_CARET
   // d->m_view->initCaret();
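A standalone sketch of the dispatch this commit moves to, where every XML-derived type, XHTML included, gets an XML document instead of an HTML one. The real code asks KMimeType whether the served type derives from text/xml; the sketch substitutes a plain string test (text/xml, application/xml, or a "+xml" suffix), which is an assumption and not how KMimeType works. `DocumentKind`, `isXmlMediaType` and `chooseDocumentKind` are hypothetical names.

```cpp
#include <string>

enum class DocumentKind { Html, Xml };

// Assumed stand-in for the mime-type inheritance check: a type counts as XML
// if it is text/xml, application/xml, or ends in "+xml".
bool isXmlMediaType(const std::string& type) {
    const std::string suffix = "+xml";
    if (type == "text/xml" || type == "application/xml") return true;
    return type.size() > suffix.size() &&
           type.compare(type.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Any XML derivative, including XHTML, is handled by the XML document class;
// everything else falls back to the HTML document class.
DocumentKind chooseDocumentKind(const std::string& servedType) {
    return isXmlMediaType(servedType) ? DocumentKind::Xml : DocumentKind::Html;
}
```

Before the change, application/xhtml+xml was carved out as an exception and sent down the HTML path; in this sketch it takes the same path as text/xml or image/svg+xml.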
On Sunday 05 June 2005 20:46, Allan Sandfeld Jensen wrote:
> SVN commit 422527 by carewolf:
>
> Parse XHTML as XML
Hmm, are you sure about this change? It completely breaks khtmltests/regression/tests/dom/namespaces.html: Passes: 93 Failures: 32 Errors: 3 (I double-checked, and it passed completely before this change)
> - // HTML or XHTML? (#86446)
(... why ignore the testcases in the mentioned bug report, when removing the fix for it?...)
Konqueror 3.5 beta 2 is now failing all the testcases, this is a regression from previous versions.
*** Bug 121555 has been marked as a duplicate of this bug. ***
*** Bug 121552 has been marked as a duplicate of this bug. ***
*** Bug 121550 has been marked as a duplicate of this bug. ***
*** Bug 121554 has been marked as a duplicate of this bug. ***
I would think this would be a better starting point:
--- /home/maksim/kde3/kdelibs/khtml/html/html_documentimpl.cpp (revision 510753)
+++ /home/maksim/kde3/kdelibs/khtml/html/html_documentimpl.cpp (working copy)
@@ -197,7 +197,10 @@ void HTMLDocumentImpl::setBody(HTMLEleme
 Tokenizer *HTMLDocumentImpl::createTokenizer()
 {
+    if (m_htmlRequested)
     return new HTMLTokenizer(docPtr(),m_view);
+    else
+        return DocumentImpl::createTokenizer();
 }
 // --------------------------------------------------------------------------
It reveals a couple of problems, however. For starters, the whitespace handling using the XML parser looks wrong. Second, we don't seem to run scripts in the right place in there.
*** Bug 125245 has been marked as a duplicate of this bug. ***
SVN commit 529770 by carewolf:
Fix XHTML parsing by ignoring white-space violating DTD
CCBUG: 52665
 M  +8 -3  xml_tokenizer.cpp
--- branches/KDE/3.5/kdelibs/khtml/xml/xml_tokenizer.cpp #529769:529770
@@ -163,7 +163,7 @@
         return false;
     }
-    if (newElement->id() == ID_SCRIPT)
+    if (newElement->id() == ID_SCRIPT || newElement->id() == makeId(xhtmlNamespace, ID_SCRIPT))
         static_cast<HTMLScriptElementImpl *>(newElement)->setCreatedByParser(true);
     //this is tricky. in general the node doesn't have to attach to the one it's in. as far
@@ -247,8 +247,12 @@
             return false;
         return true;
     }
-    else
+    else {
+        // Don't worry about white-space violating DTD
+        if (ch.stripWhiteSpace().isEmpty()) return true;
+
         return false;
+    }
 }
@@ -276,6 +280,7 @@
 QString XMLHandler::errorString()
 {
+    // ### Make better error-messages
     return i18n("the document is not in the correct file format");
 }
@@ -497,7 +502,7 @@
     // Recursively go through the entire document tree, looking for html <script> tags. For each of these
     // that is found, add it to the m_scripts list from which they will be executed
-    if (n->id() == ID_SCRIPT) {
+    if (n->id() == ID_SCRIPT || n->id() == makeId(xhtmlNamespace, ID_SCRIPT)) {
         m_scripts.append(static_cast<HTMLScriptElementImpl*>(n));
     }
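The whitespace part of this commit boils down to: when character data turns up where the parent element cannot take text, do not treat it as an error if it is only whitespace. A plain C++ sketch of that idea follows. It uses the four XML whitespace characters, whereas the commit uses QString::stripWhiteSpace(), so the set of characters treated as whitespace is an assumption here. `isXmlWhitespaceOnly`, `acceptCharacters` and the `parentAcceptsText` flag are hypothetical and stand in for whatever condition the real handler tests.

```cpp
#include <string>

// True if the text consists solely of XML whitespace (space, tab, CR, LF).
// An empty string also counts as whitespace-only.
bool isXmlWhitespaceOnly(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Hypothetical handler result: false means "report a parse error".
// Whitespace between elements (indentation, newlines) is tolerated even where
// the content model does not allow text; real text in that position is not.
bool acceptCharacters(const std::string& text, bool parentAcceptsText) {
    if (parentAcceptsText) return true;
    return isXmlWhitespaceOnly(text);
}
```

This is why pretty-printed XHTML, with newlines and indentation between for example `<tr>` and `<td>`, no longer trips the parser, while stray text in the same position still does.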
AFAIK the idea of XHTML 1.1 (which SHOULD be served as application/xhtml+xml) is to serve only clean pages that validate as XML. The purpose is to avoid situations where the browser has to guess how the document's DOM should be interpreted. By rendering invalid XML documents, Konqueror breaks the whole idea of XHTML 1.1. It is amazing that a browser which is so advanced in standards and CSS support treats an invalid XML doc in so horrible a way! Could someone fix at least this awful bug?
*** Bug 145883 has been marked as a duplicate of this bug. ***
Konqueror does halt on XML errors; the problem is that it doesn't parse application/xhtml+xml as XML. It seems you need to use application/xml until this is fixed.
Hello, Could someone add " application/xhtml+xml support " in the summary of this bug report please? thanks, Gérard
> Just one question: does this affect any real life web pages that rely on invalid content not displayed?
The MAMA (Metadata Analysis and Mining Application) study conducted in January 2008 [1] reports that out of 3,509,180 URLs that MAMA analyzed, only 935 used the application/xhtml+xml MIME type, which is 0.03%.
http://dev.opera.com/articles/view/mama-http-headers/#conttype
http://devfiles.myopera.com/articles/554/mamaurlset-mimehistogram.htm
[1]: Analysis phase: Dates
Main analysis: 31 Oct. - 13 Nov. 2007; 10 - 12 Dec. 2007; 28 - 29 Jan. 2008
Markup validation: 08 - 29 Jan. 2008
http://dev.opera.com/articles/view/mama-methodology/#analysisprocessing
The fact that IE9 is going to support the application/xhtml+xml MIME type may raise this low percentage, but not by a wide margin.
regards, Gérard Talbot
Does this still apply to Konqueror 4.8.4 or later?
When I load attachment 706 [details], I get actual results and not expected results. I am using
KDE Platform Version: 4.8.4
Konqueror version: 4.8.4 (with KHTML rendering engine)
Qt Version: 4.8.1
Operating System: Linux 3.2.0-25-generic-pae i686 (32bits)
Distribution: Kubuntu 12.04 LTS
here. Summary, Version and Keywords fields have been updated. I believe Component should not be "khtml xml" but rather could be "khtml parsing" ... although I am not sure of this.
----------
Most of the CSS 2.1 test suite tests served with the application/xhtml+xml MIME type fail to apply the CSS code correctly (but this could actually be another bug).
CSS 2.1 test suite, RC6, 20110323 http://test.csswg.org/suites/css2.1/20110323/
E.g. http://test.csswg.org/suites/css2.1/20110323/html4/c5502-mrgn-r-003.htm versus http://test.csswg.org/suites/css2.1/20110323/xhtml1/c5502-mrgn-r-003.xht
Gérard
Dear user, KHTML (and KJS) was more or less unmaintained for a long time and got removed in KF6. Please migrate to a QWebEngine based HTML component. We will make no further fixes or improvements to the KF5 branches of these components besides important security fixes. For security issues, please see: https://kde.org/info/security/ Sorry that we did not fix this issue during the lifetime of KHTML. Greetings Christoph Cullmann