Version: (using KDE KDE 3.2.3) Installed from: Gentoo Packages Compiler: gcc (GCC) 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6) OS: Linux document.getElementsByTagName should give me all the elements with the suplied tag name regardless of whether the elements are visible or not. The supplied example gives an alert message saying "found 3 images" in mozilla and IE, but gives "found 0 images" in konqueror.
Created attachment 6969 [details] Test case showing problem Test case showing the problem. If opened with mozilla or IE, it finds 3 images, if opened with konqueror, it finds 0 images.
Hi, The problem is not the invisible images, but the fact xhtml works with lowercase tags and you search with uppercase. I suppose the other browsers have some compatibility mode, so I'll attach a patch that also makes konq/html a bit more fogiving in this case. Cheers, Rob.
Created attachment 7664 [details] Patch to fix getElementsByTagName problem with XHTML
Re #3: Sorry but I don't think the patch is acceptable ; it's a more complicated issue. The behaviour you observe with Mozilla is because they implemented special compatibility rules for XHTML served as media type "text/html", that are defined here: http://www.w3.org/TR/xhtml-media-types/ This is a tricky section, full of "should" or "may", but basically, what we need to implement to follow this is: - if the document is an XHTML document and is served as "text/html" - then the lookups for elements by non-ns aware DOM HTML methods should be case insensitive, and the elements should be returned uppercase.
Even some staff from W3C use the XHTML as "text/html" is HTML behavior. Like on this page: http://www.w3.org/DOM/Test/ The behaviour is also defined in http://www.w3.org/TR/xhtml1/#C_11 "user agents that access XHTML documents served as Internet media type text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper-case from those interfaces."
From a quick look at the code, this would need an additional param in createHTMLDocument() (coming from args.serviceType, to check whether it's text/html or application/xhtml+xml), passed to HTMLDocumentImpl's constructor, and stored there, and then using it in the line e->setHTMLCompat( htmlMode() != XHtml ) (html/html_documentimpl.cpp:211) or in HTMLDocumentImpl::determineParseMode() itself (depending on whether this should affect the parsing, or only the html-compat mode).
Created attachment 7852 [details] suggested patch
Hi David! the patch looks excellent... I wonder though about the alternative you describe in #6, if it should indeed affect parsing... because if it does, there's not much point in having an xhtml doctype at all? Does Mozilla do that too?
Good point. Do you know how I could write a test that checks whether the parsing is done with XHTML or with HTML4-Compat mode? [We tried <script/> but that was a bad idea - htmltokenizer doesn't support it, it always looks for </script>.] BTW with xhtml doctype and .xhtml extension, Mozilla treats the file like XML - it shows the raw tree. This surprises me, I thought the idea behind xhtml was still to render it as HTML :)
In fact there are very few differences between parsing modes within the Html parser... it is always very forgiving. The CSS Parser does make more differences - notably case sensitivity. At the moment, Transitional and Compat behaves identically, so that's really just a difference between Strict and everything else ;( (but there are very significative differences in _rendering_ between Strict mode and others - i.e the quirk mode). For instance, given an XHtml document with doctype Strict, but served as "text/html", the compatibility guidelines would require the DOM interface to be case insensitive, but the stylesheets still would be case sensitive... > BTW with xhtml doctype and .xhtml extension, Mozilla treats the file like XML - it > shows the raw tree. This surprises me, I thought the idea behind xhtml was still to > render it as HTML :) they are always overdoing it :)
Created attachment 7872 [details] Strict doctype
Created attachment 7873 [details] transitional mode Mozilla follows strictly the guidelines, i.e only attribute and element names are case insensitive and returned uppercase.
Thanks for the testcases, I have merged them into my regression/tests testcases. The css parsing thing didn't test the actual patch though (they work before and after), to answer whether it's ok to change the parse mode. But I found a testcase that shows that it's in fact not ok to do so. When publicId=strict and systemId=transitional, the code says "for XHTML, trust publicId, so be strict". But with my patch it now choose transitional since hMode != XHtml, whereas Mozilla does choose strict indeed [this particular testcase will be committed as tests/parser/compatmode_xhtml_mixed.html soon]. I also couldn't change the e->setHTMLCompat() line, there are in fact many such lines, all activating htmlcompat from many places. So I set htmlMode to non-xhtml at the end of determineParseMode, when all the rest has been done (so this doesn't influence e.g. the CSS parsing mode).
CVS commit by faure: When the document is loaded as text/html, even if xhtml doctype, activate case-insensitive ("htmlCompat") lookup of tags and attributes (but not in CSS parser). (#86446) Regression tests: dom/namespaces.html dom/namespaces_xhtml_strict.html parser/compatmode* CCMAIL: 86446-done@bugs.kde.org M +6 -0 ChangeLog 1.297 M +2 -0 khtml_part.cpp 1.1030 M +5 -0 html/html_documentimpl.cpp 1.168 M +3 -1 html/html_documentimpl.h 1.76 --- kdelibs/khtml/html/html_documentimpl.cpp #1.167:1.168 @@ -71,4 +71,5 @@ HTMLDocumentImpl::HTMLDocumentImpl(DOMIm m_doAutoFill = false; + m_htmlRequested = false; /* dynamic history stuff to be fixed later (pfeiffer) @@ -382,4 +383,8 @@ void HTMLDocumentImpl::determineParseMod if ( hMode == XHtml ) pMode = publicId; + + // This needs to be done last, see tests/parser/compatmode_xhtml_mixed.html + if ( m_htmlRequested && hMode == XHtml ) + hMode = Html4; // make all tags uppercase when served as text/html (#86446) } // kdDebug() << "DocumentImpl::determineParseMode: publicId =" << publicId << " systemId = " << systemId << endl; --- kdelibs/khtml/html/html_documentimpl.h #1.75:1.76 @@ -74,4 +74,5 @@ public: void setAutoFill() { m_doAutoFill = true; } + void setHTMLRequested( bool html ) { m_htmlRequested = html; } HTMLCollectionImpl::CollectionInfo *collectionInfo(int type) { return m_collection_info+type; } @@ -83,4 +84,5 @@ protected: QMap<QString,HTMLMapElementImpl*> mapMap; bool m_doAutoFill; + bool m_htmlRequested; protected slots: --- kdelibs/khtml/khtml_part.cpp #1.1029:1.1030 @@ -1778,4 +1778,6 @@ void KHTMLPart::begin( const KURL &url, } else { d->m_doc = DOMImplementationImpl::instance()->createHTMLDocument( d->m_view ); + // HTML or XHTML? (#86446) + static_cast<HTMLDocumentImpl *>(d->m_doc)->setHTMLRequested( args.serviceType != "application/xhtml+xml" ); } #ifndef KHTML_NO_CARET --- kdelibs/khtml/ChangeLog #1.296:1.297 @@ -1,2 +1,8 @@ +2004-10-14 David Faure <faure@kde.org> + + * html/html_documentimpl.cpp (determineParseMode): + When the document is loaded as text/html, even if xhtml doctype, activate case-insensitive + ("htmlCompat") lookup of tags and attributes (but not in CSS parser). (#86446) + 2004-10-14 Allan Sandfeld Jensen <kde@carewolf.com> * rendering/*.*: WebCore merge/port of layouted->needsLayout