Bug 86446

Summary:	[test case] getElementsByTagName does not find elements that are not visible
Product:	[Applications] konqueror	Reporter:	Paul Pacheco <paulpach>
Component:	khtml ecma	Assignee:	Konqueror Bugs <konqueror-bugs-null>
Status:	RESOLVED FIXED
Severity:	normal	CC:	faure, germain
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Gentoo Packages
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	Test case showing problem Patch to fix getElementsByTagName problem with XHTML suggested patch Strict doctype transitional mode

Description Paul Pacheco 2004-08-02 19:04:00 UTC

Version:            (using KDE KDE 3.2.3)
Installed from:    Gentoo Packages
Compiler:          gcc (GCC) 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6) 
OS:                Linux

document.getElementsByTagName should give me all the elements with the suplied tag name regardless of whether the elements are visible or not.

The supplied example gives an alert message saying "found 3 images" in mozilla and IE, but gives "found 0 images" in konqueror.

Comment 1 Paul Pacheco 2004-08-02 19:06:01 UTC

Created attachment 6969 [details]
Test case showing problem

Test case showing the problem. 
If opened with mozilla or IE, it finds 3 images,
if opened with konqueror, it finds 0 images.

Comment 2 Rob Buis 2004-09-24 20:30:12 UTC

Hi,

The problem is not the invisible images, but the fact xhtml
works with lowercase tags and you search with uppercase.
I suppose the other browsers have some compatibility mode,
so I'll attach a patch that also makes konq/html a bit more
fogiving in this case.
Cheers,

Rob.

Comment 3 Rob Buis 2004-09-24 20:31:10 UTC

Created attachment 7664 [details]
Patch to fix getElementsByTagName problem with XHTML

Comment 4 Germain Garand 2004-09-25 02:41:25 UTC

Re #3: 
Sorry but I don't think the patch is acceptable ; it's a more complicated issue.

The behaviour you observe with Mozilla is because they implemented special compatibility rules for XHTML served as media type "text/html",
that are defined here:
http://www.w3.org/TR/xhtml-media-types/

This is a tricky section, full of "should" or "may", but basically, what we need to implement to follow this is:

- if the document is an XHTML document and is served as "text/html"
- then the lookups for elements by non-ns aware DOM HTML methods
  should be case insensitive, and the elements should be returned uppercase.

Comment 5 Yan Morin 2004-10-11 18:12:24 UTC

Even some staff from W3C use the XHTML as "text/html" is HTML behavior.
Like on this page: http://www.w3.org/DOM/Test/

The behaviour is also defined in 
http://www.w3.org/TR/xhtml1/#C_11
"user agents that access XHTML documents served as Internet media type text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper-case from those interfaces."

Comment 6 David Faure 2004-10-11 18:59:47 UTC

From a quick look at the code, this would need an additional param in createHTMLDocument() (coming from args.serviceType, to check whether it's text/html or application/xhtml+xml), passed to HTMLDocumentImpl's constructor, and stored there, and then using it in the line 
e->setHTMLCompat( htmlMode() != XHtml ) (html/html_documentimpl.cpp:211)
or in HTMLDocumentImpl::determineParseMode() itself (depending on whether this should affect the parsing, or only the html-compat mode).

Comment 7 David Faure 2004-10-12 20:23:19 UTC

Created attachment 7852 [details]
suggested patch

Comment 8 Germain Garand 2004-10-13 14:07:48 UTC

Hi David!
the patch looks excellent... I wonder though about the alternative you describe in #6, if it should indeed affect parsing...
because if it does, there's not much point in having an xhtml doctype at all?
Does Mozilla do that too?

Comment 9 David Faure 2004-10-14 10:01:17 UTC

Good point. Do you know how I could write a test that checks whether the parsing is done with XHTML or with HTML4-Compat mode?

[We tried <script/> but that was a bad idea - htmltokenizer doesn't support it, it always looks for </script>.]

BTW with xhtml doctype and .xhtml extension, Mozilla treats the file like XML - it shows the raw tree. This surprises me, I thought the idea behind xhtml was still to render it as HTML :)

Comment 10 Germain Garand 2004-10-14 11:58:56 UTC

In fact there are very few differences between parsing modes within the Html parser... it is always very forgiving. The CSS Parser does make more differences - notably case sensitivity.

At the moment, Transitional and Compat behaves identically, so that's really just a difference between Strict and everything else ;(
(but there are very significative differences in _rendering_ between Strict mode and others - i.e the quirk mode).

For instance, given an XHtml document with doctype Strict, but served as "text/html",
the compatibility guidelines would require the DOM interface to be case insensitive, but the stylesheets still would be case sensitive...

> BTW with xhtml doctype and .xhtml extension, Mozilla treats the file like XML - it
> shows the raw tree. This surprises me, I thought the idea behind xhtml was still to
> render it as HTML :) 

they are always overdoing it :)

Comment 11 Germain Garand 2004-10-14 12:39:52 UTC

Created attachment 7872 [details]
Strict doctype

Comment 12 Germain Garand 2004-10-14 12:43:53 UTC

Created attachment 7873 [details]
transitional mode

Mozilla follows strictly the guidelines, i.e only attribute and element names
are case insensitive and returned uppercase.

Comment 13 David Faure 2004-10-14 20:28:40 UTC

Thanks for the testcases, I have merged them into my regression/tests testcases. The css parsing thing didn't test the actual patch though (they work before and after), to answer whether it's ok to change the parse mode. But I found a testcase that shows that it's in fact not ok to do so.

When publicId=strict and systemId=transitional, the code says "for XHTML, trust publicId, so be strict". But with my patch it now choose transitional since hMode != XHtml, whereas Mozilla does choose strict indeed [this particular testcase will be committed as tests/parser/compatmode_xhtml_mixed.html soon].

I also couldn't change the e->setHTMLCompat() line, there are in fact many such lines, all activating htmlcompat from many places. So I set htmlMode to non-xhtml at the end of determineParseMode, when all the rest has been done (so this doesn't influence e.g. the CSS parsing mode).

Comment 14 David Faure 2004-10-14 20:29:02 UTC

CVS commit by faure: 

When the document is loaded as text/html, even if xhtml doctype, activate case-insensitive
("htmlCompat") lookup of tags and attributes (but not in CSS parser). (#86446)

Regression tests: dom/namespaces.html dom/namespaces_xhtml_strict.html parser/compatmode*
CCMAIL: 86446-done@bugs.kde.org


  M +6 -0      ChangeLog   1.297
  M +2 -0      khtml_part.cpp   1.1030
  M +5 -0      html/html_documentimpl.cpp   1.168
  M +3 -1      html/html_documentimpl.h   1.76


--- kdelibs/khtml/html/html_documentimpl.cpp  #1.167:1.168
@@ -71,4 +71,5 @@ HTMLDocumentImpl::HTMLDocumentImpl(DOMIm
 
     m_doAutoFill = false;
+    m_htmlRequested = false;
 
 /* dynamic history stuff to be fixed later (pfeiffer)
@@ -382,4 +383,8 @@ void HTMLDocumentImpl::determineParseMod
         if ( hMode == XHtml )
             pMode = publicId;
+
+        // This needs to be done last, see tests/parser/compatmode_xhtml_mixed.html
+        if ( m_htmlRequested && hMode == XHtml )
+            hMode = Html4; // make all tags uppercase when served as text/html (#86446)
     }
     // kdDebug() << "DocumentImpl::determineParseMode: publicId =" << publicId << " systemId = " << systemId << endl;

--- kdelibs/khtml/html/html_documentimpl.h  #1.75:1.76
@@ -74,4 +74,5 @@ public:
 
     void setAutoFill() { m_doAutoFill = true; }
+    void setHTMLRequested( bool html ) { m_htmlRequested = html; }
 
     HTMLCollectionImpl::CollectionInfo *collectionInfo(int type) { return m_collection_info+type; }
@@ -83,4 +84,5 @@ protected:
     QMap<QString,HTMLMapElementImpl*> mapMap;
     bool m_doAutoFill;
+    bool m_htmlRequested;
 
 protected slots:

--- kdelibs/khtml/khtml_part.cpp  #1.1029:1.1030
@@ -1778,4 +1778,6 @@ void KHTMLPart::begin( const KURL &url, 
   } else {
     d->m_doc = DOMImplementationImpl::instance()->createHTMLDocument( d->m_view );
+    // HTML or XHTML? (#86446)
+    static_cast<HTMLDocumentImpl *>(d->m_doc)->setHTMLRequested( args.serviceType != "application/xhtml+xml" );
   }
 #ifndef KHTML_NO_CARET

--- kdelibs/khtml/ChangeLog  #1.296:1.297
@@ -1,2 +1,8 @@
+2004-10-14  David Faure  <faure@kde.org>
+
+        * html/html_documentimpl.cpp (determineParseMode):
+        When the document is loaded as text/html, even if xhtml doctype, activate case-insensitive
+        ("htmlCompat") lookup of tags and attributes (but not in CSS parser). (#86446)
+
 2004-10-14  Allan Sandfeld Jensen <kde@carewolf.com>
         * rendering/*.*: WebCore merge/port of layouted->needsLayout