Summary: | Konqueror chokes on <!-- without //--> [testcase] | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | Dik Takken <kde> |
Component: | khtml parsing | Assignee: | Konqueror Developers <konq-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aiacovitti, germain, kde |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Testcase
regression testcase OT patch, ignore |
Description
Dik Takken
2004-06-13 00:37:55 UTC
Created attachment 6338 [details]
Testcase
The Mozilla behaviour is to just display "Test". The Konqueror behaviour is to display: Test " document.write(content); Problem still present on KDE 3.3 Beta 1. No improvements in KDE 3.5. Nasty.. Okay the "right" thing is to ignore comments inside the <script> tag. This is what Firefox does. MSIE on the other hand parses comments inside scripts in some weird manor. Basically: <script> <!-- myjscode() </script> --> Will not execute anything in either MSIE or opera, but if you don't close the comments it will. Confirmed on trunk r794020. Not sure what the behaviour is supposed to be. The defacto standard behavior depends on parsing the entire document. Essentially: If a comment is never closed, the parser moves back and autocloses the comment immediately, and then reparses the rest of the document. We already have something similar for some other cases. SVN commit 1062741 by jtamate: BUG: 83284 BUG: 161409 <-- is no longer a comment start. http://reviewboard.kde.org/r/2358/ M +1 -1 htmltokenizer.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1062741 Created attachment 39152 [details]
regression testcase
I observed a regression due to revision 1062741.
Attached a reduced testcase.
Have to open a new bug report?
as noted by Andrea, this patch is bound to cause regressions unfortunately. IIUC, it is based on the assumption that the comment detection code was wrongly looking for '<!-' instead of '<!--'. But that is not what the original code was doing : if you may observe, the current character is compared to '-' too, in addition to the buffer, so as to have the correct '<!--' start sequence. The patch makes that '<!---' and thus unwillingly disables comment detection within some special tags. I believe the original problem is the tokenizer is trying to do some half-assed second-level tokenizing within some tags (title, script, etc) to try to detect comments in unusual situations. But it does that naïvely, without switching to any special state, so it is blind to any ordinary entity escaping. SVN commit 1070587 by ggarand: change r1062741 to rather skip comment detection in |title| altogether, in order to avoid some regressions in |script| tag parsing. This should be functionally equivalent (see comments on #83284) and is actually closer to the behaviour of Gecko. CCBUG: 83284 CCBUG: 161409 M +1 -1 htmltokenizer.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1070587 Some notes : it turns out the lax tokenizing of comments in parseSpecial was on purpose. <title> and some other have a special legacy and must absorb anything until more or less properly closed. Consider: data:text/html,<title>some <silly <>> demonstration<!-- this won't be parsed as a comment node --></title <div>Lorem As you can see, the title representation of this mess is very interoperable between browsers as: "some <silly <>> demonstration<!-- this won't be parsed as a comment node -->" Anyway, as the result of comment tokenizing was discarded in the context of title, it is indeed alright to skip the comment detection for it, as did the original patch by accident. --- Note to self : need to make a proper ecma wrapper object for comments. Right now they are [object CharacterImp] instead of [object Comment]. while I still think about that : this fix still isn't right.
data:text/html,<title>testcase <!-- </title> --> failed, not in title!</title>
must keep track of last encountered entity when checking for comment start.
> ---
> Note to self : need to make a proper ecma wrapper object for comments. Right
> now they are [object CharacterImp] instead of [object Comment].
>
oh, that too...
@Maksim : is the (to be attached) correct or is there a simpler way to do that? I couldn't seem to make it more minimal, even though the interface for Comment is empty...
Created attachment 40057 [details]
OT patch, ignore
SVN commit 1078786 by ggarand: fix many issues in the parser/tokenizer: .correctly fix tricky issue mentioned in #83284 /wrt to wrong tokenizing of lt entity + '!--' as a comment start, by tracking if we are far enough from the last encountered entity to be able to look back with confidence (also fix similar issue with end-tags: <title><title>should be in title</title>) .fix crash in some instances of <script .../> due to |script| bool member not being set (and thus triggering wrong code path in notifyFinished) .remove handling of unclosed script elements with 'src' attribute, as it is no longer compatible with other UAs and prevents us from correctly handling broken comments in |script| elements' content (WONTFIX #58420). .fix handling of unclosed comments in script elements(#83284 per se -> #c1) .fix handling of unclosed title element: <title>One<p>Two. .rename |parseSpecial| to more telling |parseRawContent|, along with all its dependent members ; add some comments explaining what's going on, as well as the purpose of the finish() hackery CCBUG: 83284 CCBUG: 58420 M +3 -0 htmlparser.cpp M +116 -104 htmltokenizer.cpp M +15 -15 htmltokenizer.h WebSVN link: http://websvn.kde.org/?view=rev&revision=1078786 |