58487 – A Parser bug: optional tags badly handled

Summary: A Parser bug: optional tags badly handled

Status:	RESOLVED FIXED

Alias:	None

Product:	quanta
Classification:	Miscellaneous
Component:	general
Version:	unspecified
Platform:	unspecified Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	András Manţia

Reported:	2003-05-14 19:47 UTC by Nicolas Deschildre
Modified:	2003-05-17 09:40 UTC
CC List:	0 users

Description Nicolas Deschildre 2003-05-14 19:47:31 UTC

Version:           3.2-CVS-1 (using KDE 3.1.9)
Compiler:          gcc version 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)
OS:          Linux (i686) release 2.4.21-0.13mdk

In the following example:
<html><body><table>
<tr><td>
BLAHHHH
<p>
</td></tr>
<tr><td>
MISSING TEXT
</td></tr>
</table></body></html>
The Node Tree built from this HTML file makes the second <tr> appear as a child of the first <td> tag:
...
   td
   +-- BLAHHHH
   +-- p
   +-- tr
        +-- td
            +-- MISSING TEXT
As a result, MISSING TEXT does not appear in the Kafka part (KHTML refuses to put a <tr> inside a <td> :-).
I think the problem is located in parser.cpp:383-387 (v1.83).
That code checks whether the parser should go up or not. Since the <p> tag does not seem to have the single property, it goes down, but in fact there is no closing </p>: the closing </p> is optional.
One possible solution would be to store the Node level we are currently at, go down by default and parse as usual. When going back up and reaching the level of the ambiguous Node, look at the next Node: if it is a closing </p>, nothing needs to change; otherwise, move the <p> tag and move the subtree up.
The problems are multiple optional closing Nodes (which should not be too hard to solve), and it might slow the parser down a bit.
Good luck, Andras!

Comment 1 András Manţia 2003-05-15 11:06:39 UTC

Subject: quanta/quanta/parser

CVS commit by amantia: 

Parsing fix.

CCMAIL: 58487-done@bugs.kde.org


  M +1 -0      parser.cpp   1.84


--- quanta/quanta/parser/parser.cpp  #1.83:1.84
@@ -392,4 +392,5 @@ Node *Parser::parseArea(int startLine, i
         {
           QString searchFor = (m_dtd->caseSensitive)?tag->name:tag->name.upper();
+          searchFor.remove('/');
           if ( qTag->stoppingTags.contains( searchFor ) )
           {

Comment 2 Nicolas Deschildre 2003-05-16 18:54:03 UTC

Wow, resolved very quickly!
But not completely, and it is weird... Take the same example as above and replace <p> with <li>
or even <fjpsdfjp> or whatever: the same bug is still there...
Good luck!

Comment 3 András Manţia 2003-05-17 09:40:02 UTC

Subject: Re: A Parser bug: optional tags badly handled

> Wow, resolved very quickly!
> But not completely, and it is weird... Take the same example as above and
> replace <p> with <li> or even <fjpsdfjp> or whatever: the same bug is still
> there...
> Good luck!

Does <td> stop the area of <li>? Does it stop the area of <fjpsdfjp>? ;-)
If yes, then <td> should appear as a stopping tag for <li>. Right now <li> is
"stopped" by <li>. From li.tag:
   <stoppingtags>
       <stoppingtag>li</stoppingtag>
   </stoppingtags>

This means that "<li>sometext<li>sometext</li>" is equal to
"<li>sometext</li><li>sometext</li>". If <td> does not stop <li>, then
"<li>sometext<td>sometext</td>" is not equal to
"<li>sometext</li><td>sometext</td>", so if you write something like that,
there is a bug in your HTML code. The parser shows you that it is a bug,
because the closing </td> is not on the same level as the opening <td>.

I consider the bug to be fixed, and if you think that some stopping tags are 
not listed for some tags, well, the DTD definition should be fixed.

Andras