Bug 110426

Summary: XML/DOM: text nodes containing only whitespace are removed
Product: [Applications] konqueror
Reporter: Stefan Brüns <stefan.bruens>
Component: khtml parsing
Assignee: Konqueror Developers <konq-bugs>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit:
Version Fixed In:

Description Stefan Brüns 2005-08-08 23:07:07 UTC
Version:            (using KDE 3.4.2)
Installed from:    SuSE RPMs

An XML document created via XMLLoad from "<a><b>first</b><b> </b></a>" should result in a tree like:
a
  b
    #text "first"
  b
    #text " "

but the second text node is nonexistent. I think the bug is in
http://websvn.kde.org/trunk/KDE/kdelibs/khtml/xml/xml_tokenizer.cpp
XMLHandler::characters(...)

A small test can be found under:
http://www.kawo1.rwth-aachen.de/~lurchi/js_dom_test/ws_only_node.html

Comment 1 Maksim Orlovich 2006-03-02 01:15:17 UTC
SVN commit 514945 by orlovich:

Remove hack that swallowed lots of text nodes, we shouldn't do this for XML
BUG:110426


 M  +0 -5      xml_tokenizer.cpp  


--- branches/KDE/3.5/kdelibs/khtml/xml/xml_tokenizer.cpp #514944:514945
@@ -232,11 +232,6 @@
 
 bool XMLHandler::characters( const QString& ch )
 {
-    //this is needed for xhtml parsing. otherwise we try to attach
-    //"\n\t" to html, head and other nodes which don't accept textchildren
-    if ( ch.stripWhiteSpace().isEmpty() )
-        return true;
-
     if (currentNode()->nodeType() == Node::TEXT_NODE ||
         currentNode()->nodeType() == Node::CDATA_SECTION_NODE ||
         enterText()) {
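
For readers without a KHTML build at hand, the effect of the removed check can be illustrated with a minimal standalone sketch. It does not use Qt or the real XMLHandler: TextCollector, isWhitespaceOnly and countTextNodes are hypothetical names made up for this illustration, and isWhitespaceOnly is only an approximation of ch.stripWhiteSpace().isEmpty(). The sketch feeds the two character runs from the reported document to a handler with and without the early return.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the text-handling part of an XML content handler.
// This is NOT the KHTML XMLHandler; it only records the text chunks it keeps.
struct TextCollector {
    bool swallowWhitespaceOnly;      // true mimics the check removed by the commit
    std::vector<std::string> kept;   // text nodes that would have been created

    explicit TextCollector(bool swallow) : swallowWhitespaceOnly(swallow) {}

    // Approximation of ch.stripWhiteSpace().isEmpty() for std::string:
    // true when the chunk is empty or consists solely of whitespace.
    static bool isWhitespaceOnly(const std::string& ch) {
        for (unsigned char c : ch) {
            if (!std::isspace(c))
                return false;
        }
        return true;
    }

    // Called once per run of character data; returns true to continue parsing.
    bool characters(const std::string& ch) {
        if (swallowWhitespaceOnly && isWhitespaceOnly(ch))
            return true;             // chunk is silently dropped
        kept.push_back(ch);
        return true;
    }
};

// Feeds the character data of "<a><b>first</b><b> </b></a>" to a collector
// and returns how many text nodes survive.
std::size_t countTextNodes(bool swallowWhitespaceOnly) {
    TextCollector handler(swallowWhitespaceOnly);
    handler.characters("first");
    handler.characters(" ");
    return handler.kept.size();
}
```

With the early return enabled, only the "first" chunk is kept, which matches the missing node described in the report. With it disabled, both chunks are kept, matching the expected tree.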