Bug 112491

Summary:	CDATA in feed is not handled correctly
Product:	[Applications] akregator
Component:	feed parser
Status:	RESOLVED WORKSFORME
Severity:	normal
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	unspecified
OS:	Linux
Reporter:	Eckhart Wörner <ewoerner>
Assignee:	kdepim bugs <pim-bugs-null>
CC:	bderidder, jkt, muczyjoe, osterfeld, roman.cheplyaka
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Eckhart Wörner 2005-09-12 19:19:23 UTC

Version:           1.2 (using KDE 3.4.2, Kubuntu Package 4:3.4.2-0ubuntu0hoary2 )
Compiler:          gcc version 3.3.5 (Debian 1:3.3.5-8ubuntu2)
OS:                Linux (i686) release 2.6.10-5-386

In http://www.blogistan.co.uk/qt/atom.xml, <![CDATA[ ... ]]> is used to escape the article content. The CDATA delimiters are part of the XML syntax and should therefore not be passed to KHTML. At the moment they are passed through, which produces strange rendering.

Comment 1 Frank Osterfeld 2005-09-30 08:36:09 UTC

This example is not Atom-1.0 compliant.

In Atom, CDATA does not seem to be valid in <content type="html">, according to

http://www.atomenabled.org/developers/syndication/#text

    "If type="html", then this element contains entity escaped html.
    <title type="html">
      AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
    </title>"

So the feed should use escaped HTML instead of CDATA.
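
To make the quoted example concrete, here is a minimal sketch in standard C++ of what one level of entity unescaping does to that title. It is an illustration only, not Akregator code: the function name `unescapeXmlOnce` is hypothetical, and it handles just the five predefined XML entities.

```cpp
#include <string>

// Hypothetical helper (not part of Akregator): performs a single pass of
// unescaping over the five predefined XML entities. Each '&' in the input is
// examined once, so "&amp;amp;" becomes "&amp;" and not "&".
std::string unescapeXmlOnce(const std::string& in)
{
    static const struct { const char* name; char ch; } entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& e : entities) {
                const std::string name(e.name);
                if (in.compare(i, name.size(), name) == 0) {
                    out.push_back(e.ch);   // emit the literal character
                    i += name.size();      // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out.push_back(in[i]);          // ordinary character, copy as-is
            ++i;
        }
    }
    return out;
}
```

Applied to the title text from the quote, one pass yields `AT&amp;T bought <b>by SBC</b>!`, which is the HTML fragment a renderer would then interpret.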

Comment 2 Eckhart Wörner 2005-10-23 11:29:18 UTC

http://www.w3.org/TR/2004/REC-xml-20040204/#sec-cdata-sect says:

"[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]"
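
As an illustration of that definition, the following sketch in standard C++ shows the character data that remains once the CDATA delimiters are consumed. It is not Akregator code and does no entity handling; the function name `stripCdataDelimiters` is hypothetical, and leaving an unterminated section untouched is an assumption made here for simplicity.

```cpp
#include <string>

// Hypothetical helper (not part of Akregator): returns the text with every
// "<![CDATA[" ... "]]>" wrapper removed and its content kept verbatim.
std::string stripCdataDelimiters(const std::string& raw)
{
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";

    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        const std::string::size_type start = raw.find(open, pos);
        if (start == std::string::npos) {
            out.append(raw, pos, std::string::npos);  // no more sections
            break;
        }
        out.append(raw, pos, start - pos);            // text before the section

        const std::string::size_type contentStart = start + open.size();
        const std::string::size_type end = raw.find(close, contentStart);
        if (end == std::string::npos) {
            // Assumption: an unterminated section is copied unchanged.
            out.append(raw, start, std::string::npos);
            break;
        }
        out.append(raw, contentStart, end - contentStart);  // section content
        pos = end + close.size();
    }
    return out;
}
```

For example, `stripCdataDelimiters("<![CDATA[<b>hi</b>]]>")` returns `<b>hi</b>`: the delimiters belong to the XML layer and only the content is data.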

Comment 3 Eckhart Wörner 2005-11-10 17:13:15 UTC

*** Bug 116051 has been marked as a duplicate of this bug. ***

Comment 4 Frank Osterfeld 2006-01-16 00:16:41 UTC

SVN commit 498704 by osterfeld:

fix atom:content parsing: Don't show tags when for Atom 1.0 feeds with escaped HTML in it

BUG: 112491, 117938


 M  +36 -15    tools_p.cpp  


--- branches/KDE/3.5/kdepim/akregator/src/librss/tools_p.cpp #498703:498704
@@ -47,21 +47,42 @@
 	QDomElement e = node.toElement();
 	QString result;
 
-	if (elemName == "content" && ((e.hasAttribute("mode") && e.attribute("mode") == "xml") || !e.hasAttribute("mode")))
-		result = childNodesAsXML(node);
-	else
-		result = e.text();
-
-	bool hasPre = result.contains("<pre>",false);
-	bool hasHtml = hasPre || result.contains("<");	// FIXME: test if we have html, should be more clever -> regexp
-	if(!isInlined && !hasHtml)						// perform nl2br if not a inline elt and it has no html elts
-		result = result = result.replace(QChar('\n'), "<br />");
-	if(!hasPre)										// strip white spaces if no <pre>
-		result = result.simplifyWhiteSpace();
-
-	if (result.isEmpty())
-		return QString::null;
-
+        bool doHTMLCheck = true;
+ 
+        if (elemName == "content") // we have Atom here
+        {
+            doHTMLCheck = false;
+            // the first line is always the Atom 0.3, the second Atom 1.0
+            if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/html" )
+            || (!e.hasAttribute("mode") && e.attribute("type") == "html"))
+            {
+                result = KCharsets::resolveEntities(e.text().simplifyWhiteSpace()); // escaped html
+            }
+            else if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/plain" )
+                       || (!e.hasAttribute("mode") && e.attribute("type") == "text"))
+            {
+                result = e.text().stripWhiteSpace(); // plain text
+            }
+            else if (( e.hasAttribute("mode") && e.attribute("mode") == "xml" )
+                       || (!e.hasAttribute("mode") && e.attribute("type") == "xhtml"))
+            {
+                result = childNodesAsXML(e); // embedded XHMTL
+            }
+            
+        }
+        
+        if (doHTMLCheck) // check for HTML; not necessary for Atom:content
+        {
+            bool hasPre = result.contains("<pre>",false);
+            bool hasHtml = hasPre || result.contains("<");	// FIXME: test if we have html, should be more clever -> regexp
+            if(!isInlined && !hasHtml)						// perform nl2br if not a inline elt and it has no html elts
+                    result = result = result.replace(QChar('\n'), "<br />");
+            if(!hasPre)										// strip white spaces if no <pre>
+                    result = result.simplifyWhiteSpace();
+        
+            if (result.isEmpty())
+                    return QString::null;
+        }
 	return result;
 }
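
The branch conditions in the patch above can be read as a small classification step. The sketch below restates them in standard C++ without Qt, as a reading aid only: `ContentKind`, `Attributes`, `attributeOrEmpty` and `classifyAtomContent` are hypothetical names, and a plain `std::map` stands in for the attributes of the `QDomElement`.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration; the patch itself works on QDomElement.
enum class ContentKind { EscapedHtml, PlainText, Xhtml, Unknown };
using Attributes = std::map<std::string, std::string>;

// Returns the attribute value, or an empty string if it is absent.
static std::string attributeOrEmpty(const Attributes& attrs, const std::string& key)
{
    const auto it = attrs.find(key);
    return it == attrs.end() ? std::string() : it->second;
}

// Mirrors the three branches of the patch: in each condition the first half
// covers Atom 0.3 ("mode" present), the second half Atom 1.0 (no "mode").
ContentKind classifyAtomContent(const Attributes& attrs)
{
    const bool hasMode = attrs.count("mode") > 0;
    const std::string mode = attributeOrEmpty(attrs, "mode");
    const std::string type = attributeOrEmpty(attrs, "type");

    if ((hasMode && mode == "escaped" && type == "text/html")
        || (!hasMode && type == "html"))
        return ContentKind::EscapedHtml;   // patch: resolve entities

    if ((hasMode && mode == "escaped" && type == "text/plain")
        || (!hasMode && type == "text"))
        return ContentKind::PlainText;     // patch: strip surrounding whitespace

    if ((hasMode && mode == "xml") || (!hasMode && type == "xhtml"))
        return ContentKind::Xhtml;         // patch: serialize child nodes as XML

    return ContentKind::Unknown;           // patch: result stays empty
}
```

Read this way, a `content` element that matches none of the three branches leaves `result` empty in the patched function, since the HTML check is skipped for Atom content.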

Comment 5 Eckhart Wörner 2006-02-28 21:52:29 UTC

This bug has only been fixed for Atom, not for RSS, so I am reopening it.

Comment 6 Eckhart Wörner 2006-02-28 21:53:58 UTC

*** Bug 122857 has been marked as a duplicate of this bug. ***

Comment 7 Peter Avramucz 2007-05-19 15:54:06 UTC

Same here.
Gentoo ~amd64 kde 3.5.6
Please fix this annoying bug!

Comment 8 Frank Osterfeld 2008-10-15 21:43:24 UTC

Considered fixed in 4.x. Otherwise, reopen with a current test feed (XML file, not a link).