| Summary: | CDATA in feed is not handled correctly | ||
|---|---|---|---|
| Product: | [Applications] akregator | Reporter: | Eckhart Wörner <ewoerner> |
| Component: | feed parser | Assignee: | kdepim bugs <pim-bugs-null> |
| Status: | RESOLVED WORKSFORME | ||
| Severity: | normal | CC: | bderidder, jkt, muczyjoe, osterfeld, roman.cheplyaka |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | unspecified | ||
| OS: | Linux | ||
| Latest Commit: | | Version Fixed/Implemented In: | |
| Sentry Crash Report: | | | |

Description
Eckhart Wörner
2005-09-12 19:19:23 UTC
This example is not Atom 1.0 compliant. In Atom, CDATA does not seem to be valid in <content type="html">, according to http://www.atomenabled.org/developers/syndication/#text: "If type="html", then this element contains entity escaped html. <title type="html"> AT&amp;T bought <b>by SBC</b>! </title>". So the feed should use escaped HTML instead of CDATA.

http://www.w3.org/TR/2004/REC-xml-20040204/#sec-cdata-sect says: "[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]"

*** Bug 116051 has been marked as a duplicate of this bug. ***

SVN commit 498704 by osterfeld:
fix atom:content parsing: Don't show tags when for Atom 1.0 feeds with escaped HTML in it
BUG: 112491, 117938
M +36 -15 tools_p.cpp
```diff
--- branches/KDE/3.5/kdepim/akregator/src/librss/tools_p.cpp #498703:498704
@@ -47,21 +47,42 @@
QDomElement e = node.toElement();
QString result;
- if (elemName == "content" && ((e.hasAttribute("mode") && e.attribute("mode") == "xml") || !e.hasAttribute("mode")))
- result = childNodesAsXML(node);
- else
- result = e.text();
-
- bool hasPre = result.contains("<pre>",false);
- bool hasHtml = hasPre || result.contains("<"); // FIXME: test if we have html, should be more clever -> regexp
- if(!isInlined && !hasHtml) // perform nl2br if not a inline elt and it has no html elts
- result = result = result.replace(QChar('\n'), "<br />");
- if(!hasPre) // strip white spaces if no <pre>
- result = result.simplifyWhiteSpace();
-
- if (result.isEmpty())
- return QString::null;
-
+ bool doHTMLCheck = true;
+
+ if (elemName == "content") // we have Atom here
+ {
+ doHTMLCheck = false;
+ // the first line is always the Atom 0.3, the second Atom 1.0
+ if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/html" )
+ || (!e.hasAttribute("mode") && e.attribute("type") == "html"))
+ {
+ result = KCharsets::resolveEntities(e.text().simplifyWhiteSpace()); // escaped html
+ }
+ else if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/plain" )
+ || (!e.hasAttribute("mode") && e.attribute("type") == "text"))
+ {
+ result = e.text().stripWhiteSpace(); // plain text
+ }
+ else if (( e.hasAttribute("mode") && e.attribute("mode") == "xml" )
+ || (!e.hasAttribute("mode") && e.attribute("type") == "xhtml"))
+ {
+ result = childNodesAsXML(e); // embedded XHMTL
+ }
+
+ }
+
+ if (doHTMLCheck) // check for HTML; not necessary for Atom:content
+ {
+ bool hasPre = result.contains("<pre>",false);
+ bool hasHtml = hasPre || result.contains("<"); // FIXME: test if we have html, should be more clever -> regexp
+ if(!isInlined && !hasHtml) // perform nl2br if not a inline elt and it has no html elts
+ result = result = result.replace(QChar('\n'), "<br />");
+ if(!hasPre) // strip white spaces if no <pre>
+ result = result.simplifyWhiteSpace();
+
+ if (result.isEmpty())
+ return QString::null;
+ }
return result;
}
```
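The attribute dispatch introduced by SVN commit 498704 can be sketched in standard C++ without Qt. This is a minimal sketch, assuming a `std::map` of attribute names to values as a stand-in for `QDomElement`; `Attributes`, `ContentKind`, `attr`, and `classifyAtomContent` are hypothetical names. It only reproduces which branch the commit takes (a `mode` attribute marks Atom 0.3, its absence marks Atom 1.0), not the text extraction each branch then performs.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the attributes of an atom:content QDomElement.
using Attributes = std::map<std::string, std::string>;

// Hypothetical result type: which branch of the commit would handle the element.
enum class ContentKind { EscapedHtml, PlainText, EmbeddedXhtml, Unknown };

// Returns the attribute value, or an empty string when the attribute is absent
// (like QDomElement::attribute() with its default argument).
static std::string attr(const Attributes& a, const std::string& name) {
    auto it = a.find(name);
    return it == a.end() ? std::string() : it->second;
}

// Mirrors the branch order of the commit: first line of each condition is
// Atom 0.3 (has "mode"), second line is Atom 1.0 (no "mode").
ContentKind classifyAtomContent(const Attributes& a) {
    const bool hasMode = a.count("mode") > 0;
    const std::string mode = attr(a, "mode");
    const std::string type = attr(a, "type");

    if ((hasMode && mode == "escaped" && type == "text/html") ||
        (!hasMode && type == "html"))
        return ContentKind::EscapedHtml;    // entity-escaped HTML
    if ((hasMode && mode == "escaped" && type == "text/plain") ||
        (!hasMode && type == "text"))
        return ContentKind::PlainText;      // plain text
    if ((hasMode && mode == "xml") ||
        (!hasMode && type == "xhtml"))
        return ContentKind::EmbeddedXhtml;  // inline XHTML child nodes
    // As in the commit, anything else (e.g. a missing "type") matches no branch.
    return ContentKind::Unknown;
}
```

For example, `{{"type", "html"}}` (Atom 1.0) and `{{"mode", "escaped"}, {"type", "text/html"}}` (Atom 0.3) both classify as `EscapedHtml`. As in the commit, an element with no `type` attribute matches no branch; RFC 4287 (section 4.1.3.1) instead says that when neither `type` nor `src` is present, processors must behave as if `type` were "text", which would be a small change to the Atom 1.0 plain-text condition.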
This bug has only been fixed for Atom, not for RSS. Reopening it for that reason.

*** Bug 122857 has been marked as a duplicate of this bug. ***

Same here (Gentoo ~amd64, KDE 3.5.6). Please fix this annoying bug!

Considered fixed in 4.x. Otherwise, reopen with a current test feed (an XML file, not a link).
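A minimal sketch of why CDATA and escaped HTML should be treated alike, written in standard C++ rather than the Qt 3 API the parser uses. Per the W3C definition quoted in the first comment, a conforming XML parser delivers the same character data for a CDATA section as for the equivalent entity-escaped text, so an RSS description wrapped in CDATA should render exactly like one containing escaped HTML. The helper `decodeCharacterData` is hypothetical and handles only CDATA sections and the five predefined entities (no numeric character references); a real feed reader gets this from its XML parser rather than from hand-written code.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: decodes the raw source of an XML text node, handling
// only CDATA sections and the five predefined entities.
std::string decodeCharacterData(const std::string& raw) {
    static const std::pair<const char*, char> entities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'}, {"&quot;", '"'}, {"&apos;", '\''}};
    const std::string cdataOpen = "<![CDATA[";
    const std::string cdataClose = "]]>";

    std::string out;
    std::size_t i = 0;
    while (i < raw.size()) {
        if (raw.compare(i, cdataOpen.size(), cdataOpen) == 0) {
            // Inside a CDATA section everything up to "]]>" is literal text.
            const std::size_t start = i + cdataOpen.size();
            const std::size_t end = raw.find(cdataClose, start);
            if (end == std::string::npos) {
                // Not well-formed XML; this sketch simply keeps the rest verbatim.
                out += raw.substr(start);
                break;
            }
            out += raw.substr(start, end - start);
            i = end + cdataClose.size();
            continue;
        }
        if (raw[i] == '&') {
            // Replace a predefined entity reference with its character.
            bool matched = false;
            for (const auto& e : entities) {
                const std::string name = e.first;
                if (raw.compare(i, name.size(), name) == 0) {
                    out += e.second;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
            if (matched) continue;
        }
        out += raw[i++];  // ordinary character data
    }
    return out;
}
```

Both `<![CDATA[AT&T bought <b>by SBC</b>!]]>` and `AT&amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!` decode to `AT&T bought <b>by SBC</b>!`, which is why the RSS code path should not care which form a feed uses.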