Summary: | CDATA in feed is not handled correctly | ||
---|---|---|---|
Product: | [Applications] akregator | Reporter: | Eckhart Wörner <ewoerner> |
Component: | feed parser | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | bderidder, jkt, muczyjoe, osterfeld, roman.cheplyaka |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Eckhart Wörner
2005-09-12 19:19:23 UTC
This example is not Atom 1.0 compliant. In Atom, CDATA does not seem to be valid in <content type="html">, according to http://www.atomenabled.org/developers/syndication/#text:

"If type="html", then this element contains entity escaped html. <title type="html"> AT&amp;T bought <b>by SBC</b>! </title>"

So the feed should use escaped HTML instead of CDATA.

http://www.w3.org/TR/2004/REC-xml-20040204/#sec-cdata-sect says:

"[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]"

*** Bug 116051 has been marked as a duplicate of this bug. ***

SVN commit 498704 by osterfeld:

fix atom:content parsing: Don't show tags when for Atom 1.0 feeds with escaped HTML in it
BUG: 112491, 117938

 M +36 -15 tools_p.cpp

--- branches/KDE/3.5/kdepim/akregator/src/librss/tools_p.cpp #498703:498704
@@ -47,21 +47,42 @@
     QDomElement e = node.toElement();
     QString result;
-    if (elemName == "content" && ((e.hasAttribute("mode") && e.attribute("mode") == "xml") || !e.hasAttribute("mode")))
-        result = childNodesAsXML(node);
-    else
-        result = e.text();
-
-    bool hasPre = result.contains("<pre>",false);
-    bool hasHtml = hasPre || result.contains("<"); // FIXME: test if we have html, should be more clever -> regexp
-    if(!isInlined && !hasHtml) // perform nl2br if not a inline elt and it has no html elts
-        result = result = result.replace(QChar('\n'), "<br />");
-    if(!hasPre) // strip white spaces if no <pre>
-        result = result.simplifyWhiteSpace();
-
-    if (result.isEmpty())
-        return QString::null;
-
+    bool doHTMLCheck = true;
+
+    if (elemName == "content") // we have Atom here
+    {
+        doHTMLCheck = false;
+        // the first line is always the Atom 0.3, the second Atom 1.0
+        if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/html" )
+            || (!e.hasAttribute("mode") && e.attribute("type") == "html"))
+        {
+            result = KCharsets::resolveEntities(e.text().simplifyWhiteSpace()); // escaped html
+        }
+        else if (( e.hasAttribute("mode") && e.attribute("mode") == "escaped" && e.attribute("type") == "text/plain" )
+            || (!e.hasAttribute("mode") && e.attribute("type") == "text"))
+        {
+            result = e.text().stripWhiteSpace(); // plain text
+        }
+        else if (( e.hasAttribute("mode") && e.attribute("mode") == "xml" )
+            || (!e.hasAttribute("mode") && e.attribute("type") == "xhtml"))
+        {
+            result = childNodesAsXML(e); // embedded XHMTL
+        }
+
+    }
+
+    if (doHTMLCheck) // check for HTML; not necessary for Atom:content
+    {
+        bool hasPre = result.contains("<pre>",false);
+        bool hasHtml = hasPre || result.contains("<"); // FIXME: test if we have html, should be more clever -> regexp
+        if(!isInlined && !hasHtml) // perform nl2br if not a inline elt and it has no html elts
+            result = result = result.replace(QChar('\n'), "<br />");
+        if(!hasPre) // strip white spaces if no <pre>
+            result = result.simplifyWhiteSpace();
+
+        if (result.isEmpty())
+            return QString::null;
+    }
     return result;
 }

This bug has only been fixed for Atom, not for RSS. I have therefore reopened it.

*** Bug 122857 has been marked as a duplicate of this bug. ***

Same here. Gentoo ~amd64, KDE 3.5.6.
Please fix this annoying bug!

Considered fixed in 4.x; otherwise, reopen with a current test feed (XML file, not a link). |
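
For readers following the commit above, the branch selection it introduces for atom:content can be sketched in isolation. This is only an illustration using the standard library instead of Qt; `ContentKind`, `Attributes`, `attr` and `classifyAtomContent` are hypothetical names and not part of librss. It covers only the attribute tests from the diff, not the text extraction itself.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration only; not the librss API.
enum class ContentKind { EscapedHtml, PlainText, EmbeddedXml, Unknown };

using Attributes = std::map<std::string, std::string>;

// Returns the attribute value, or an empty string if the attribute is absent.
static std::string attr(const Attributes& a, const std::string& name)
{
    const auto it = a.find(name);
    return it == a.end() ? std::string() : it->second;
}

// Mirrors the attribute tests in the diff: in each condition the first
// alternative matches Atom 0.3 (mode + MIME type), the second Atom 1.0 (type only).
ContentKind classifyAtomContent(const Attributes& a)
{
    const bool hasMode = a.count("mode") > 0;
    const std::string mode = attr(a, "mode");
    const std::string type = attr(a, "type");

    if ((hasMode && mode == "escaped" && type == "text/html")
        || (!hasMode && type == "html"))
        return ContentKind::EscapedHtml;   // entities must be resolved

    if ((hasMode && mode == "escaped" && type == "text/plain")
        || (!hasMode && type == "text"))
        return ContentKind::PlainText;     // use the text as-is

    if ((hasMode && mode == "xml")
        || (!hasMode && type == "xhtml"))
        return ContentKind::EmbeddedXml;   // serialize the child nodes

    return ContentKind::Unknown;           // the diff leaves the result empty here
}
```

Note that an element without a mode or type attribute falls into none of the three branches, which matches the diff: `result` stays empty in that case.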