Version: (using KDE 3.3.1)
Installed from: Debian testing/unstable Packages

It quite often happens with news feeds (e.g. at Spiegel.de or Netzeitung.de) that the same news item is linked more than once when it is updated, often just because the headline has changed. What the entries have in common is the URL pointing to the article. It would be really good if Akregator could filter these duplicate entries, at least when the article has not been read yet. If it has already been read, it could mark the article as updated.
> What seems to be in common is the url which is pointing to the article.

Note that there are feeds (e.g. the DistroWatch apps feed) where all articles point to the same URL (the homepage). :-)
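A minimal sketch of the behaviour requested in the report: treat a fetched item with an already known link as the same story, silently take over the new headline while it is unread, and flag it as updated once it has been read. `Article` and `mergeByLink` are hypothetical names, not Akregator code. As the reply above points out, the link is only a heuristic key, because a feed whose items all link to one homepage would collapse into a single entry.

```cpp
#include <string>
#include <vector>

// Hypothetical article record; field names are illustrative, not Akregator's.
struct Article {
    std::string link;
    std::string title;
    bool read = false;
    bool updated = false;
};

// Merges a freshly fetched item into one feed's article list, treating two
// items with the same link as the same story (the heuristic from the report).
// Caveat: feeds whose items all share one link collapse into a single entry.
void mergeByLink(std::vector<Article>& articles, const Article& incoming) {
    for (Article& existing : articles) {
        if (existing.link != incoming.link)
            continue;
        if (existing.title != incoming.title) {
            existing.title = incoming.title;  // keep the newest headline
            if (existing.read)
                existing.updated = true;      // already read: flag as updated
        }
        return;                               // never add a second entry
    }
    articles.push_back(incoming);             // unknown link: a new article
}
```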
*** This bug has been confirmed by popular vote. ***
CVS commit by osterfeld:

use rdf:about as ID in RSS 1.0 (RDF) feeds. This should reduce the number of dupes significantly. (Checking for "rdf:about" instead of resolving the namespace properly is a hack (as one could use another prefix for the RDF namespace), but attributeNS() didn't work)

CCBUG: 93400

  M +21 -8     article.cpp   1.23

--- kdepim/akregator/src/librss/article.cpp  #1.22:1.23
@@ -125,7 +125,20 @@ Article::Article(const QDomNode &node, F
     }
+    QDomElement element = QDomNode(node).toElement();
+
+    // in RSS 1.0, we use <item about> attribute as ID
+    // FIXME: pass format version instead of checking for attribute
+
+    if (!element.isNull() && element.hasAttribute(QString::fromLatin1("rdf:about")))
+    {
+        d->guid = element.attribute(QString::fromLatin1("rdf:about")); // HACK: using ns properly did not work
+        d->guidIsPermaLink = false;
+    }
+    else
+    {
     tagName=(format==AtomFeed)? QString::fromLatin1("id"): QString::fromLatin1("guid");
     QDomNode n = node.namedItem(tagName);
-    if (!n.isNull()) {
+    if (!n.isNull())
+    {
         d->guidIsPermaLink = (format==AtomFeed)? false : true;
         if (n.toElement().attribute(QString::fromLatin1("isPermaLink"), "true") == "false") d->guidIsPermaLink = false;
@@ -134,4 +146,5 @@ Article::Article(const QDomNode &node, F
             d->guid = elemText;
     }
+    }
     if(d->guid.isEmpty()) {
CVS commit by osterfeld:

backport: use rdf:about in RSS 1.0 feeds as guid.

CCBUG: 93400

  M +21 -8     article.cpp   1.22.6.1

--- kdepim/akregator/src/librss/article.cpp  #1.22:1.22.6.1
@@ -125,7 +125,20 @@ Article::Article(const QDomNode &node, F
     }
+    QDomElement element = QDomNode(node).toElement();
+
+    // in RSS 1.0, we use <item about> attribute as ID
+    // FIXME: pass format version instead of checking for attribute
+
+    if (!element.isNull() && element.hasAttribute(QString::fromLatin1("rdf:about")))
+    {
+        d->guid = element.attribute(QString::fromLatin1("rdf:about")); // HACK: using ns properly did not work
+        d->guidIsPermaLink = false;
+    }
+    else
+    {
     tagName=(format==AtomFeed)? QString::fromLatin1("id"): QString::fromLatin1("guid");
     QDomNode n = node.namedItem(tagName);
-    if (!n.isNull()) {
+    if (!n.isNull())
+    {
         d->guidIsPermaLink = (format==AtomFeed)? false : true;
         if (n.toElement().attribute(QString::fromLatin1("isPermaLink"), "true") == "false") d->guidIsPermaLink = false;
@@ -134,4 +146,5 @@ Article::Article(const QDomNode &node, F
             d->guid = elemText;
     }
+    }
     if(d->guid.isEmpty()) {
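The ID selection order from the commits above can be sketched in plain C++ without Qt. This is a simplified model, not the librss code: `Item`, `articleId` and the `hash:` fallback are hypothetical, because the diff does not show what happens once `d->guid` turns out to be empty. Attributes are keyed by namespace URI plus local name, which illustrates what "resolving the namespace properly" would mean: a feed that binds the RDF namespace to a prefix other than `rdf:` would still match.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// RDF namespace URI; matching on it makes the lookup prefix-independent.
const std::string kRdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Hypothetical, simplified stand-in for a parsed <item>/<entry> element.
struct Item {
    // Attributes keyed by (namespace URI, local name), not by "prefix:name".
    std::map<std::pair<std::string, std::string>, std::string> attributes;
    std::optional<std::string> guidText;  // text of <guid> (RSS 2.0) or <id> (Atom)
    std::string link;
    std::string title;
};

// Picks an article ID in the order used by the commit: rdf:about first,
// otherwise <guid>/<id>; if that still yields nothing (e.g. RSS 0.9x),
// derive a hypothetical ID from link and title.
std::string articleId(const Item& item) {
    std::string id;
    auto it = item.attributes.find({kRdfNs, "about"});
    if (it != item.attributes.end())
        id = it->second;
    else if (item.guidText)
        id = *item.guidText;
    if (id.empty())
        id = "hash:" + std::to_string(std::hash<std::string>{}(item.link + '\n' + item.title));
    return id;
}
```

A content hash is a weak substitute: if the headline changes, the derived ID changes too, which is exactly why the RSS 0.9x case cannot be fixed properly.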
Different feeds may carry the same articles as well; e.g. people might publish their posts on planetgnome and planetfreedesktop. These should be filtered in the "All Feeds" list as well.
The cleanest way to implement that would be to check in feed.h:appendArticles whether an article already exists (by hash/guid). If it already exists, take the old one and append it to the list. An article would then need to be able to have more than one m_feed, which causes some incompatibilities that have to be considered.
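A minimal sketch of that proposal, assuming a global key-to-article table and replacing the single `m_feed` with a set of feeds. `Article`, `ArticleIndex` and `append` are hypothetical names; the real `appendArticles` signature is not shown in this thread.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical model: one Article object may belong to several feeds.
struct Article {
    std::string key;              // guid, or a content hash when there is no guid
    std::string title;
    bool read = false;
    std::set<std::string> feeds;  // replaces a single m_feed
};

using ArticlePtr = std::shared_ptr<Article>;

// Hypothetical global lookup table from key to the one shared Article.
class ArticleIndex {
public:
    // appendArticles()-style step: reuse the existing object for a known key
    // and only record the additional feed it appeared in.
    ArticlePtr append(const std::string& feed, const std::string& key,
                      const std::string& title) {
        auto it = byKey_.find(key);
        if (it == byKey_.end()) {
            auto article = std::make_shared<Article>();
            article->key = key;
            article->title = title;
            it = byKey_.emplace(key, article).first;
        }
        it->second->feeds.insert(feed);
        return it->second;
    }
    std::size_t size() const { return byKey_.size(); }

private:
    std::map<std::string, ArticlePtr> byKey_;
};
```

Because the object is shared, marking it read in one feed marks it read in every feed. Per-feed concerns such as expiry then have to consult the whole `feeds` set, which is one source of the incompatibilities mentioned above.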
@Heinrich: That would need a global archive, or at least a global article index. The current implementation is based on the assumption that every article is part of exactly one feed and that it is the feed's business to manage its articles (GUIDs are considered unique only per feed, expiry, notification inside of Akregator, etc.). I won't introduce additional complexity just because of a few articles showing up in multiple aggregator feeds. I am closing this bug because the per-feed dupes from the original report are fixed, except for cases where there is no ID (RSS 0.9x), which can't be fixed properly.
Yes, you are right: currently every article can belong to only one feed, but the global archive could be the "All Feeds" feed. We could then add an attribute like "duplicates" to the article, in which its duplicates are stored. Actions like "mark as read" could then be performed on the article and its duplicates. In fact, I have a lot of duplicates here (at least 20%).
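A sketch of the "duplicates" idea under stated assumptions: each feed keeps its own `Article` objects, so the one-article-one-feed model stays intact, and only the "All Feeds" view groups them. The cross-feed key (here assumed to be the link, since GUIDs are only unique per feed) and the names `buildAllFeedsView` and `markRead` are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-feed article; each feed still owns its own copy.
struct Article {
    std::string feed;
    std::string key;                   // assumed cross-feed key, e.g. the link
    bool read = false;
    std::vector<Article*> duplicates;  // the same story in other feeds
};

// Builds the "All Feeds" view: the first article seen for a key is listed,
// later ones are attached to it as duplicates. The pointers stay valid only
// while `articles` is not resized.
std::vector<Article*> buildAllFeedsView(std::vector<Article>& articles) {
    for (Article& a : articles)
        a.duplicates.clear();          // rebuilding must not stack old links
    std::unordered_map<std::string, Article*> firstByKey;
    std::vector<Article*> view;
    for (Article& a : articles) {
        auto [it, inserted] = firstByKey.emplace(a.key, &a);
        if (inserted)
            view.push_back(&a);
        else
            it->second->duplicates.push_back(&a);
    }
    return view;
}

// "Mark as read" on an entry of the view also reaches its duplicates.
void markRead(Article& a) {
    a.read = true;
    for (Article* d : a.duplicates)
        d->read = true;
}
```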
Heinrich Wendel: Please use bug #100784, which deals with that problem, for discussion.