Bug 279831

Summary: akregator keeps downloading old feed articles -> many duplicates
Product: [Applications] akregator
Reporter: Fabio Rossi <rossi.f>
Component: general
Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED UPSTREAM    
Severity: normal    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit:
Version Fixed In:

Description Fabio Rossi 2011-08-10 17:59:39 UTC
Version:           unspecified (using KDE 4.7.0) 
OS:                Linux

I have the following feed: http://www.photonicsjobs.com/rss.xml. Akregator 4.7.0 keeps downloading the same articles every day, resulting in many duplicates among the feed articles.

It's not clear when it downloads duplicate articles, but it happens quite often. The latest article is dated 05 Aug 2011 and I already have 8 copies of it in my feed folder (so more than 1 copy per day). All the copies have the same date/time.

Reproducible: Always

Steps to Reproduce:
Subscribe to the feed above and monitor the behaviour.

Actual Results:  
A lot of duplicate feed articles.

Expected Results:  
No duplicates.
Comment 1 Christophe Marin 2011-08-29 11:06:53 UTC
I'm afraid that's not a bug: 

I added this feed and let Akregator sync it for a few days. The issue comes from the website, which alters the article URLs.

E.g. with the article "Senior Manufacturing Engineer, Optics, Job Code: 1011" from yesterday:

- The URL when clicking on "complete story" was http://www.photonicsjobs.com/job//2011-08-28/570 yesterday, and today it became http://www.photonicsjobs.com/job//2011-08-29/570

It looks like this website changes the GUID value (GUID = Globally Unique Identifier). From Akregator's point of view, a new GUID means a new article, hence the article duplication.

For further details see http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.photonicsjobs.com%2Frss.xml and look at line ~24 (<guid>http://www.photonicsjobs.com/job//2011-08-29/570</guid>)

Only the website owner can fix this issue.
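
To illustrate the mechanism, here is a minimal Python sketch of an article archive keyed by GUID. It is not Akregator's actual code; the `ArticleStore` class and its `sync` method are hypothetical names used only for illustration, and it assumes the GUID mirrored the article URL on both days. It shows why the same job posting ends up stored twice once the site rewrites the `<guid>`.

```python
class ArticleStore:
    """Toy article archive keyed by GUID (hypothetical, for illustration)."""

    def __init__(self):
        self.articles = {}  # guid -> title

    def sync(self, items):
        """Store every fetched item whose GUID has not been seen yet.

        Returns the number of items treated as new.
        """
        added = 0
        for guid, title in items:
            if guid not in self.articles:
                self.articles[guid] = title
                added += 1
        return added


title = "Senior Manufacturing Engineer, Optics, Job Code: 1011"
store = ArticleStore()

# Same posting fetched on two consecutive days.
# Assumption: the GUID equals the article URL on both days.
day1 = [("http://www.photonicsjobs.com/job//2011-08-28/570", title)]
day2 = [("http://www.photonicsjobs.com/job//2011-08-29/570", title)]

new_day1 = store.sync(day1)
new_day2 = store.sync(day2)   # GUID changed, so the item counts as new
new_again = store.sync(day2)  # unchanged GUID: nothing is added

print(new_day1, new_day2, new_again, len(store.articles))
```

With a stable GUID the second sync would add nothing; with the date embedded in the GUID, every day produces a "new" article.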
Comment 2 Fabio Rossi 2011-08-29 18:31:09 UTC
Thanks for the analysis, I'll contact the webmaster!

I have one comment. I have read that the mandatory subelements of <item> are <title>, <link> and <description>. How does Akregator detect old articles when an item has no <guid> element?

Do you think it's possible to implement a "delete duplicates" function in Akregator to clean up the mess in this feed? I mean a function which compares all the mandatory elements of the articles (cited above) without considering <guid>. If you think it makes sense, I can open another bug.
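
The proposed "delete duplicates" function could look roughly like the following Python sketch. It only illustrates the idea and is not an existing Akregator feature; the `remove_duplicates` function, the dict-based article representation and the description text are assumptions. The compared fields are a parameter because, in this particular feed, the article URL changes daily as well (see comment 1), so a comparison that includes <link> would probably not catch these duplicates.

```python
def remove_duplicates(articles, fields=("title", "link", "description")):
    """Keep the first article for each distinct combination of `fields`.

    `articles` is a list of dicts; the <guid> value is deliberately ignored.
    """
    seen = set()
    unique = []
    for article in articles:
        key = tuple(article.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Two copies of the same posting; the description is placeholder text.
articles = [
    {
        "guid": "http://www.photonicsjobs.com/job//2011-08-28/570",
        "title": "Senior Manufacturing Engineer, Optics, Job Code: 1011",
        "link": "http://www.photonicsjobs.com/job//2011-08-28/570",
        "description": "placeholder description",
    },
    {
        "guid": "http://www.photonicsjobs.com/job//2011-08-29/570",
        "title": "Senior Manufacturing Engineer, Optics, Job Code: 1011",
        "link": "http://www.photonicsjobs.com/job//2011-08-29/570",
        "description": "placeholder description",
    },
]

# Comparing all three elements keeps both copies, because the links differ.
strict = remove_duplicates(articles)

# Comparing only title and description collapses them into one article.
relaxed = remove_duplicates(articles, fields=("title", "description"))

print(len(strict), len(relaxed))
```

Dropping <link> from the comparison is more aggressive and could merge distinct articles that happen to share a title and description, so the choice of fields is a trade-off.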