279831 – akregator keeps downloading old feed articles -> many duplicates

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	akregator
Classification:	Applications
Component:	general
Version:	unspecified
Platform:	Gentoo Packages Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	kdepim bugs

Reported:	2011-08-10 17:59 UTC by Fabio Rossi
Modified:	2011-08-29 18:31 UTC
CC List:	0 users

Description Fabio Rossi 2011-08-10 17:59:39 UTC

Version:           unspecified (using KDE 4.7.0) 
OS:                Linux

I have the following feed: http://www.photonicsjobs.com/rss.xml. Akregator 4.7.0 keeps downloading the same feed articles every day, resulting in tons of duplicates among the feed articles.

It's not clear when it downloads duplicate articles, but it happens quite often. The last article is dated 05 Aug 2011 and I already have 8 copies in my feed folder (so more than 1 copy per day). All the copies have the same date/time.

Reproducible: Always

Steps to Reproduce:
Subscribe to the suggested feed, start to monitor the behaviour.

Actual Results:  
A lot of duplicate feed articles.

Expected Results:  
No duplicates.

Comment 1 Christophe Marin 2011-08-29 11:06:53 UTC

I'm afraid that's not a bug: 

I added this feed and let Akregator sync it for a few days. The issue comes from the website, which alters the articles' URLs.

E.g. with the article "Senior Manufacturing Engineer, Optics, Job Code: 1011" from yesterday:

- The URL when clicking on "complete story" was http://www.photonicsjobs.com/job//2011-08-28/570 yesterday and became http://www.photonicsjobs.com/job//2011-08-29/570 today.

It looks like this website changes the GUID value (GUID = Globally Unique Identifier) every day. From Akregator's point of view, a new GUID means a new article, hence the article duplication.

For further details see http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.photonicsjobs.com%2Frss.xml and look at line ~24 (<guid>http://www.photonicsjobs.com/job//2011-08-29/570</guid>)

Only the website owner can fix this issue.
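
To illustrate the mechanism described above, here is a minimal sketch in Python. It is not Akregator's actual code: `merge_fetch` and the dict-based `store` are hypothetical names, and the only assumption is that an article store is keyed by GUID alone. The two GUIDs are the ones quoted in this comment.

```python
def merge_fetch(store, fetched_items):
    """Add fetched items to the store, keyed by GUID only.

    Hypothetical helper: returns how many items were treated as new.
    """
    added = 0
    for item in fetched_items:
        if item["guid"] not in store:
            store[item["guid"]] = item
            added += 1
    return added


title = "Senior Manufacturing Engineer, Optics, Job Code: 1011"

# The same job posting as served on two consecutive days.
day1 = [{"guid": "http://www.photonicsjobs.com/job//2011-08-28/570", "title": title}]
day2 = [{"guid": "http://www.photonicsjobs.com/job//2011-08-29/570", "title": title}]

store = {}
added_first = merge_fetch(store, day1)    # 1: first time this GUID is seen
added_repeat = merge_fetch(store, day1)   # 0: same GUID, nothing new
added_changed = merge_fetch(store, day2)  # 1: GUID changed, stored again

print(len(store))  # 2 entries for what is really one article
```

With a stable GUID the second and third fetch would both add nothing; the changing date in the GUID is what makes each day's copy look new.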

Comment 2 Fabio Rossi 2011-08-29 18:31:09 UTC

Thanks for the analysis, I'll contact the webmaster!

I have one comment. I have read that the mandatory subelements of <item> are <title>, <link> and <description>. How is akregator detecting old articles in this case (without <guid> elements)?

Do you think it's possible to implement a "delete duplicates" function in akregator to clean the mess in this feed? I mean, a function which compares all the mandatory elements in the articles (cited above) without considering <guid>. If you think it makes sense I can open another bug.
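
A rough sketch of what such a "delete duplicates" pass could look like, in Python. This is only an illustration of the idea, not an Akregator feature or API: `remove_duplicates` and its `fields` parameter are hypothetical, and the description text and the assumption that <link> equals <guid> in this feed are placeholders based on Comment 1.

```python
def remove_duplicates(articles, fields=("title", "link", "description")):
    """Keep the first article for each distinct combination of `fields`.

    The <guid> is deliberately ignored. Hypothetical helper for illustration.
    """
    seen = set()
    unique = []
    for article in articles:
        key = tuple(article.get(field, "").strip() for field in fields)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


title = "Senior Manufacturing Engineer, Optics, Job Code: 1011"
articles = [
    {
        "guid": "http://www.photonicsjobs.com/job//2011-08-28/570",
        "link": "http://www.photonicsjobs.com/job//2011-08-28/570",  # assumed equal to guid
        "title": title,
        "description": "placeholder description",  # real text not known
    },
    {
        "guid": "http://www.photonicsjobs.com/job//2011-08-29/570",
        "link": "http://www.photonicsjobs.com/job//2011-08-29/570",  # assumed equal to guid
        "title": title,
        "description": "placeholder description",
    },
]

by_all_three = remove_duplicates(articles)
by_title_and_text = remove_duplicates(articles, fields=("title", "description"))

print(len(by_all_three))       # 2: the links differ, so both copies survive
print(len(by_title_and_text))  # 1: copies collapse once the link is ignored
```

If the link changes together with the GUID, as the "complete story" URL in Comment 1 suggests, comparing all three elements would not be enough for this feed; the comparison would have to be limited to <title> and <description>.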