Bug 408079

Summary: Fetching feeds causes duplicated items
Product: [Applications] akregator
Reporter: Daniel Roschka <danielroschka>
Component: feed parser
Assignee: kdepim bugs <kdepim-bugs>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version: GIT (master)   
Target Milestone: ---   
Platform: Debian unstable   
OS: Linux   
Latest Commit:
Version Fixed In:

Description Daniel Roschka 2019-05-29 18:20:00 UTC
When a feed is fetched multiple times, Akregator duplicates an existing item whenever the content of the fetched item differs from the content of the same item already stored locally. I have been suffering from this bug for 10+ years and would like to see it finally gone.

Here is my theory of why it happens:

Instead of using only the guid to compare two items for equality, Akregator builds a hash over title, description, content, link and author (https://github.com/KDE/akregator/blob/0d588dcbfb9cc93dec5b6bcbf3b01336ca1d09ce/src/feed/feed.cpp#L581-L585 and https://github.com/KDE/akregator/blob/0d588dcbfb9cc93dec5b6bcbf3b01336ca1d09ce/src/article.cpp#L189) and checks that as well, unless the guid starts with "hash:". I believe this does not follow the specification, which states:

> guid stands for globally unique identifier. It's a string that uniquely identifies the item.
> When present, an aggregator may choose to use this string to determine if an item is new.
> 
> <guid>http://some.server.com/weblogItem3207</guid>
> 
> There are no rules for the syntax of a guid. Aggregators must view them as a string. It's up to
> the source of the feed to establish the uniqueness of the string.

http://www.rssboard.org/rss-specification#ltguidgtSubelementOfLtitemgt
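
To make the difference concrete, here is a minimal sketch of the two equality checks. It is not Akregator's code: the `Item` type, the function names and the use of SHA-256 are assumptions chosen for illustration, and only the list of hashed fields is taken from the description above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical feed item holding only the fields discussed in this report."""
    guid: str
    title: str = ""
    description: str = ""
    content: str = ""
    link: str = ""
    author: str = ""


def content_hash(item: Item) -> str:
    # Stand-in for the hash over title, description, content, link and author.
    # SHA-256 is an assumption; the real hash function may differ.
    joined = "\x1f".join(
        [item.title, item.description, item.content, item.link, item.author]
    )
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def same_item_by_guid(a: Item, b: Item) -> bool:
    # Treats the guid as an opaque string, as the specification describes.
    return a.guid == b.guid


def same_item_by_guid_and_hash(a: Item, b: Item) -> bool:
    # Additionally requires identical content, as my theory assumes.
    return a.guid == b.guid and content_hash(a) == content_hash(b)


# The same post before and after the author fixed a typo in the title.
original = Item(guid="http://some.server.com/weblogItem3207", title="Helo world")
corrected = Item(guid="http://some.server.com/weblogItem3207", title="Hello world")

guid_match = same_item_by_guid(original, corrected)
guid_and_hash_match = same_item_by_guid_and_hash(original, corrected)
print(guid_match, guid_and_hash_match)
```

With the guid-only check the corrected post is recognized as the existing item; with the additional hash check it is not, which is the point at which a duplicate could be created.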

The current behavior produces duplicate items when authors fix typos in their posts or when software inserts random bits into the data, e.g. in JavaScript included in the markup. Podlove Publisher is known for that (https://github.com/podlove/podlove-publisher/blob/192a2710b6ad3d0f5eff67f4daacb5d6dac6ab4a/lib/modules/subscribe_button/button.php#L88). The latter case is particularly annoying, as it produces a new item every single time Akregator fetches the feed.
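
The random-bits case can be simulated with a short sketch. Everything in it is hypothetical (the `render_item` and `fetch_repeatedly` helpers, the guid value, the generated markup); it only models a publisher that embeds a fresh random token in an otherwise unchanged item on every request.

```python
import random


def render_item(rng: random.Random) -> dict:
    # Hypothetical publisher: same guid every time, but the markup carries
    # a freshly generated random token on each request.
    token = rng.getrandbits(64)
    return {
        "guid": "episode-1",
        "content": f"<script>button_{token:016x}()</script>",
    }


def fetch_repeatedly(fetches: int, key) -> int:
    """Fetch the feed several times and return how many items end up stored."""
    rng = random.Random(0)  # fixed seed so the run is reproducible
    store = {}
    for _ in range(fetches):
        item = render_item(rng)
        # An item counts as new only if its key has not been seen before.
        store.setdefault(key(item), item)
    return len(store)


# Identity based on the guid alone.
by_guid = fetch_repeatedly(5, key=lambda item: item["guid"])

# Identity based on guid plus content, mimicking the extra hash check.
by_guid_and_content = fetch_repeatedly(
    5, key=lambda item: (item["guid"], item["content"])
)

print(by_guid, by_guid_and_content)
```

Keyed by guid alone, the store holds one item no matter how often the feed is fetched. Keyed by guid plus content, the number of stored items grows with the number of fetches.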

I'd be happy to provide additional information if necessary.