Bug 123424

Summary: identify articles by url and not by text and title
Product: [Applications] akregator
Reporter: uran238
Component: general
Assignee: kdepim bugs <kdepim-bugs>
Status: REPORTED ---    
Severity: wishlist    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit:
Version Fixed In:

Description uran238 2006-03-11 12:03:29 UTC
Version:            (using KDE 3.5.1)
Installed from:    Gentoo Packages
OS:                Linux

Some news sources alter the text or title of an article after it has appeared in the feed, and akregator then marks it as new. But it isn't new; most of the time just a typo was fixed. Often you noticed the typo yourself, and if not, you don't want to be informed about it ;)
So I think it would be useful to identify an article by its URL and not by its title and text. (I don't know how akregator works internally, but I'm quite sure it uses title and text.)
The articles should be treated as usual: the updated one should be marked as read if the old one was already read, and as new if the old one was not read.
In most cases it isn't useful to keep the old article, because the text on the website has already changed.
Maybe it would be useful to mark an article as new again if the user had marked the old one as important?
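
To make the proposed behaviour concrete, here is a rough sketch of how an updated article could take over the state of the old one. This is not akregator code; `Status`, `Article`, `merge_update` and the `renotify_important` switch are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NEW = "new"
    READ = "read"


@dataclass
class Article:
    # Hypothetical, simplified article record.
    url: str
    title: str
    text: str
    status: Status = Status.NEW
    important: bool = False


def merge_update(old: Article, incoming: Article,
                 renotify_important: bool = False) -> Article:
    """Replace `old` with `incoming`, carrying over the reading state."""
    if old.url != incoming.url:
        raise ValueError("articles do not share a URL")

    changed = (old.title, old.text) != (incoming.title, incoming.text)

    # Read stays read, anything else stays new.
    status = Status.READ if old.status is Status.READ else Status.NEW

    # Optional: notify again if the user flagged the old version as important.
    if changed and old.important and renotify_important:
        status = Status.NEW

    # The old text is dropped; only the updated version is kept.
    return Article(incoming.url, incoming.title, incoming.text,
                   status, old.important)
```

With `renotify_important` left off, a typo fix in an article that was already read would simply replace the old text without showing up as new.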

What do you think?
Comment 1 Frank Osterfeld 2006-03-11 12:32:07 UTC
> Some news sources alter the text or title of the article in the feed, so
> akregator will mark it as new. But it isn't new, most times just a typo was
> fixed. Often you noticed the typo yourself and if not you don't want to be
> informed about it ;)
I agree that resetting them to "New" isn't what you want in most cases. There *might* be updates to the item, but most of the time it's just typos.

> So I think it would be useful to identify an article by its url and not by
> the title and text. (I don't know how akregator works internally, but I'm
> quite sure it does so)

If the article has a <guid> (RSS) or <id> (Atom), we use that. If not, we use title + content. Using the url is not a good idea in general, as there are feeds where the link always points to the same page, or there is no link at all. Admittedly, these cases are rare. We might evaluate heuristics like "if time stamp and link are equal, it's the same article" though.
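
As a rough sketch of that lookup order plus the suggested heuristic (not Akregator's actual code; `Item`, `identity_key` and `same_article` are made-up names, and hashing title + content with SHA-1 is only an assumption about how such a fallback key could be built):

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    # Hypothetical, simplified feed item.
    title: str
    content: str
    guid: Optional[str] = None       # <guid> (RSS) or <id> (Atom)
    link: Optional[str] = None
    timestamp: Optional[int] = None  # publication time, seconds since epoch


def identity_key(item: Item) -> str:
    """Use the feed-supplied id if present, else a hash of title + content."""
    if item.guid:
        return "guid:" + item.guid
    raw = (item.title + "\0" + item.content).encode("utf-8")
    return "hash:" + hashlib.sha1(raw).hexdigest()


def same_article(a: Item, b: Item) -> bool:
    """Compare by key, falling back to the link + time stamp heuristic."""
    if identity_key(a) == identity_key(b):
        return True
    # Only guess when neither item carries an explicit id.
    if a.guid or b.guid:
        return False
    # Items without a link or time stamp cannot be matched this way.
    if a.link is None or a.timestamp is None:
        return False
    return a.link == b.link and a.timestamp == b.timestamp
```

A typo fix changes the title + content key, but `same_article` would still match the two versions as long as link and time stamp stay the same. Feeds whose items all share one link are only told apart here by differing time stamps.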
Comment 2 Justin Zobel 2021-03-09 04:12:11 UTC
Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.