Bug 100784 - Remove duplicate entries and feeds
Summary: Remove duplicate entries and feeds
Status: RESOLVED WORKSFORME
Alias: None
Product: akregator
Classification: Applications
Component: general
Version: 1.0
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Duplicates: 99584 123862 221583 245917 285177
Depends on:
Blocks:
 
Reported: 2005-03-04 04:12 UTC by Anchovi Paste
Modified: 2021-03-11 20:41 UTC
CC List: 8 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Screenshot of Akregator showing KDE UserBase feed (178.79 KB, image/png)
2009-01-18 10:46 UTC, Mark Ziegler
Description Anchovi Paste 2005-03-04 04:12:55 UTC
Version:           1.0 (using KDE 3.4.0)
Installed from:    Compiled From Sources
Compiler:          gcc 3.4.3 
OS:                Linux

Is there a way to display only unique feeds (or entries) and ignore the repeats? For example, I subscribe to lxer and osdir and do not like it when the same story shows up twice. I could submit a wishlist so that Akregator is able to keep track of the repeats and somehow bump them up, and maybe give them a different color, since they are today's top stories.
Comment 1 Heinrich Wendel 2005-06-06 23:20:04 UTC
see bug #93400
Comment 2 Eckhart Wörner 2005-06-16 21:22:07 UTC
The mentioned bug is not exactly what this bug deals with.

Quotes which should belong to this bug taken from bug #93400:

"------- Additional Comment #5 From Heinrich Wendel 2005-06-08 16:51 -------  
Different feeds may have the same articles as well, e.g. people might publish their things on planetgnome and planetfreedesktop; this should be filtered in the "All Feeds" list as well.

------- Additional Comment #6 From Heinrich Wendel 2005-06-15 19:37 -------  
The cleanest solution to implement that would be to check whether an article already exists (hash/guid) in feed.h:appendArticles. If the article already exists, take the old one and append it to the list. An article must then be able to have more than one m_feed, which causes some incompatibilities that have to be considered.

------- Additional Comment #7 From Frank Osterfeld 2005-06-15 20:20 -------  
@Heinrich: That would need a global archive, or at least a global article index. The current implementation is based on the assumption that every article is part of exactly one feed and that it is the feed's business to manage its articles (GUIDs are considered unique only per feed, expiry, notification inside of akregator etc.). I won't introduce additional complexity just because of a few articles showing up in multiple aggregator feeds.

------- Additional Comment #8 From Heinrich Wendel 2005-06-16 00:47 -------  
Yes, you are right, currently every article can only have one feed, but the global archive could be the "All Feeds" feed. We could then add an attribute like "duplicates" to the article, in which the duplicates are saved. Actions like "mark as read" could then be performed on the article and its duplicates. In fact I have a lot of duplicates here (at least 20%)."
Comment 3 Eckhart Wörner 2005-06-16 21:28:55 UTC
You would have to compare the hash and the GUID with those of every article in the whole Akregator archive backend, which causes high CPU and disk load. Checking against my own archive shows that this approach would hardly have any success, because lots of articles are not exactly the same on different feeds (you would have to work with fuzzy checksums). Furthermore, GUIDs are not necessarily the same even when dealing with the same article.
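To illustrate the point about near-identical articles, here is a minimal sketch of a normalized content checksum. It is not a true fuzzy checksum and it is not Akregator code: it only tolerates differences in markup, whitespace and letter case, and the function name `content_fingerprint` and the sample strings are made up for this example.

```python
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """Hash an article body after stripping tags, collapsing whitespace and lowercasing."""
    text = re.sub(r"<[^>]+>", " ", html)              # drop markup tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize spacing and case
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical sample bodies: same text with different markup, and an abbreviated copy.
original = "<p>KDE 3.4   released</p>"
planet_copy = "<div>KDE 3.4 released</div>"
abbreviated = "<p>KDE 3.4 rel...</p>"

same_after_normalizing = content_fingerprint(original) == content_fingerprint(planet_copy)
abbreviated_matches = content_fingerprint(original) == content_fingerprint(abbreviated)
```

A copy that differs only in markup gets the same fingerprint, while an abbreviated copy does not, which is exactly the case where a real similarity measure would be needed.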
Comment 4 Heinrich Wendel 2005-06-18 11:38:24 UTC
Comparing the GUID will do in most cases. It won't cause any disk load, since every article is loaded already. As for CPU load, it can't take too long to compare, let's say, 10,000 articles.
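The GUID comparison described here can be sketched with a dictionary lookup, so each article is checked once instead of being compared against every other article. This is only an illustration under the assumption that all articles are already in memory; the `Article` class, `merge_by_guid` and the sample GUIDs are hypothetical and do not reflect Akregator's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    guid: str   # identifier supplied by the feed
    feed: str   # name of the feed the article came from
    title: str


def merge_by_guid(articles):
    """Return (unique, duplicates): the first article seen per GUID,
    and later repeats grouped under that GUID."""
    first_seen = {}
    duplicates = {}
    for article in articles:
        if article.guid in first_seen:
            duplicates.setdefault(article.guid, []).append(article)
        else:
            first_seen[article.guid] = article
    return list(first_seen.values()), duplicates


# Hypothetical sample data: one article published on two planets.
articles = [
    Article("urn:example:1", "planetgnome", "Release notes"),
    Article("urn:example:2", "planetgnome", "Conference report"),
    Article("urn:example:1", "planetfreedesktop", "Release notes"),
]
unique, duplicates = merge_by_guid(articles)
```

Each lookup is a hash operation, so the cost grows linearly with the number of articles. The sketch does not help when two feeds assign different GUIDs to the same article.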
Comment 5 greatbunzinni 2005-12-04 00:06:46 UTC
*** This bug has been confirmed by popular vote. ***
Comment 6 Frank Osterfeld 2005-12-04 16:23:53 UTC
"merging" identical articles (identical means they have the same guid here) poses (at least) two problems:

- article storage and expiry: Which feed is responsible for article deletion if an article has multiple parents? And who should store it? Possible solution: a global article archive instead of a per-feed archive, where feeds "release" articles when deleting/expiring them. When the last feed has released an article, it is really deleted from the archive.

- slight article differences: E.g. an item on Planet KDE might slightly differ in format (markup, encoding) or even text content (abbreviated, an additional link etc.) from the item in the original feed. So the feeds would keep overwriting each other's version of the article.
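The "release" idea from the first point can be sketched as a small reference-counted archive. This is a minimal illustration, not a proposal for the real storage backend: `GlobalArchive` and its methods are hypothetical names, and keeping the first stored copy is an assumption that sidesteps the second point rather than solving it.

```python
class GlobalArchive:
    """Stores each article once and tracks which feeds still reference it."""

    def __init__(self):
        self._articles = {}  # guid -> stored content
        self._owners = {}    # guid -> set of feed names holding the article

    def add(self, feed, guid, content):
        # Keep the first stored copy so feeds do not overwrite each other.
        self._articles.setdefault(guid, content)
        self._owners.setdefault(guid, set()).add(feed)

    def release(self, feed, guid):
        """Drop one feed's reference; return True if the article was deleted."""
        owners = self._owners.get(guid)
        if owners is None:
            return False
        owners.discard(feed)
        if owners:
            return False  # another feed still holds the article
        del self._owners[guid]
        del self._articles[guid]
        return True

    def __contains__(self, guid):
        return guid in self._articles


archive = GlobalArchive()
archive.add("planetkde", "urn:example:1", "Original text")
archive.add("author-blog", "urn:example:1", "Slightly different text")
deleted_after_first_release = archive.release("planetkde", "urn:example:1")
still_archived = "urn:example:1" in archive
deleted_after_last_release = archive.release("author-blog", "urn:example:1")
```

The article stays in the archive after the first feed expires it and is removed only when the last feed releases it.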
Comment 7 FiNeX 2007-12-11 17:09:27 UTC
*** Bug 99584 has been marked as a duplicate of this bug. ***
Comment 8 FiNeX 2007-12-11 17:15:38 UTC
*** Bug 121865 has been marked as a duplicate of this bug. ***
Comment 9 FiNeX 2007-12-11 17:16:57 UTC
*** Bug 123862 has been marked as a duplicate of this bug. ***
Comment 10 Jens 2008-07-13 22:14:12 UTC
I would also like to request that, if there are duplicates, I can give a certain folder priority so that the article does not get deleted from that folder.

E.g. there are news sites that offer different feeds: one for the features, another one for politics, another one for business etc.

Now some articles can be about politics but are also feature news on the main site. So I would prefer to see such an article only in the politics feed and not in the features feed. In other words: in the features feed I only want to see news that does not concern politics or business.
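The priority request could look roughly like the following sketch, which shows each duplicated article only in the most preferred feed. It assumes duplicates share a GUID across feeds, which may not hold in practice; `assign_to_preferred_feed`, the feed names and the GUIDs are invented for the example.

```python
def assign_to_preferred_feed(entries, priority):
    """entries: iterable of (feed, guid) pairs.
    priority: feed names, most preferred first.
    Returns a dict mapping each feed to the GUIDs it should display."""
    rank = {feed: index for index, feed in enumerate(priority)}
    lowest = len(rank)  # feeds missing from the priority list rank last
    best = {}           # guid -> most preferred feed seen so far
    for feed, guid in entries:
        current = best.get(guid)
        if current is None or rank.get(feed, lowest) < rank.get(current, lowest):
            best[guid] = feed
    visible = {}
    for guid, feed in best.items():
        visible.setdefault(feed, []).append(guid)
    return visible


# Hypothetical example: article "a" appears in both features and politics.
entries = [("features", "a"), ("politics", "a"), ("features", "b")]
priority = ["politics", "business", "features"]
visible = assign_to_preferred_feed(entries, priority)
```

With this priority list, article "a" is shown only under politics, and the features feed keeps only article "b".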
Comment 11 Mark Ziegler 2009-01-18 10:44:44 UTC
Feed of KDE's Userbase gives a lot of duplicates after some weeks.
I ran akregator with default archive settings.
See attached screenshot.
Comment 12 Mark Ziegler 2009-01-18 10:46:03 UTC
Created attachment 30376
Screenshot of Akregator showing KDE UserBase feed
Comment 13 Christophe Marin 2010-03-26 00:15:17 UTC
*** Bug 221583 has been marked as a duplicate of this bug. ***
Comment 14 Christophe Marin 2010-09-15 13:36:19 UTC
*** Bug 245917 has been marked as a duplicate of this bug. ***
Comment 15 Christophe Marin 2011-10-29 14:33:01 UTC
*** Bug 285177 has been marked as a duplicate of this bug. ***
Comment 16 Justin Zobel 2021-03-09 04:11:53 UTC
Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.
Comment 17 Tal Levy 2021-03-11 12:42:30 UTC
I no longer use this software.