Summary: | Remove duplicate entries and feeds | ||
---|---|---|---|
Product: | [Applications] akregator | Reporter: | Anchovi Paste <kdelist> |
Component: | general | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | RESOLVED WORKSFORME | ||
Severity: | wishlist | CC: | bluedzins, elfio, geekboy, latyvel, nik, scott.stubbs, sheeettin, sir_kalot |
Priority: | NOR | ||
Version: | 1.0 | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | Screenshot of akreagor showing kde userbase feed |
Description
Anchovi Paste
2005-03-04 04:12:55 UTC
see bug #93400

The mentioned bug is not exactly what this bug deals with. Quotes taken from bug #93400 that belong to this bug:

"------- Additional Comment #5 From Heinrich Wendel 2005-06-08 16:51 -------
different feeds may have the same articles as well, e.g. people might publish their things on planetgnome and planetfreedesktop, this should be filtered in the "All Feeds" list as well

------- Additional Comment #6 From Heinrich Wendel 2005-06-15 19:37 -------
the cleanest solution to implement that would be to check if an article already exists (hash/guid) in feed.h:appendArticles. If the article already exists, take the old one and append it to the list. An article must then be able to have more than one m_feed, which causes some incompatibilities that have to be considered.

------- Additional Comment #7 From Frank Osterfeld 2005-06-15 20:20 -------
@Heinrich: That would need a global archive, or at least a global article index. The current implementation is based on the assumption that every article is part of exactly one feed and that it is the feed's business to manage its articles (GUIDs are considered unique only per feed, expiry, notification inside of akregator etc.). I won't introduce additional complexity just because of a few articles showing up in multiple aggregator feeds.

------- Additional Comment #8 From Heinrich Wendel 2005-06-16 00:47 -------
Yes, you are right, currently every article can only have one feed, but the global archive could be the "All Feeds" feed. We could then add an attribute like "duplicates" to the article in which the duplicates are saved. Actions like "mark as read" could then be performed on the article and its duplicates. In fact I have a lot of duplicates here (at least 20%)."

You would have to compare the hash and the GUID with those of every article in the whole Akregator archive backend, which causes high CPU and disk load.
Checking with my own archive shows that this approach would hardly have any success, because lots of articles are not exactly the same on different feeds (you would have to work with fuzzy checksums). Furthermore, GUIDs are not necessarily the same even when dealing with the same article.

Comparing the GUID will do it in most of the cases. It won't cause any disk load since every article is loaded already. As for CPU load, it can't take too long to compare, let's say, 10,000 articles.

*** This bug has been confirmed by popular vote. ***

"Merging" identical articles (identical here means they have the same GUID) poses (at least) two problems:
- article storage and expiry: Which feed is responsible for article deletion if an article has multiple parents? And who should store it? Possible solution: a global article archive instead of a per-feed archive, where feeds "release" articles when deleting/expiring them. When the last feed has released an article, it is really deleted from the archive.
- slight article differences: E.g. an item on Planet KDE might slightly differ in format (markup, encoding) or even text content (abbreviated, an additional link etc.) from the item in the original feed. So the feeds would keep overwriting each other's changes all the time.

*** Bug 99584 has been marked as a duplicate of this bug. ***

*** Bug 121865 has been marked as a duplicate of this bug. ***

*** Bug 123862 has been marked as a duplicate of this bug. ***

I would also like to request that, if there are duplicates, I can give a certain folder priority so that the article does not get deleted from this folder. E.g. there are news sites that offer different feeds: one for the features, another one for politics, another one for business etc. Now some articles can be about politics but are also feature news on the main site. So I would prefer to see such an article only in the politics feed and not in the features feed. In other words: in the features feed I only want to see news that does not concern politics or business.
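For illustration only, the "global article archive" idea discussed above (feeds "release" articles, and an article is really deleted once the last feed has released it) could be sketched roughly as follows. This is not Akregator code: it is Python rather than C++, and the names GlobalArchive, Article, holders, append, release and mark_read are all hypothetical. It assumes that duplicates share the same GUID, which, as noted above, is not always true, and it sidesteps the "slight article differences" problem by simply keeping the first version seen.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    guid: str
    title: str
    read: bool = False
    # Names of the feeds that currently hold this article (hypothetical field).
    holders: set = field(default_factory=set)


class GlobalArchive:
    """One shared article store for all feeds, keyed by GUID (hypothetical)."""

    def __init__(self):
        self._articles = {}

    def append(self, feed, guid, title):
        # Reuse the stored article if the GUID is already known, otherwise
        # create it. Assumption: the first version seen is kept, so feeds
        # never overwrite each other's copy.
        article = self._articles.get(guid)
        if article is None:
            article = Article(guid, title)
            self._articles[guid] = article
        article.holders.add(feed)
        return article

    def release(self, feed, guid):
        # Called when a feed deletes or expires the article. The article is
        # only removed from the archive once no feed holds it any more.
        article = self._articles.get(guid)
        if article is None:
            return
        article.holders.discard(feed)
        if not article.holders:
            del self._articles[guid]

    def mark_read(self, guid):
        # There is a single shared object, so the read state is the same
        # in every feed that shows the article.
        article = self._articles.get(guid)
        if article is not None:
            article.read = True

    def __contains__(self, guid):
        return guid in self._articles

    def __len__(self):
        return len(self._articles)
```

With this layout, two feeds appending the same GUID get the same object back, "mark as read" applies to both at once, and the article survives until both feeds have released it. Matching articles whose GUIDs differ would still need something like the fuzzy checksums mentioned earlier, which this sketch does not attempt.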
Feed of KDE's Userbase gives a lot of duplicates after some weeks. I ran akregator with default archive settings. See attached screenshot.

Created attachment 30376
Screenshot of akreagor showing kde userbase feed
*** Bug 221583 has been marked as a duplicate of this bug. ***

*** Bug 245917 has been marked as a duplicate of this bug. ***

*** Bug 285177 has been marked as a duplicate of this bug. ***

Thank you for the bug report. As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists. If this bug is no longer persisting or relevant please change the status to resolved.

I no longer use this software.