Bug 101907 - Read articles keep getting marked as unread after feed fetching (in sourceforge feeds)
Status: RESOLVED FIXED
Alias: None
Product: akregator
Classification: Applications
Component: general
Version: 1.0
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Duplicates: 102113 119712 130677
Depends on:
Blocks:
 
Reported: 2005-03-19 17:43 UTC by Jure Repinc
Modified: 2007-09-20 16:02 UTC
CC List: 6 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Archive file for a feed (4.91 KB, text/plain)
2005-07-31 16:59 UTC, Gunnar Grim
Archive files for three feeds. (3.84 KB, application/octet-stream)
2005-09-24 09:44 UTC, Gunnar Grim
Patch to akregator (relative to kde-3.4.2 release) which gives better control over article uniqueness. (48.71 KB, patch)
2005-10-05 16:20 UTC, Gary Godfrey
Patch to allow finer control over what constitutes a "new" article. (14.78 KB, patch)
2005-10-05 17:35 UTC, Gary Godfrey
Patch to allow finer control over what constitutes a "new" article. (14.78 KB, patch)
2005-10-05 17:35 UTC, Gary Godfrey
Problematic feed example (22.26 KB, application/xml)
2006-04-25 11:15 UTC, Vlastimil Babka (Caster)

Description Jure Repinc 2005-03-19 17:43:48 UTC
Version:           1.0 (using KDE 3.4.0, Gentoo)
Compiler:          gcc version 3.4.3-20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0, pie-8.7.7)
OS:                Linux (x86_64) release 2.6.11-gentoo-r3

I just installed Akregator on my Gentoo Linux machine. I'm running KDE 3.4.0 and everything is compiled in 64-bit for AMD64.

I have added a lot of feeds to Akregator and now it happens that when Akregator fetches the feeds it automatically marks all articles in these feeds as unread even if I marked them as read before.

Feeds marked as read should stay this way and only new and unread articles should show as unread.
Comment 1 Teemu Rytilahti 2005-03-19 19:37:41 UTC
Umh, sorry, can't confirm this after all. I misread your description and thought this was the same problem I have...
Comment 2 Jure Repinc 2005-03-20 01:18:34 UTC
I also noticed that no articles or even article titles are saved if I restart Akregator.
In the settings I have it set to keep articles for no more than 10 days.
Comment 3 Frank Osterfeld 2005-03-20 12:01:34 UTC
Confirmed. Akregator confuses "Delete articles older than" and "Disable archiving" settings - that's because the mapping of the GUI buttons to the actual settings relies on the order the radio buttons are listed in the UI file(!).
This definitely sucks. Every time someone changes settings_archive.ui, he has to check manually whether the order in the XML is still the correct one.
I fixed this half a year ago, and now it's there again :-(
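
The failure mode described here can be illustrated with a small sketch. It is not Akregator's code: the enum, both function names, and the two button names marked "assumed" are hypothetical, and plain strings stand in for the Qt widgets. It only contrasts deriving the setting from a button's position in the file with looking it up by widget name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical archive modes. Only the "limit article age" and "disable
// archiving" buttons appear in this report; the other two are assumptions.
enum class ArchiveMode { KeepAll, LimitArticleNumber, LimitArticleAge, DisableArchiving };

// Fragile: the mode is the position of the checked button in file order,
// so reordering widgets in the .ui file silently changes the result.
ArchiveMode modeByPosition(const std::vector<std::string>& buttonsInFileOrder,
                           const std::string& checkedButton) {
    for (std::size_t i = 0; i < buttonsInFileOrder.size(); ++i) {
        if (buttonsInFileOrder[i] == checkedButton)
            return static_cast<ArchiveMode>(i);
    }
    assert(false && "checked button is not part of the group");
    return ArchiveMode::KeepAll;
}

// Sturdier: the mode is looked up by widget name, independent of file order.
ArchiveMode modeByName(const std::string& checkedButton) {
    static const std::map<std::string, ArchiveMode> modes = {
        {"rb_KeepAll", ArchiveMode::KeepAll},                        // assumed name
        {"rb_LimitArticleNumber", ArchiveMode::LimitArticleNumber},  // assumed name
        {"rb_LimitArticleAge", ArchiveMode::LimitArticleAge},
        {"rb_DisableArchiving", ArchiveMode::DisableArchiving},
    };
    const auto it = modes.find(checkedButton);
    assert(it != modes.end() && "unknown button name");
    return it != modes.end() ? it->second : ArchiveMode::KeepAll;
}
```

With the last two buttons swapped in the file, modeByPosition reports "disable archiving" when "delete articles older than" is checked, while modeByName is unaffected.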
Comment 4 Frank Osterfeld 2005-03-20 12:04:53 UTC
CVS commit by osterfeld: 

fix the order of the buttongroup members. The mapping of button<->enum value is based on the order in the 
XML file! Unbelievable and error-prone.
BUG: 101907


  M +16 -16    settings_archive.ui   1.8


--- kdepim/akregator/src/settings_archive.ui  #1.7:1.8
@@ -43,12 +43,4 @@
                     </property>
                 </widget>
-                <widget class="QRadioButton" row="3" column="0" rowspan="1" colspan="2">
-                    <property name="name">
-                        <cstring>rb_DisableArchiving</cstring>
-                    </property>
-                    <property name="text">
-                        <string>Disable archiving</string>
-                    </property>
-                </widget>
                 <widget class="QRadioButton" row="1" column="0">
                     <property name="name">
@@ -59,4 +51,20 @@
                     </property>
                 </widget>
+                <widget class="QRadioButton" row="2" column="0">
+                        <property name="name">
+                                <cstring>rb_LimitArticleAge</cstring>
+                        </property>
+                        <property name="text">
+                                <string>Delete articles older than: </string>
+                        </property>
+                </widget>
+                <widget class="QRadioButton" row="3" column="0" rowspan="1" colspan="2">
+                        <property name="name">
+                                <cstring>rb_DisableArchiving</cstring>
+                        </property>
+                        <property name="text">
+                                <string>Disable archiving</string>
+                        </property>
+                </widget>
                 <widget class="KIntSpinBox" row="1" column="1">
                     <property name="name">
@@ -85,12 +93,4 @@
                     </property>
                 </widget>
-                <widget class="QRadioButton" row="2" column="0">
-                    <property name="name">
-                        <cstring>rb_LimitArticleAge</cstring>
-                    </property>
-                    <property name="text">
-                        <string>Delete articles older than: </string>
-                    </property>
-                </widget>
                 <widget class="KIntSpinBox" row="2" column="1">
                     <property name="name">
Comment 5 Frank Osterfeld 2005-03-20 12:06:25 UTC
CVS commit by osterfeld: 

backport of archive settings fix (confusing disable archive and limit article age)
CCBUG: 101907


  M +10 -2     akregator_view.cpp   1.225.2.3
  M +30 -3     articlelist.cpp   1.37.6.2
  M +16 -16    settings_archive.ui   1.7.2.1
Comment 6 Teemu Rytilahti 2005-03-21 18:47:06 UTC
*** Bug 102113 has been marked as a duplicate of this bug. ***
Comment 7 Gunnar Grim 2005-07-30 19:21:19 UTC
I don't think it is simply a radio button confusion, I have selected "Disable archiving" to get ArchiveMode=limitArticleNumber (from .kde/share/config/akregatorrc). Still, some articles keep getting marked as unread. It is always the same articles, seemingly randomly selected. I'm using KDE 3.4.2 and akregator 1.0.
Comment 8 Adrien Beau 2005-07-31 10:12:29 UTC
Why don't you use the Akregator version that ships with KDE 3.4.2? At the very least, you should upgrade to Akregator 1.0.2, but even that version is older than the one in KDE 3.4.2.
Comment 9 Gunnar Grim 2005-07-31 10:34:00 UTC
I will try it and see if it helps. The reason I added the above comment is that the comments from Frank hinted that the bug was only related to the incorrect radio buttons.
Comment 10 Gunnar Grim 2005-07-31 11:32:40 UTC
OK, now I'm using 1.1.2. The archive settings panel is now fixed but the "unread" problem remains.
Comment 11 Frank Osterfeld 2005-07-31 11:38:00 UTC
How often does this happen? All the time or only for specific feeds and/or a very few articles?
Can you list feed URLs and post which articles update all the time (it would be enough to list 2-3 articles that change and some that don't)?
Comment 12 Gunnar Grim 2005-07-31 11:52:48 UTC
Sure. Here's one that contains only two articles:

http://sourceforge.net/export/rss2_projfiles.php?group_id=136710 

The article about v0.2 being released keeps changing to unread while the one about v0.1 remains read.

The problem is not easily reproduced. It happens sometimes when I press the "Fetch All Feeds" button, sometimes when feeds are fetched in the background. Also, if I fetch feeds with the intention of verifying the bug, there is no problem. If I just work with other stuff though, after a while I get a number of unread articles.

If I find a consistent way of causing the problem I'll post a new comment.
Comment 13 Frank Osterfeld 2005-07-31 13:17:14 UTC
How often does it happen in total? For (nearly) every feed? For one feed out of 50? It's possible that the feed changes.
Comment 14 Gunnar Grim 2005-07-31 14:31:02 UTC
First, to answer your questions:

How often does it happen in total? - Several times a day.
For (nearly) every feed? - Yes.

Second, and more importantly:

Since I upgraded it has only happened once, and that was right after I started the new version. I'm thinking (and hoping) that the following has happened:
Just before I quit version 1.0, I marked all feeds as read. Then I started 1.1.2 and the same articles as always were marked as unread. Now, it is possible that just before I quit 1.0 it performed a fetch and marked the articles as unread. Then when I started 1.1.2, the articles were already marked as unread, making it look like 1.1.2 had done it. Now a few hours have passed and no articles have been marked as unread. So hopefully the bug is actually fixed with 1.1.2. Of course if it happens again I'll post!
Comment 15 Gunnar Grim 2005-07-31 16:57:50 UTC
Sadly, it just happened again. I'll attach one of the files in .kde/share/apps/akregator/Archive.
Comment 16 Gunnar Grim 2005-07-31 16:59:29 UTC
Created attachment 12015 [details]
Archive file for a feed
Comment 17 Gunnar Grim 2005-08-09 20:41:06 UTC
I have now added a couple of other feeds and it seems that only the feeds from sourceforge are affected by this problem. The exact same feeds in Thunderbird do not behave like this though, so I don't think it is a sourceforge problem.
Comment 18 Gunnar Grim 2005-09-22 09:51:48 UTC
Why is this issue marked as resolved? I'm using akregator 1.1.2 and it is certainly not fixed in that version. Is there a newer version somewhere?
Comment 19 Thiago Macieira 2005-09-22 13:02:46 UTC
Both the KDE 3.4.2 and 3.5 beta1 versions should have the fix.
Comment 20 Gunnar Grim 2005-09-22 13:35:31 UTC
I'm using KDE 3.4.2. I think perhaps there is some misunderstanding here. There used to be a bug in the settings dialog, with radio buttons in the wrong order, and that one has been fixed. This fix may address some causes of the problem but not all. As I said in comment #17, the problem remains for SourceForge feeds.
Comment 21 Frank Osterfeld 2005-09-22 17:52:02 UTC
Could you provide some example feeds (URLs) where this happens for you? The more the better.
Comment 22 Gunnar Grim 2005-09-22 19:04:39 UTC
Already have. See comment #16.
Comment 23 Frank Osterfeld 2005-09-22 19:11:30 UTC
That's an archive, not a feed URL :) (I probably could find the feed URL by browsing the linked pages, but well, you know, developers are lazy sometimes. And if you have more than one feed URL, this could help, too)
So far I have tried the ksvn feed from comment #12. Any others? Or just sf feeds in general?
Comment 24 Gunnar Grim 2005-09-22 19:33:39 UTC
Sorry, should have said Comment #12. Here are a few more:

http://sourceforge.net/export/rss2_projfiles.php?group_id=36382
http://sourceforge.net/export/rss2_projfiles.php?group_id=64348

Only SF feeds seem to cause this problem now that the dialog bug was fixed, but I'm not sure about that since I only subscribe to a few others.

HTH
Comment 25 Frank Osterfeld 2005-09-22 19:50:12 UTC
Thanks, I added the feeds to my list, let's see if I can reproduce it.

Can you trigger the status change back to unread? I.e. does it happen if you fetch the feed manually in a short time (try a few times)? Or only "several times a day" as you wrote, but not always? 

It would be useful if you could do the following

1) mark feed as read
2) copy the archive to archive-1
3) wait until the items are reset to new/unread
4) mark feed as read
5) copy the archive to archive-2
6) post archive-1 and archive-2 here

I discussed a similar problem with someone on IRC 1-2 months ago, it seems that the check for changes (hash function) does not work properly on some machines. IIRC he tried to use the archive on different machines and synced them using unison, so the hash calculation might have different results on different machines. That shouldn't matter in your case though...
Comment 26 Frank Osterfeld 2005-09-22 20:00:26 UTC
Another question: Are all the newest n articles in the feed (i.e. the ones that are currently in RSS file) affected? Or only some random ones, without a certain pattern?
Comment 27 Frank Osterfeld 2005-09-23 18:25:09 UTC
The sourceforge feeds contain download statistics. So every time someone downloads the file, the feed changes. I guess this is the reason for the behaviour. Could you check?
Comment 28 Gunnar Grim 2005-09-24 09:44:43 UTC
Created attachment 12684 [details]
Archive files for three feeds.

The attached file contains archives for three feeds in three different states:
State a - After marking the feeds as read.
State b - After akregator has incorrectly flagged several items as unread.
State c - After again marking the feeds as read.

Diffing the files shows that the description for the items that are marked as
unread is indeed changed, due to the download statistics. The publishing date,
however, has not changed.

I guess now it is pretty clear what the problem is. Is it really necessary to
include the description in the hash? If it is for some feeds, could it perhaps
be made optional for a feed so that you can switch it off for SF feeds and
others that use the description in a similar way?
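
A minimal sketch of the per-feed idea suggested above, under stated assumptions: the Item struct, the Field flags, and both functions are hypothetical and do not reflect Akregator's actual hashing code; std::hash stands in for whatever hash function is really used. It shows how excluding the description from the hashed fields makes a change in download statistics invisible to change detection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical article fields; the names are illustrative only.
struct Item {
    std::string title;
    std::string description;
    std::string pubDate;
};

// Bit flags selecting which fields take part in change detection.
enum Field : unsigned { Title = 1u << 0, Description = 1u << 1, PubDate = 1u << 2 };

// Builds the string that is fed to the hash, using only the selected fields.
std::string hashInput(const Item& item, unsigned fields) {
    assert(fields != 0 && "at least one field must be selected");
    std::string input;
    if (fields & Title)       { input += item.title;       input += '\n'; }
    if (fields & Description) { input += item.description; input += '\n'; }
    if (fields & PubDate)     { input += item.pubDate;     input += '\n'; }
    return input;
}

// Fingerprint of an article for the chosen set of fields.
std::size_t articleHash(const Item& item, unsigned fields) {
    return std::hash<std::string>{}(hashInput(item, fields));
}
```

For a SourceForge-style feed one would pass Title | PubDate, so two fetches that differ only in the description produce the same fingerprint; for other feeds all three flags could stay enabled.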
Comment 29 Gary Godfrey 2005-10-05 14:29:20 UTC
I've been seeing a similar problem more recently with slate (http://www.slate.com/rss).  It looks like the description field has links to doubleclick and such that change every time you fetch.  This changes the hash value of the article.  I've attached a patch which allows configuration of which fields in the feed to use for calculating the hash (note this is bypassed if the feed has the "hash" property).
Comment 30 Gary Godfrey 2005-10-05 16:20:36 UTC
Created attachment 12864 [details]
Patch to akregator (relative to kde-3.4.2 release) which gives better control over article uniqueness.
Comment 31 Gary Godfrey 2005-10-05 17:01:45 UTC
Comment on attachment 12864 [details]
Patch to akregator (relative to kde-3.4.2 release) which gives better control over article uniqueness.

Whoops - left a few files in the patch that I shouldn't have....
Comment 32 Gary Godfrey 2005-10-05 17:35:22 UTC
Created attachment 12867 [details]
Patch to allow finer control over what constitutes a "new" article.

(one more time - hopefully it's right this time).
Comment 33 Gary Godfrey 2005-10-05 17:35:23 UTC
Created attachment 12868 [details]
Patch to allow finer control over what constitutes a "new" article.

(one more time - hopefully it's right this time).
Comment 34 Frank Osterfeld 2005-10-08 13:33:42 UTC
Thanks for the patch, but we are in feature and string freeze for KDE 3.5 now, so we can't add anything to the GUI anymore. For KDE4 we can consider this, but I think we can boil it down to "Don't reset to new when article content changes" or something like that. I mean, what other than the description is subject to change? (well, titles, but only rarely). I would rather not expose technical details like this to the user.
Comment 35 shsschlarb-tux 2005-10-24 08:56:45 UTC
In my case all the articles and news (5 channels with about 77 items) are set to unread on startup of Akregator and it is impossible that the content changed, because there are some feeds that I wrote myself. 
Comment 36 Michael 2005-11-07 12:15:11 UTC
Just adding a 'me too', Akregator 1.1.3 with KDE 3.4.3, under Gentoo.
In http://www.php.net/news.rss and http://www.mozilla.org/news.rdf the same articles are marked unread after every fetch.
Comment 37 Gunnar Grim 2006-01-19 18:51:18 UTC
Just installed 3.5 and it still has the problem. After reading the comments above I get the impression that Frank is of the opinion that this is not a bug and that a new feature, with UI changes, is required. I'm not sure I agree. If the publishing date hasn't changed then isn't that enough? I don't know if that is what Thunderbird uses but it somehow manages to get it right.

Wouldn't it be possible to fix this, i.e. use the publishing date, and then perhaps in KDE4 add an option where you can specify for each feed that it should check other fields?

Since the sourceforge file release feeds put download statistics in their descriptions this is really a problem.
Comment 38 Frank Osterfeld 2006-01-25 23:36:20 UTC
*** Bug 119712 has been marked as a duplicate of this bug. ***
Comment 39 Derek Broughton 2006-02-03 19:18:47 UTC
I too see this problem, and it can't possibly 'boil ... down to "Don't reset to new when article content changes"'.  I've just tried all the different archive settings, quit kontact, start kontact, fetch feeds for each setting, and every time all articles come up as "New".  Not Unread, but New.  The only difference with any of the archival settings is that there are only 61 articles if I expire anything older than 10 days and 115 with any other setting.

KDE 3.5, akregator 1.2.1, kontact 1.2
Comment 40 Doug McMahon 2006-02-05 13:03:13 UTC
I am adding a me too for this bug with Kubuntu and KDE 3.5, akregator 1.2.1, kontact 1.2. I thought I would just add some more feeds that display this problem for testing purposes (these are all the feeds I read unfortunately!)

http://www.lifehacker.com/index.xml

http://www.gizmodo.com/index.xml

http://www.kde.org/dot/kde-apps-content.rdf

This is my first Bugzilla post.  Having looked at previous posts I assume this kind of post is allowed. If this kind of post is thought of as annoying/useless etc please let me know so I won't do it again!
Comment 41 Shlomi Fish 2006-04-23 14:51:42 UTC
This happens to me with an item in the following feed:

http://www.oreillynet.com/pub/feed/8?format=rss2

This item is the "Advanced MySQL Replication Techniques".

Comment 42 Vlastimil Babka (Caster) 2006-04-25 11:12:18 UTC
Happens to me too, for example with feed http://gentoo-portage.com/RSS/Newest/
I've downloaded snapshot of the feed and uploaded it to http://www.kabel1.cz/~caster/Newest.xml (will also add it as attachment)
The articles I see marked as New (red) after every feed fetch are:

xft-7.0
asterisk-1.0.9-r4
baselayout-1.12.0_pre18-r1
vdr2jpeg-0.0.8b

When I open the raw xml in Firefox, I can see two different articles (for example) for baselayout:

<item>
<title>baselayout-1.11.15-r1</title>
<link>http://gentoo-portage.com/sys-apps/baselayout</link>
<description>Filesystem baselayout and init scripts</description>
<guid>http://gentoo-portage.com/sys-apps/baselayout</guid>
<pubDate>Mon, 24 Apr 2006 15:26:37 +0000</pubDate>
<source url="http://gentoo-portage.com/RSS/Newest/">Newest Ebuilds</source>
</item>
<item>
<title>baselayout-1.12.0_pre18-r1</title>
<link>http://gentoo-portage.com/sys-apps/baselayout</link>
<description>Filesystem baselayout and init scripts</description>
<guid>http://gentoo-portage.com/sys-apps/baselayout</guid>
<pubDate>Mon, 24 Apr 2006 15:26:38 +0000</pubDate>
<source url="http://gentoo-portage.com/RSS/Newest/">Newest Ebuilds</source>
</item>

But Akregator shows only the last one of the versions! It seems to have a problem distinguishing these articles as unique, even though the titles are different! The same thing applies to the other problematic articles - all have more than one version and Akregator shows only the last one.

My version is 1.1.3, KDE 3.4.3, on Gentoo. 
Comment 43 Vlastimil Babka (Caster) 2006-04-25 11:15:41 UTC
Created attachment 15768 [details]
Problematic feed example

Snapshot of http://gentoo-portage.com/RSS/Newest/
Comment 44 Vlastimil Babka (Caster) 2006-04-25 13:25:11 UTC
After reading some other bugs here, I realized that Akregator uses <guid> to distinguish articles if the tag is present. I see the articles in my example have same guid, so it's probably valid to treat them as one, and the feed should be fixed. Same goes for link from comment 41 - there are two items with same guid but different title (one of them is a typo fix). But the behaviour of marking such article as new on fetch is still wrong, I think.
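
The effect described in this comment can be sketched as follows. This is an illustration only: the Item struct and indexByGuid are hypothetical, and the std::map merely stands in for whatever guid-keyed storage Akregator uses. When articles are keyed by <guid>, a later item with the same guid replaces the earlier one, so only the last version survives.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced view of a feed item.
struct Item {
    std::string guid;
    std::string title;
};

// Indexes the items of one fetch by guid, in feed order.
std::map<std::string, Item> indexByGuid(const std::vector<Item>& items) {
    std::map<std::string, Item> byGuid;
    for (const Item& item : items) {
        // The sketch does not cover items without a guid.
        assert(!item.guid.empty());
        // A later item with the same guid overwrites the earlier one.
        byGuid[item.guid] = item;
    }
    return byGuid;
}
```

Fed the two baselayout items from comment 42, the index ends up with a single entry whose title is baselayout-1.12.0_pre18-r1.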
Comment 45 philou 2006-05-15 16:52:23 UTC
I have a similar problem with Google News RSS feeds:

Every time the feed refreshes, all messages are marked as "new" again along with the new ones.

Problematic feed here: 

http://news.google.com/news?ned=us&topic=h&output=rss

Thanks

Akregator 1.2.2 (KDE  3.5.2) Debian Sid
Comment 46 Frank Osterfeld 2006-05-20 10:10:26 UTC
Caster: The problem with multiple items having the same guid is actually another problem and deserves its own bug report. (well, guids are meant to be unique _identifiers_ of items, so if a feed contains the same guid for different items, the feed needs to be fixed, not akregator).
Comment 47 Frank Osterfeld 2006-05-20 11:12:27 UTC
SVN commit 542756 by osterfeld:

Do not reset status of modified articles to "New". It's just too much noise for a tiny bit of signal.
BUG: 101907


 M  +4 -1      ChangeLog  
 M  +5 -3      src/feed.cpp  


--- branches/KDE/3.5/kdepim/akregator/ChangeLog #542755:542756
@@ -6,14 +6,17 @@
 -----------------------------
 
 New features:
+
  2006/05/01 add author information to article header (in the article pane only) -fo
 
 Bug fixes:
+
+ 2006/05/20 Don't reset article status to New when the article changed (#101907) -fo
  2006/05/10 Always show feed logos; load them on startup, not on first fetch -fo
  2006/05/10 fix crash when using "Load the full website when reading articles" and an error (e.g. 404) 
             is returned (#126812) -fo 
  2006/04/29 Do not crash on startup when Combined View mode is activated (Happened only when experimental tagging is 
-activated) -fo
+            activated) -fo
  2006/03/22 Prevent "Akregator is running" messages on startup (reset PID to -1 when closing akregator) -fo
 
 Changes after 1.2.1:
--- branches/KDE/3.5/kdepim/akregator/src/feed.cpp #542755:542756
@@ -474,12 +474,14 @@
             if (!mya.guidIsHash() && mya.hash() != old.hash() && !old.isDeleted())
             {
                 mya.setKeep(old.keep());
+                int oldstatus = old.status();
                 old.setStatus(Article::Read);
+
                 d->articles.remove(old.guid());
                 appendArticle(mya);
-                // reset status to New
-                if (!mya.isDeleted() && !markImmediatelyAsRead())
-                    mya.setStatus(Article::New);
+
+                mya.setStatus(oldstatus);
+
                 d->updatedArticlesNotify.append(mya);
                 changed = true;
             }
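
The behavioural change in this commit can be approximated with a standalone sketch. It is a simplification under stated assumptions, not the code from feed.cpp: the Article struct, mergeFetched, and the resetToNew flag are hypothetical, and details such as the keep flag, deleted articles, and markImmediatelyAsRead() are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

enum class Status { New, Unread, Read };

// Hypothetical, reduced article record.
struct Article {
    std::string guid;
    std::size_t hash;   // fingerprint of the article content
    Status status;
};

// Merges one fetched article into the archive, keyed by guid.
// resetToNew = true mimics the behaviour before the commit,
// resetToNew = false the behaviour after it.
// Returns true when the archive entry was added or replaced.
bool mergeFetched(std::map<std::string, Article>& archive, Article fetched, bool resetToNew) {
    assert(!fetched.guid.empty());
    const auto it = archive.find(fetched.guid);
    if (it == archive.end()) {
        fetched.status = Status::New;          // genuinely new article
        archive.emplace(fetched.guid, fetched);
        return true;
    }
    if (it->second.hash == fetched.hash)
        return false;                          // content unchanged, nothing to do
    // Content changed: either flag it as New again or carry the old status over.
    fetched.status = resetToNew ? Status::New : it->second.status;
    it->second = fetched;
    return true;
}
```

With resetToNew set to false, an article the user has already read stays Read even though its content (for example a download counter in the description) has changed.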
Comment 48 radfoj 2006-06-11 19:11:24 UTC
It's too technical here for me to understand many of the posts above. But I want to reply to comment #47, which led to changes in the newest version of akregator (3.5.3):

Do not reset status of modified articles to "New".

Does this mean, that for e.g. http://www.kde.org/dot/kde-apps-content.rdf if new version of some application is released, akregator will not let me know about it? If so, I have many other feeds for which is crucial to be somehow informed, if article is informed.
Comment 49 radfoj 2006-06-11 19:17:46 UTC
Sorry, I am a little nervous about the new akregator behaviour, so I made some mistakes. I wanted to say that in many cases it's important for me to be somehow informed after an article is UPDATED. Thanks
Comment 50 Vlastimil Babka (Caster) 2006-06-19 23:53:22 UTC
Exactly, I've just observed it with http://www.kde.org/dot/kde-apps-content.rdf - when you have an older version of an app in your akregator archive, the new version won't show up. It's the feed's fault for publishing a new article with the same <guid>, and a bug should be filed for the feed.

Before KDE 3.5.3, I think there were 3 possibilities
a) feed contains two articles with same <guid> simultaneously (they are both present at one time), this resulted in those articles showing only as one, and updating status to NEW after EACH FETCH (until they were gone from the feed, replaced with newer articles). Now this was very annoying, because of the each fetch. Yeah it would need change of GUI/strings but I've read that KDE 3.5 is no longer strictly freezed because 4.0 is too far away, so it would be possible?
b) feed contains article with same <guid> as some older one, not simultaneously with the new one. If the older one is archived in akregator, the new one overwrites it and sets NEW. This is what happens with http://www.kde.org/dot/kde-apps-content.rdf for example.
c) feed is crazy and changes description/date/whatever each fetch, while keeping the same <guid>, annoying as a). This is case of google rss feeds? (not sure about that)

3.5.3 for sure made case b) not report such updated articles as NEW, while I think this bug and the real problem was about case a). 3.5.3 probably fixed all a) b) c) but at the cost of losing news from b). It would be possible to fix a) only though, probably not possible to fix c) without b).

While the broken feeds are clearly at fault, it's hard to get all of them fixed, especially if you can't know that you missed an article and that the feed is broken. The best solution would be one global option for marking updated articles as NEW, and a per-feed option overriding the global one.
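
A minimal sketch of the proposed global option with a per-feed override, assuming C++17. FeedSettings, shouldMarkUpdatedAsNew, and example are hypothetical names; no such setting exists in the Akregator versions discussed here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical per-feed settings: an empty optional means "follow the global option".
struct FeedSettings {
    std::optional<bool> markUpdatedAsNew;
};

// The per-feed value wins when it is set; otherwise the global default applies.
bool shouldMarkUpdatedAsNew(bool globalDefault, const FeedSettings& feed) {
    return feed.markUpdatedAsNew.value_or(globalDefault);
}

// Usage: the global option is off, one feed opts in to update notifications.
void example() {
    FeedSettings noisyFeed;          // no override, e.g. a feed with download counters
    FeedSettings releaseFeed{true};  // override, e.g. a feed that reuses guids for new releases
    assert(!shouldMarkUpdatedAsNew(false, noisyFeed));
    assert(shouldMarkUpdatedAsNew(false, releaseFeed));
}
```

This keeps case b) from this comment usable for feeds where updates matter, without bringing back the noise of cases a) and c) for everything else.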
Comment 51 Vlastimil Babka (Caster) 2006-06-19 23:55:25 UTC
Sorry, in the previous comment I've put the last sentence in the wrong place. The sentence "Yeah it would need change of GUI/strings but I've read that KDE 3.5 is no longer strictly freezed because 4.0 is too far away, so it would be possible?" belongs to the end :)
Comment 52 Stefan Borggraefe 2006-07-12 10:17:24 UTC
*** Bug 130677 has been marked as a duplicate of this bug. ***
Comment 53 Derek Broughton 2007-09-20 16:02:20 UTC
I don't understand - there's no sign that this has ever been fixed, even though it's marked RESOLVED and FIXED.

Every feed I access has every post marked as NEW the first time it is fetched after kontact startup.  It can't be anything to do with the feed itself changing, as I can fetch all feeds, mark everything as read, exit kontact, and start kontact again and all feeds will be refetched - and will be NEW.