Bug 219516 - Podcast episodes appear multiple times
Summary: Podcast episodes appear multiple times
Status: RESOLVED FIXED
Alias: None
Product: amarok
Classification: Applications
Component: Podcast (show other bugs)
Version: 2.3-GIT
Platform: Debian testing Linux
: NOR normal
Target Milestone: ---
Assignee: Amarok Developers
URL:
Keywords:
: 223842 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-12-21 09:32 UTC by Marcel Dischinger
Modified: 2010-03-02 18:55 UTC (History)
7 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marcel Dischinger 2009-12-21 09:32:24 UTC
Version:           2.2.2 beta1 (using KDE 4.3.4)
OS:                Linux
Installed from:    Debian testing/unstable Packages

For some podcasts, episodes appear multiple times, i.e., each episode is added again on each update. Also, the "Limit number of episodes" option is not honored.

Example podcasts that experience this behavior:

http://www1.swr.de/podcast/xml/swr1/bw/leute.xml
http://feeds.schlaflosinmuenchen.com/weeklysim.xml
http://podcast.wdr.de/radio/wdr2kabarett.xml

I already tried deleting the podcast and re-adding it.
I experience this problem no matter whether I activated the "Limit number of episodes" option.
If I ask Amarok to download new episodes, it will start downloading the duplicated episodes.
Comment 1 rubberglove 2009-12-23 02:22:07 UTC
I see the same behaviour with a few of my subscribed podcasts in 2.2.1.90 (using kde 4.3.4 also).
Comment 2 Myriam Schweingruber 2009-12-23 10:22:29 UTC
Setting to confirmed then.
Comment 3 Bart Cerneels 2009-12-23 10:54:56 UTC
   Try to work around a database leak in podcasts.

   Because of a stupid mistake on my part there is bogus data in the podcastepisodes table. By ordering the results by desending id we should get the correct results first in the result.

   This will solve the "limit episodes not honored" part but I think there will still be duplicate episode. The only fix for that issue is a database cleanup.
   CCBUG:219516

diff --git a/src/podcasts/sql/SqlPodcastProvider.cpp b/src/podcasts/sql/SqlPodcastProvider.cpp
index 9456b93..147fcfb 100644
--- a/src/podcasts/sql/SqlPodcastProvider.cpp
+++ b/src/podcasts/sql/SqlPodcastProvider.cpp
@@ -193,7 +193,7 @@ SqlPodcastProvider::trackForUrl( const KUrl & url )
    QString command = "SELECT id, url, channel, localurl, guid, "
            "title, subtitle, sequencenumber, description, mimetype, pubdate, "
            "duration, filesize, isnew FROM podcastepisodes "
-            "WHERE guid='%1' OR url='%1' OR localurl='%1';";
+            "WHERE guid='%1' OR url='%1' OR localurl='%1' ORDER BY id DESC;";
    command = command.arg( sqlStorage->escape( url.url() ) );
    QStringList dbResult = sqlStorage->query( command );
Comment 4 vmos 2009-12-23 12:03:40 UTC
I'm on kubuntu 9.10 and I'm getting this as well in 2.2.2 for these podcasts
http://rockandrollscience.podomatic.com/rss2.xml
http://allingoodtme.podomatic.com/rss2.xml

I've tried removing the subscriptions and re-adding but it just happens again. I'm going to put my hand up and admit that my response to the database cleanup is "wha? eh?"
Comment 5 hefee 2010-01-11 21:47:52 UTC
same problem here on a debian sid (amarok 2.2.2-1). This problem only exist on this version. Till yesterday with the old debian version, i had no problem with this redownloading with episods.

I have this problem with the podcasts:
http://feeds.feedburner.com/z-pod (episode 4 and episode 35)
http://feeds.schlaflosinmuenchen.com/weeklysim.xml (some for example: SiM #518)
Comment 6 Myriam Schweingruber 2010-01-12 00:52:35 UTC
(In reply to comment #4)
> I'm on kubuntu 9.10 and I'm getting this as well in 2.2.2 for these podcasts
> http://rockandrollscience.podomatic.com/rss2.xml
> http://allingoodtme.podomatic.com/rss2.xml

I guess you meant 2.2.1.90, Amarok 2.2.2 has only been released today.
Comment 7 hefee 2010-01-12 09:26:16 UTC
I think so to that it has to be the 2.1.90 Version, but on debian this Version is labeld as 2.2.2-1 (see ChangeLog http://packages.debian.org/changelogs/pool/main/a/amarok/amarok_2.2.2-1/changelog). The about window of amarok displays: 2.2.2. The --version string (amarok --version) say:
Qt: 4.5.3
KDE: 4.3.4 (KDE 4.3.4)
Amarok: 2.2.2
But I have installed the unstable packes of kde which are KDE 4.3.3...
Comment 8 Myriam Schweingruber 2010-01-12 13:48:06 UTC
Oh, if it is labeled 2.2.2-1, then it is 2.2.2, they have packaged it a bit early then.
Comment 9 Markus Peloquin 2010-01-20 22:33:53 UTC
I can confirm this on Gentoo with amarok-2.2.2.  I have noticed this many times in the past but just ignored it until now, when I see that BlizzCast has probably a few thousand entries that take 30 seconds just to select.
http://feeds.feedburner.com/blizzardent

I believe when I debugged it last (2-3 years back), I found that the feed had bad RSS information.  Specifically I would guess it had something to do with the feed entry identifiers.  Even if that is the case, there is a bug in Amarok.
Comment 10 Myriam Schweingruber 2010-01-22 20:31:47 UTC
*** Bug 223842 has been marked as a duplicate of this bug. ***
Comment 11 Bruno Bigras 2010-01-27 00:32:23 UTC
I have this problem with Amarok 2.2-GIT. I don't think it was always like this but right now, every time I hit update a channel, I got podcasts in double.

I think I have seen this problem in the past.
Comment 12 Matías Costa 2010-02-05 22:52:48 UTC
Workaround, Beware of dragons:

alter table podcastepisodes add CONSTRAINT bug_workaround UNIQUE (url(200));

If you want to store the already downloaded and listened state:

create temporary table p(id int);
insert into p(id) SELECT min(id) FROM podcastepisodes group by url;
delete from podcastepisodes where id not in (select id from p);
Comment 13 Bart Cerneels 2010-02-08 14:28:37 UTC
Bug has been fixed in v2.2.2-732-g9069094

The duplication should have stopped and future episode should not duplicate unless there is an error in the feed.

To get rid of the duplicate episodes the only way is removing the channel and adding it again. You will lose status info. If that bothers you keep the duplicates or create a cleanup script.

Bart
Comment 14 Bart Cerneels 2010-02-10 01:57:20 UTC
commit 2d5a4c96426526bbbfbca0b84ec7c478208b44bf
Author: Bart Cerneels <bart.cerneels@kde.org>
Date:   Mon Feb 8 14:07:02 2010 +0100

    Fall back to url instead of guid for some cases.
    
    This podcast http://rockandrollscience.podomatic.com/rss2.xml uses a guid with a capital letter in it, even though it's a permalink.
    
    CCBUG:219516

diff --git a/src/podcasts/PodcastReader.cpp b/src/podcasts/PodcastReader.cpp
index 1308b70..91c3b67 100644
--- a/src/podcasts/PodcastReader.cpp
+++ b/src/podcasts/PodcastReader.cpp
@@ -1149,7 +1149,11 @@ PodcastReader::endItem()
             m_item->setDescription( description );
         }
 
-        const KUrl trackId( m_item->guid().isEmpty() ? m_item->uidUrl() : m_item->guid() );
+        bool useGuid = !m_item->guid().isEmpty();
+        if( useGuid )
+            useGuid = !m_item->guid().contains( "[A-Z]" ); //KUrl only uses lowercase
+
+        const KUrl trackId( useGuid ? m_item->uidUrl() : m_item->guid() );
         Meta::PodcastEpisodePtr episode = Meta::PodcastEpisodePtr::dynamicCast(
                                               m_podcastProvider->trackForUrl( trackId )
                                           );
Comment 15 Bart Cerneels 2010-02-10 01:57:26 UTC
commit 506acd579ec8a3d8c1bb0d1d5a9391e11489ee82
Author: Bart Cerneels <bart.cerneels@kde.org>
Date:   Mon Feb 8 13:17:17 2010 +0100

    Correctly handle non-url guid's.
    
    CCBUG:219516

diff --git a/src/podcasts/sql/SqlPodcastProvider.cpp b/src/podcasts/sql/SqlPodcastProvider.cpp
index 66042b3..33bce36 100644
--- a/src/podcasts/sql/SqlPodcastProvider.cpp
+++ b/src/podcasts/sql/SqlPodcastProvider.cpp
@@ -165,8 +165,15 @@ SqlPodcastProvider::loadPodcasts()
     emit( updated() );
 }
 
+QString
+SqlPodcastProvider::cleanUrlOrGuid( const KUrl &url )
+{
+    QString decodedUrl = QUrl::fromPercentEncoding( url.url().toUtf8() );
+    return decodedUrl;
+}
+
 bool
-SqlPodcastProvider::possiblyContainsTrack( const KUrl & url ) const
+SqlPodcastProvider::possiblyContainsTrack( const KUrl &url ) const
 {
     SqlStorage *sqlStorage = CollectionManager::instance()->sqlStorage();
     if( !sqlStorage )
@@ -174,14 +181,14 @@ SqlPodcastProvider::possiblyContainsTrack( const KUrl & url ) const
 
     QString command = "SELECT title FROM podcastepisodes WHERE guid='%1' OR url='%1' "
                       "OR localurl='%1';";
-    command = command.arg( sqlStorage->escape( url.url() ) );
+    command = command.arg( sqlStorage->escape( cleanUrlOrGuid( url ) ) );
 
     QStringList dbResult = sqlStorage->query( command );
     return !dbResult.isEmpty();
 }
 
 Meta::TrackPtr
-SqlPodcastProvider::trackForUrl( const KUrl & url )
+SqlPodcastProvider::trackForUrl( const KUrl &url )
 {
     DEBUG_BLOCK
 
@@ -189,11 +196,12 @@ SqlPodcastProvider::trackForUrl( const KUrl & url )
     if( !sqlStorage )
         return TrackPtr();
 
+
     QString command = "SELECT id, url, channel, localurl, guid, "
             "title, subtitle, sequencenumber, description, mimetype, pubdate, "
             "duration, filesize, isnew FROM podcastepisodes "
             "WHERE guid='%1' OR url='%1' OR localurl='%1' ORDER BY id DESC;";
-    command = command.arg( sqlStorage->escape( url.url() ) );
+    command = command.arg( sqlStorage->escape( cleanUrlOrGuid( url ) ) );
     QStringList dbResult = sqlStorage->query( command );
 
     if( dbResult.isEmpty() )
diff --git a/src/podcasts/sql/SqlPodcastProvider.h b/src/podcasts/sql/SqlPodcastProvider.h
index 821a346..138df15 100644
--- a/src/podcasts/sql/SqlPodcastProvider.h
+++ b/src/podcasts/sql/SqlPodcastProvider.h
@@ -43,7 +43,7 @@ class SqlPodcastProvider : public PodcastProvider
         bool possiblyContainsTrack( const KUrl &url ) const;
         Meta::TrackPtr trackForUrl( const KUrl &url );
 
-        QString prettyName() const { return i18n("Local Podcasts"); };
+        QString prettyName() const { return i18n("Local Podcasts"); }
         KIcon icon() const { return KIcon( "server-database" ); }
 
         Meta::PlaylistList playlists();
@@ -109,6 +109,11 @@ class SqlPodcastProvider : public PodcastProvider
         /** creates all the necessary tables, indexes etc. for the database */
         void createTables() const;
         void loadPodcasts();
+
+        /** return the url as a string. Removes percent encoding if it actually has a non-url guid.
+        */
+        static QString cleanUrlOrGuid( const KUrl &url );
+
         void updateDatabase( int fromVersion, int toVersion );
         void fetchImage( Meta::SqlPodcastChannelPtr channel );
Comment 16 Myriam Schweingruber 2010-02-19 11:11:18 UTC
Marcel, can you reproduce this reliably with the latest git build?
Comment 17 Myriam Schweingruber 2010-02-19 11:21:31 UTC
Just tested here with latest git build: I can't reproduce this. Marcel, please try cleaning up your database.
Comment 18 Marcel Dischinger 2010-02-19 13:55:07 UTC
I just built amarok again from git (739f67ff53572acc90e50c537aea3007e64bbf4f). I then removed the mysqle folder in ~/.kde/share/apps/amarok to start fresh.
I can reproduce this problem for the following two podcasts:
http://podcast.wdr.de/radio/wdr2kabarett.xml
http://podcast.wdr.de/radio/montalk.xml

Note that they come from the same provider.
I have no problem with any of my other podcasts.

The problem re-appeared (it was fixed for beta1) after applying Bart's patch from https://bugs.kde.org/show_bug.cgi?id=227515 to beta1.

Thanks.
Comment 19 Bruno Bigras 2010-02-19 18:35:31 UTC
I updated amarok last night or the night before, and the problem re-appeared for this podcast : http://feeds.feedburner.com/podcastmp3vdl?format=xml

but this time, even if I click multiple time on the update button, I only have the episode in double (before, each time I clicked, I had a new copy of every episode).

my other podcast seems fine.
Comment 20 Myriam Schweingruber 2010-02-19 20:38:15 UTC
Bruno, could you also please clean up your database?
Comment 21 Bruno Bigras 2010-02-20 04:48:31 UTC
(In reply to comment #20)
> Bruno, could you also please clean up your database?

Sure, I just updated Amarok and cleaned the database. No problem yet :)
Comment 22 Bruno Bigras 2010-02-20 21:04:51 UTC
I still have the problem that I described with 1 feed out of 6 -> http://feeds.feedburner.com/podcastmp3vdl?format=xml

It's the first one in my list.
Comment 23 Myriam Schweingruber 2010-02-27 11:08:46 UTC
Bruno, I can't reproduce this here with that stream at all, it also respects the limited amount of episodes. I think it is fixed. Current git build of today.
Comment 24 Marcel Dischinger 2010-03-02 11:36:27 UTC
For me, it still is not fixed. I just compiled the latest git version. I removed all amarok config files in ~/.kde and started fresh.

I only added the following podcast:
http://podcast.wdr.de/radio/wdr2kabarett.xml

Clicking "Update All" will create multiple entries every time.
As I said before, I see this happening only for podcasts from one provider, so the problem might be in the RSS feed parsing.
Comment 25 Myriam Schweingruber 2010-03-02 12:01:07 UTC
(In reply to comment #24)
> For me, it still is not fixed. I just compiled the latest git version. I
> removed all amarok config files in ~/.kde and started fresh.
> 
> I only added the following podcast:
> http://podcast.wdr.de/radio/wdr2kabarett.xml
> 
> Clicking "Update All" will create multiple entries every time.
> As I said before, I see this happening only for podcasts from one provider, so
> the problem might be in the RSS feed parsing.

I can reproduce this here, with that subscription. And it is really annoying, since if one limits the amount of visible episodes, one can end up with only one episode visible repeatedly.
I haven't seen this with any other podcast in my subscriptions.
Comment 26 Bart Cerneels 2010-03-02 18:55:50 UTC
commit e9c6052032b2016f17095aa1091018db80e1319d
Author: Bart Cerneels <bart.cerneels@kde.org>
Date:   Tue Mar 2 18:36:10 2010 +0100

    Add one more quirk for not using GUID.
    
    In the next cyclr I'll need to replace trackForUrl( KUrl ) with a PodcastProvider::episodeForGuid( QString ) which is more robust.
    BUG:219516

diff --git a/src/podcasts/PodcastReader.cpp b/src/podcasts/PodcastReader.cpp
index 35ea736..afb3ff2 100644
--- a/src/podcasts/PodcastReader.cpp
+++ b/src/podcasts/PodcastReader.cpp
@@ -1149,9 +1149,10 @@ PodcastReader::endItem()
             m_item->setDescription( description );
         }
 
-        bool useGuid = !m_item->guid().isEmpty();
-        if( useGuid )
-            useGuid = !m_item->guid().contains( "[A-Z]" ); //KUrl only uses lowercase
+        QString guid = m_item->guid();
+        bool useGuid = !guid.isEmpty() &&
+                       !guid.contains( "[A-Z]" ) && //KUrl only uses lowercase
+                       !guid.startsWith( '/' ); //appends file://
 
         const KUrl trackId( useGuid ? m_item->guid() : m_item->uidUrl() );
         Meta::PodcastEpisodePtr episode = Meta::PodcastEpisodePtr::dynamicCast(