Bug 231062 - Dates in podcasts are not parsed correctly
Summary: Dates in podcasts are not parsed correctly
Status: RESOLVED FIXED
Alias: None
Product: amarok
Classification: Applications
Component: Podcast
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Amarok Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-17 06:46 UTC by Thomas Tanghus
Modified: 2010-03-23 23:34 UTC

See Also:
Latest Commit:
Version Fixed In: 2.3.1


Description Thomas Tanghus 2010-03-17 06:46:20 UTC
Version:           2.3.0 (using KDE 4.4.1)
OS:                Linux
Installed from:    Ubuntu Packages

Yet another problem with podcasts from The Danish Broadcasting Company...

I have one feed:

http://podcast.dr.dk/p4/rssfeed/sproghjoernet.xml

which shows the problem. Their feed is very long, going back to about 2007 or 2008. From November 2nd 2009 onward, the month part of pubDate is no longer capitalized, and Amarok is unable to parse it. I tested it in Akregator and it shows the same result. I guess the two programs use the same parser? (Which one?)
I know the capitalization is specified in RFC 822, but it would be such a small addition to the parser logic to lowercase months and weekdays in the test that I think it would be worth it.
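
For illustration, a case-insensitive month test could be as small as the sketch below. It is plain standard C++, not the actual kdelibs parser, and the helper name `monthFromAbbreviation` is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not Amarok or kdelibs code: returns 1..12 for an
// RFC 822 month abbreviation in any letter case, or 0 if it is unknown.
int monthFromAbbreviation(std::string name)
{
    static const std::array<const char *, 12> months = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };

    // Lowercase the candidate so "Nov", "nov" and "NOV" all compare equal.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    for (std::size_t i = 0; i < months.size(); ++i) {
        if (name == months[i])
            return static_cast<int>(i) + 1;
    }
    return 0;
}
```

With a test like this, `monthFromAbbreviation("nov")` and `monthFromAbbreviation("Nov")` both give 11, and the same approach would apply to weekday names.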

Thanks for a brand new 2.3.0 :-)
Comment 1 Thomas Tanghus 2010-03-17 06:49:32 UTC
Forgot to mention that I ran:
Amarok.Collection.query("SELECT e.title,e.pubdate FROM podcastepisodes e LEFT JOIN podcastchannels c ON e.channel=c.id WHERE c.url='http://podcast.dr.dk/p4/rssfeed/sproghjoernet.xml';");

and the result was:

Sproghjørnet - Uge 08 2010,,Sproghjørnet - Uge 07 2010,,Sproghjørnet - Uge 06 2010,,Sproghjørnet - Uge 09 2010,,Sproghjørnet - Uge 10 2010,,Sproghjørnet - Uge 11 2010,,Sproghjørnet - Uge 05 2010,,Sproghjørnet - Uge 04 2010,,Sproghjørnet - Uge 03 2010,,Sproghjørnet - Uge 02 2010,,Sproghjørnet - Uge 01 2010,,Sproghjørnet - Uge 53 2009,,Sproghjørnet - Uge 52 2009,,Sproghjørnet - Uge 51 2009,,Sproghjørnet - Uge 50 2009,,Sproghjørnet - Uge 49 2009,,Sproghjørnet - Uge 48 2009,,Sproghjørnet - Uge 47 2009,,Sproghjørnet - Uge 46 2009,,Sproghjørnet - Uge 45 2009,,Sproghjørnet - Uge 44 2009,2009-10-29T13:32:35,Sproghjørnet - Uge 43 2009,2009-10-29T13:32:50,Sproghjørnet - Uge 42 2009,2009-10-29T13:32:51,Sproghjørnet - Uge 41 2009,2009-10-29T13:32:52,Sproghjørnet - Uge 40 2009,2009-09-29T15:07:00,Sproghjørnet - Uge 39 2009,2009-09-24T15:55:43,Sproghjørnet - Uge 38 2009,,Sproghjørnet - Uge 37 2009,,Sproghjørnet - Uge 36 2009,2009-08-31T14:58:43,Sproghjørnet - Uge 35 2009,2009-08-26T08:48:43,Sproghjørnet - Uge 34 2009,2009-08-17T14:29:43,Sproghjørnet - Uge 33 2009,2009-08-12T12:43:43,Sproghjørnet - Uge 32 2009,2009-08-03T14:18:43,Sproghjørnet - Uge 31 2009,2009-08-03T14:13:43,Sproghjørnet - Uge 30 2009,2009-07-20T13:18:43,Sproghjørnet - Uge 29 2009,2009-07-13T11:18:43,Sproghjørnet - Uge 28 2009,2009-07-06T15:10:43,Sproghjørnet - Uge 27 2009,2009-06-26T10:55:43,Sproghjørnet - Uge 26 2009,2009-06-26T10:45:43,Sproghjørnet - Uge 25 2009,2009-06-19T08:59:43,Sproghjørnet - Uge 24 2009,2009-06-10T12:39:43,Sproghjørnet - Uge 23 2009,2009-06-03T15:08:43,Sproghjørnet - Uge 22 2009,2009-05-25T15:13:43,Sproghjørnet - Uge 21 2009,2009-05-19T14:40:43,Sproghjørnet - Uge 20 2009,2009-05-11T13:02:43,Sproghjørnet - Uge 19 2009,2009-05-04T15:02:43,Sproghjørnet - Uge 18 2009,2009-04-28T15:42:43,Sproghjørnet - Uge 17 2009,2009-04-22T09:58:43,Sproghjørnet - Uge 15 og 16 2009,2009-04-15T09:35:43,Sproghjørnet - Uge 14 2009,2009-03-30T15:55:43,Sproghjørnet - Uge 13 2009,2009-03-30T16:00:43,Sproghjørnet - 
Uge 12 2009,2009-03-16T11:25:43,Sproghjørnet - Uge 11 2009,2009-03-10T14:31:43,Sproghjørnet - Uge 10 2009,2009-03-03T14:41:43,Sproghjørnet - Uge 09 2009,2009-02-23T14:27:43,Sproghjørnet - Uge 08 2009,2009-02-17T13:47:43,Sproghjørnet - Uge 07 2009,2009-02-09T13:51:43,Sproghjørnet - Uge 06 2009,2009-02-02T13:55:43,Sproghjørnet - Uge 05 2009,2009-01-26T10:16:43,Sproghjørnet - Uge 04 2009,2009-01-19T15:16:43,Sproghjørnet - Uge 03 2009,2009-01-12T16:02:43,Sproghjørnet - Uge 02 2009,2009-01-05T16:02:43,Sproghjørnet - Uge 01 2009,2009-01-02T10:32:43,Sproghjørnet - Uge 52 2008,2008-12-23T11:49:43,Sproghjørnet - Uge 51 2008,2008-12-15T14:39:43,Sproghjørnet - Uge 50 2008,2008-12-08T11:37:43,Sproghjørnet - Uge 49 2008,2008-12-01T14:32:43,Sproghjørnet - Uge 48 2008,2008-11-24T13:00:43,Sproghjørnet - Uge 47 2008,2008-11-17T19:25:43,Sproghjørnet - Uge 46 2008,2008-11-11T15:10:43,Sproghjørnet - Uge 45 2008,2008-10-30T10:52:43,Sproghjørnet - Uge 44 2008,2008-10-23T14:27:14,Sproghjørnet - Uge 43 2008,2008-10-16T14:27:24,Sproghjørnet - Uge 42 2008,2008-10-09T14:27:26,Sproghjørnet - Uge 41 2008,2008-10-02T14:27:27,Sproghjørnet - Uge 40 2008,2008-09-25T09:06:43,Sproghjørnet - Uge 39 2008,2008-09-22T11:43:43,Sproghjørnet - Uge 38 2008,2008-09-16T08:50:43,Sproghjørnet - Uge 37 2008,2008-09-08T12:00:43,Sproghjørnet - Uge 36 2008,2008-09-01T11:37:43,Sproghjørnet - Uge 35 2008,2008-08-26T15:42:43,Sproghjørnet - Uge 34 2008,2008-08-26T15:37:43,Sproghjørnet - Uge 33 2008,2008-08-18T16:38:43,Sproghjørnet - Uge 32 2008,2008-08-18T14:33:43,Sproghjørnet - Uge 31 2008,2008-07-28T10:50:43,Sproghjørnet - Uge 30 2008,2008-07-21T15:19:43,Sproghjørnet - Uge 29 2008,2008-07-18T10:54:43,Sproghjørnet - Uge 28 2008,2008-07-17T15:18:43,Sproghjørnet - Uge 27 2008,2008-07-17T15:15:43,Sproghjørnet - Uge 26 2008,2008-06-23T16:35:43,Sproghjørnet - Uge 25 2008,2008-06-17T13:41:43,Sproghjørnet - Uge 24 2008,2008-06-09T07:41:43,Sproghjørnet - Uge 23 2008,2008-06-05T12:41:43,Sproghjørnet - Uge 22 
2008,2008-05-30T06:04:21,Sproghjørnet - Uge 21 2008,2008-05-19T11:36:21,Sproghjørnet - Uge 20 2008,2008-05-05T15:06:21,Sproghjørnet - Uge 19 2008,2008-05-05T11:30:21,Sproghjørnett - Uge 18 2008,2008-04-28T15:05:21,Sproghjørnet - Uge 17 2008,2008-04-21T10:13:21,Sproghjørnet - Uge 16 2008,2008-04-07T09:45:21,Sproghjørnet - Uge 15 2008,2008-04-07T10:03:21,Sproghjørnet - Uge 14 2008,2008-03-31T11:20:21,Sproghjørnet - Uge 13 2008,2008-03-17T11:00:21,Sproghjørnet - Uge 12 2008,2008-03-17T10:56:21,Sproghjørnet - Uge 11 2008,2008-03-11T11:46:21,Sproghjørnet - Uge 10 2008,2008-03-01T16:30:21,Sproghjørnet - Uge 09 2008,2008-02-25T09:16:21,Sproghjørnet - Uge 08 2008,2008-02-18T09:56:21,Sproghjørnet - Uge 07 2008,2008-02-13T10:36:21,Sproghjørnet - Uge 06 2008,2008-02-05T14:55:21,Sproghjørnet - Uge 05 2008,2008-01-30T09:40:21,Sproghjørnet - Uge 04 2008,2008-01-22T14:24:21,Sproghjørnet - Uge 03 2008,2008-01-14T10:45:21,Sproghjørnet - Uge 02 2008,2008-01-07T10:45:21,Sproghjørnet - Uge 01 2008,2008-01-04T10:45:21,Sproghjørnet - Uge 52 2007,2008-01-04T10:37:21,Sproghjørnet - Uge 51 2007,2007-12-17T13:39:21,Sproghjørnet - Uge 50 2007,2007-12-10T15:30:21,Sproghjørnet - Uge 49 2007,2007-12-04T12:06:21,Sproghjørnet - Uge 48 2007,2007-11-27T09:54:21,Sproghjørnet - Uge 47 2007,2007-11-19T11:35:21,Sproghjørnet - Uge 46 2007,2007-11-12T11:38:21,Sproghjørnet - Uge 45 2007,2007-11-12T11:34:21,Sproghjørnet - Uge 44 2007,2007-10-29T11:53:21,Sproghjørnet - Uge 43 2007,2007-10-22T12:18:21,Sproghjørnet - Uge 42 2007,2007-10-16T08:14:21,Sproghjørnet - Uge 41,2007-10-08T08:14:26,Sproghjørnet - Uge 40,2007-10-01T08:14:28,Sproghjørnet - Uge 39,2007-09-24T08:14:29,Sproghjørnet - Uge 38,2007-09-18T08:14:31,Sproghjørnet - Uge 37,2007-09-10T08:14:32,Sproghjørnet - Uge 36,2007-09-01T08:14:33,Sproghjørnet - Uge 35,2007-08-27T08:14:34,Sproghjørnet - Uge 34,2007-08-20T08:18:05,Sproghjørnet - Uge 33,2007-08-14T09:33:19,Sproghjørnet - mon 09-07-07,2007-07-09T11:35:19,Sproghjørnet - wed 
02-07-07,2007-07-02T10:42:19,Sproghjørnet - wed 25-06-07,2007-06-25T12:32:19,Sproghjørnet - wed 19-06-07,2007-06-19T12:22:19,Sproghjørnet - wed 12-06-07,2007-06-12T09:12:19,Sproghjørnet - wed 06-06-07,2009-06-05T11:09:53,Sproghjørnet - wed 29-05-07,2007-05-29T21:48:19,Sproghjørnet - wed 21-05-07,2007-05-21T11:11:19,Sproghjørnet - wed 16-05-07,2007-05-16T08:38:19,Sproghjørnet - wed 07-05-07,2007-05-07T11:54:19,Sproghjørnet - wed 01-05-07,2007-05-01T11:40:19,Sproghjørnet - wed 24-04-07,2007-04-24T09:00:19,Sproghjørnet - wed 16-04-07,2007-04-16T10:43:19,Sproghjørnet - wed 10-04-07,2007-04-10T11:21:19,Sproghjørnet - wed 02-04-07,2007-04-02T13:16:19,Sproghjørnet - wed 28-03-07,2007-03-28T12:06:19,Sproghjørnet - mon 19-03-07,2007-03-19T12:11:19,Sproghjørnet - mon 12-03-07,2007-03-12T11:09:19,Sproghjørnet - mon 05-03-07,2007-03-05T11:46:19,Sproghjørnet - mon 26-02-07,2007-02-26T11:29:19,Sproghjørnet - mon 19-02-07,2007-02-19T13:19:19,Sproghjørnet - mon 12-02-07,2007-02-12T11:09:19,Sproghjørnet - mon 12-02-07,2007-02-12T10:55:19,Sproghjørnet - mon 05-02-07,2007-02-05T14:19:19,Sproghjørnet - mon 29-01-07,2007-01-29T13:16:19,Sproghjørnet - tue 23-01-07,2007-01-23T16:26:19,Sproghjørnet - mon 15-01-07,2007-01-05T15:29:19,Sproghjørnet - wed 09-01-07,2007-01-09T20:26:19,Sproghjørnet - wed 03-01-07,2007-01-03T13:36:19,Sproghjørnet - tue 27-12-06,2006-12-27T10:46:19,Sproghjørnet - tue 20-12-06,2006-12-20T14:27:19,Sproghjørnet - tue 12-12-06,2006-12-12T21:17:19,Sproghjørnet - mon 04-12-06,2006-12-04T11:35:19,Sproghjørnet - mon 27-11-06,2006-11-27T13:30:19,Sproghjørnet - mon 20-11-06,2006-11-20T15:23:19,Sproghjørnet - fri 10-11-06,2006-11-10T13:16:19,Sproghjørnet - wed 06-11-06,2006-11-06T21:30:19,Sproghjørnet - wed 01-11-06,2006-11-01T12:34:19,Sproghjørnet - mon 23-10-06,2009-06-05T11:11:34,Sproghjørnet - tue 17-10-06,2009-06-05T11:11:38,Sproghjørnet - mon 09-10-06,2009-06-05T11:11:41,Sproghjørnet - mon 02-10-06,2006-09-18T13:16:19,Sproghjørnet - mon 
25-09-06,2006-09-18T15:09:19,Sproghjørnet - mon 18-09-06,2006-09-18T14:07:19,Sproghjørnet - fri 15-09-06,2006-09-15T20:49:19,Sproghjørnet - Sat 09-09-06,2006-09-09T22:40:19,Sproghjørnet - Mon 28-08-06,2006-08-29T15:20:19,Sproghjørnet - Mon 21-08-06,2009-06-05T11:11:59
Comment 2 Thomas Tanghus 2010-03-17 07:18:09 UTC
I also forgot to mention that I of course also filed a bug report at The Danish Broadcasting Company :-)
Comment 3 Bart Cerneels 2010-03-23 19:45:34 UTC
You were right, we do use the same parsing as Akregator: the RFC 2822 parsing from kdelibs. So this feed is going to break for every KDE app.
It could be reported to kdelibs, I guess.

This is a case of a broken feed that might be "easy" to handle. The only question is whether we should. We can handle some common mistakes in bad RSS feeds, but publishers should respect the standard practice. As you can tell from [1], this one clearly has many other issues.

Thanks for reporting this to the publisher.

[1] http://www.nobodylikesonions.com/feedcheck/?feed=http://podcast.dr.dk/p4/rssfeed/sproghjoernet.xml
Comment 4 Thomas Tanghus 2010-03-23 20:27:32 UTC
I absolutely agree with your point that it is the publisher's responsibility to deliver a valid feed, and I have already reported quite a few errors to them.
But since KHTML has also become more tolerant of invalid markup, maybe it would be a good idea for the developers responsible for the parser to discuss whether it should check for minor errors and correct them.
Maybe this should be marked as a WISH?

BTW: Didn't know that feed checker. I used:

http://www.feedvalidator.org/check.cgi?url=http://podcast.dr.dk/p4/rssfeed/sproghjoernet.xml
Comment 5 Bart Cerneels 2010-03-23 23:12:30 UTC
commit 11d9d1e648a1c345c160ca552773ceaad1f4160e
Author: Bart Cerneels <bart.cerneels@kde.org>
Date:   Tue Mar 23 22:30:26 2010 +0100

    Work around KDateTime::fromString being to strict.
    
    This could be concidered a kdelibs bug because akregator has the same problem.
    
    My motivation for fixing this corner case in amarok is because it probably works in iTunes which is what most podcasts are tested with.
    BUG:231062

diff --git a/ChangeLog b/ChangeLog
index 750e573..feca4d1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -44,6 +44,8 @@ VERSION 2.3.1-Beta 1
        some MySQL versions. (BR 225052)
 
   BUGFIXES:
+     * Fixed a broken podcast feed that had a minor compliance issue in date
+       format. (BR 231062)
      * Fixed "files" bookmarks not storing the actual path shown in the file 
        browser (BR 231437)
      * Fixed incorrectly displayed cover images for albums with the same name,
diff --git a/src/podcasts/PodcastReader.cpp b/src/podcasts/PodcastReader.cpp
index 0d72824..f609222 100644
--- a/src/podcasts/PodcastReader.cpp
+++ b/src/podcasts/PodcastReader.cpp
@@ -1607,10 +1607,21 @@ PodcastReader::parsePubDate( const QString &dateString )
     debug() << "Parsing pubdate: " << parseInput;
 
     QRegExp rfcDateDayRegex( "^[A-Z]{1}[a-z]{2}\\s*,\\s*(.*)" );
-    if( rfcDateDayRegex.indexIn(parseInput) != -1 )
+    if( rfcDateDayRegex.indexIn( parseInput ) != -1 )
     {
         parseInput = rfcDateDayRegex.cap(1);
     }
+    //Hack around a to strict RFCDate implementation in KDateTime.
+    //See http://bugs.kde.org/show_bug.cgi?id=231062
+    QRegExp rfcMonthLowercase( "^\\d+\\s+\\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\b" );
+    if( rfcMonthLowercase.indexIn( parseInput ) != -1 )
+    {
+        QString lowerMonth = rfcMonthLowercase.cap( 1 );
+        QString upperMonth = lowerMonth;
+        upperMonth.replace( 0, 1, lowerMonth.at( 0 ).toUpper() );
+        parseInput.replace( lowerMonth, upperMonth );
+    }
+
     QDateTime pubDate = KDateTime::fromString( parseInput, KDateTime::RFCDate ).dateTime();
 
     debug() << "result: " << pubDate.toString();
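
The same normalization can be tried outside Amarok with a rough stand-alone sketch. It assumes standard C++ `std::regex` in place of `QRegExp` and `QString`, and the function name `normalizePubDate` is hypothetical. Unlike the patch, which replaces the month substring wherever it occurs, the sketch capitalizes only the character at the matched position.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-alone version of the workaround; the real patch lives
// in PodcastReader::parsePubDate and uses QRegExp and QString.
std::string normalizePubDate(std::string input)
{
    // Drop a leading day name such as "Mon, ". Like the patch, this only
    // recognizes a capitalized three-letter day name.
    static const std::regex dayPrefix("^[A-Z][a-z]{2}\\s*,\\s*(.*)");
    std::smatch match;
    if (std::regex_search(input, match, dayPrefix))
        input = match[1].str();

    // Capitalize a lowercase month that follows the day-of-month number.
    static const std::regex lowerMonth(
        "^\\d+\\s+(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\b");
    if (std::regex_search(input, match, lowerMonth)) {
        const auto pos = static_cast<std::size_t>(match.position(1));
        input[pos] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(input[pos])));
    }
    return input;
}
```

In this sketch, `normalizePubDate("Mon, 02 nov 2009 13:32:35 +0100")` yields `"02 Nov 2009 13:32:35 +0100"`. A string whose day name is also lowercase, such as `"mon, 02 nov 2009 13:32:35 +0100"`, passes through unchanged, because both expressions depend on the capitalized day name being stripped first.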
Comment 6 Thomas Tanghus 2010-03-23 23:19:36 UTC
Cool. Thanks. That was fast :-)