Version: 1.0-beta5 "Pierre" (using KDE 3.2.3, Gentoo)
Compiler: gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)
OS: Linux (i686) release 2.6.6-win4lin-r3

I would call this a future development idea. FYI: "Web scraping is the practice of getting information from a web page and reformatting it." The idea is to have scripts, hopefully community-created, that convert a non-RSS site into an RSS-formatted file. I could easily see the scripts becoming standardized and shared freely. One naming method would be {site}-{date} (e.g., www-cnn-com-20040721.py). I like Python. :)

The method would be simple: akregator would have a script associated with a feed. The script outputs a valid XML file, so instead of fetching it from the Internet, akregator gets it from the script. If the output is invalid, akregator treats it just like any other invalid source, so there are no new security issues involved in using the scripts. Everything else about akregator remains the same. Local URLs are supported, but unless you set up cron jobs there is no automation or central repository.
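A minimal sketch of what such a script could look like, assuming a hypothetical site whose headlines are links with class="headline" (the site URL and the class name are placeholders, not from any real repository). The reader would execute the script and parse its stdout exactly as if it had downloaded a feed:

```python
#!/usr/bin/env python3
"""Sketch: scrape a hypothetical news page and print RSS 2.0 to stdout."""

import urllib.request
from html.parser import HTMLParser
from xml.sax.saxutils import escape

SITE = "https://www.example.com/"  # hypothetical target site


class HeadlineParser(HTMLParser):
    """Collect (href, title) pairs from <a class="headline" ...> links."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "headline" in (attrs.get("class") or ""):
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.items.append((self._href, title))
            self._href = None


html = urllib.request.urlopen(SITE).read().decode("utf-8", "replace")
parser = HeadlineParser()
parser.feed(html)

# Emit a minimal RSS 2.0 document; the reader treats this output
# exactly like a feed it fetched itself.
print('<?xml version="1.0" encoding="UTF-8"?>')
print('<rss version="2.0"><channel>')
print(f'<title>Scraped: {escape(SITE)}</title>')
print(f'<link>{escape(SITE)}</link>')
print('<description>Generated by a scraper script</description>')
for href, title in parser.items:
    print(f'<item><title>{escape(title)}</title><link>{escape(href)}</link></item>')
print('</channel></rss>')
```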
Hi, I just stumbled upon this bug report. If anyone is interested, I have created such a script repository; it is currently used by Liferea (a GTK/GNOME reader) and my own one (console). The webpage is at http://kiza.kcore.de/software/snownews/snowscripts/ and is more or less exactly what Charles suggested, I think. :)
Yes, it is, and this is what I expected... the scripts already exist. Hopefully the idea will catch on, and maybe KDE could standardize on a particular scripting language. The end result would be a repository that *any* RSS reader could use.
They can already be used by any RSS reader that can either 1) execute an external script and load its output as a feed source, or 2) pipe a downloaded resource through a script that converts it to an RSS feed on the fly. #2 has the advantage that you don't need to download the source with an external application and can take advantage of the reader's built-in downloader, which should support things like conditional GET, compression, etc. (see the filter sketch below).

I don't think a particular language needs to be standardized, as long as it's commonly used/installed, like Perl, Python, or bash+GNU textutils. Anyway, you're free to use and link to this page and help create more scripts. Of course, complaining to the sites the scripts work for, so that they provide a decent RSS feed in the first place, is the ultimate goal. ;)
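A sketch of variant #2, a filter that reads HTML the reader has already downloaded on stdin and writes RSS on stdout, so the reader's own HTTP code stays in charge of conditional GET, compression, and so on. The headline pattern is a hypothetical placeholder to be adapted per site:

```python
#!/usr/bin/env python3
"""Sketch: HTML-to-RSS filter, stdin to stdout."""

import re
import sys
from xml.sax.saxutils import escape

html = sys.stdin.read()
# Hypothetical per-site pattern; adjust the selector logic for each page.
items = re.findall(r'<a class="headline" href="([^"]+)">([^<]+)</a>', html)

sys.stdout.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                 '<rss version="2.0"><channel>\n'
                 '<title>Filtered feed</title>\n'
                 '<link>https://www.example.com/</link>\n'
                 '<description>HTML-to-RSS filter</description>\n')
for href, title in items:
    sys.stdout.write('<item><title>%s</title><link>%s</link></item>\n'
                     % (escape(title.strip()), escape(href)))
sys.stdout.write('</channel></rss>\n')
```

For testing you can drive it by hand, e.g. `curl -s https://www.example.com/ | ./filter.py`; in a reader that supports conversion filters, the same script would be configured as the filter command for that subscription.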
*** Bug 177268 has been marked as a duplicate of this bug. ***
*** Bug 187738 has been marked as a duplicate of this bug. ***
Thank you for the bug report. As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists. If this bug is no longer present or relevant, please change the status to resolved.
Yes, the desire for this feature still exists. (I'm another ex-Liferea user.)
Thank you, but in the meantime I managed to write my own feed aggregator with full scraping support :-D If anyone is interested, here it is: https://github.com/acavalin/rrss
*** Bug 451188 has been marked as a duplicate of this bug. ***
This is a must-have feature. There are several web-scraping scripts that let users select a set of rules (CSS selectors or XPath) while the script does the rest of the work. I'm using this extensively with Liferea. No regex is needed.
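A sketch of that rules-based approach, where the per-site knowledge is reduced to a handful of CSS selectors instead of custom parsing code. It assumes BeautifulSoup is installed (pip install beautifulsoup4); the URL and selectors are hypothetical placeholders to adapt per site:

```python
#!/usr/bin/env python3
"""Sketch: CSS-selector-driven scrape to RSS 2.0 on stdout."""

from urllib.request import urlopen
from xml.sax.saxutils import escape
from bs4 import BeautifulSoup

# Per-site rules: only these selectors change from site to site.
RULES = {
    "url":   "https://www.example.com/news",  # hypothetical page
    "item":  "div.story",                     # selector for one entry
    "title": "h2",                            # title within an entry
    "link":  "a",                             # element whose href is the link
}

soup = BeautifulSoup(urlopen(RULES["url"]).read(), "html.parser")

print('<?xml version="1.0" encoding="UTF-8"?>')
print('<rss version="2.0"><channel>')
print('<title>%s</title><link>%s</link>'
      '<description>CSS-rule scrape</description>'
      % (escape(RULES["url"]), escape(RULES["url"])))
for entry in soup.select(RULES["item"]):
    title = entry.select_one(RULES["title"])
    link = entry.select_one(RULES["link"])
    if title and link and link.get("href"):
        print('<item><title>%s</title><link>%s</link></item>'
              % (escape(title.get_text(strip=True)), escape(link["href"])))
print('</channel></rss>')
```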