Summary: | idea: "web scraping" support (support script output as feed source) | ||
---|---|---|---|
Product: | [Applications] akregator | Reporter: | Charles Phoenix <phoenixreads> |
Component: | general | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | REPORTED --- | ||
Severity: | wishlist | CC: | a.wierzbowski, forestix, genghiskhan, oingman, Simon80, steelgunblade |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Charles Phoenix
2004-07-21 15:07:55 UTC
Hi, just stumbled upon this bug report. If anyone is interested, I have created such a script repository, which is currently used by Liferea (a GTK/Gnome reader) and by my own console reader. The webpage is at http://kiza.kcore.de/software/snownews/snowscripts/ and is more or less exactly what Charles suggested, I think. :)

Yes, it is, and this is what I expected... the scripts already exist. Hopefully the idea will catch on and maybe KDE could standardize on a particular script language. The end result would be a repository that *any* RSS reader could use.

They can already be used by any RSS reader that can either 1) execute and load external scripts as a feed source or 2) pipe a downloaded resource through a script that converts it to an RSS feed on the fly. #2 has the advantage that you don't need to download the source with an external application and can take advantage of the reader's built-in downloader, which should support things like conditional GET, compression, etc.

I don't think a particular language needs to be standardized, as long as it's commonly used/installed, like Perl, Python or bash+GNU textutils.

Anyway, you're free to use and link to this page and to help create more scripts. Of course, complaining to the pages the scripts work for, so that they provide a decent RSS feed in the first place, is the ultimate goal. ;)

*** Bug 177268 has been marked as a duplicate of this bug. ***

*** Bug 187738 has been marked as a duplicate of this bug. ***

Thank you for the bug report. As this report hasn't seen any changes in 5 years or more, we ask that you please confirm whether the issue still persists. If this bug no longer persists or is no longer relevant, please change the status to resolved.

Yes, the desire for this feature still exists. (I'm another ex-Liferea user.)
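
For readers unfamiliar with the second approach mentioned in this thread (the reader downloads the page itself and pipes it through a conversion script), here is a minimal sketch in Python. It is not taken from the snowscripts repository or from any reader's code; the names `LinkCollector`, `html_to_rss` and `run_filter`, the default feed title, and the choice to turn every link on the page into an item are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from xml.sax.saxutils import escape


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href=...> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((urljoin(self.base_url, self._href), title))
            self._href = None


def html_to_rss(html, base_url, feed_title="Scraped feed"):
    """Return a small RSS 2.0 document with one <item> per link in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    items = "".join(
        "<item><title>%s</title><link>%s</link></item>" % (escape(title), escape(url))
        for url, title in collector.links
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        "<title>%s</title><link>%s</link><description>%s</description>"
        "%s</channel></rss>"
    ) % (escape(feed_title), escape(base_url), escape(feed_title), items)


def run_filter(infile, outfile, base_url):
    """Filter mode: read the downloaded page from infile, write RSS to outfile."""
    outfile.write(html_to_rss(infile.read(), base_url))
```

A standalone filter script would import `sys` and end with a call such as `run_filter(sys.stdin, sys.stdout, "http://example.org/")`, so the reader only has to hand the downloaded page to the script's standard input and parse its standard output as a feed. A real script would normally restrict itself to the part of the page that holds the articles rather than collecting every link.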
Thank you, but in the meantime I managed to write my own feed aggregator with full scraping support :-D If anyone is interested, here it is: https://github.com/acavalin/rrss

*** Bug 451188 has been marked as a duplicate of this bug. ***

This is a must-have feature. There are several web scraping scripts that let users specify a set of rules (CSS selectors or XPath), and the script does the rest of the work. I'm using this extensively with Liferea. No regex is needed.
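
To illustrate the rule-based approach mentioned in the last comment, here is a minimal Python sketch in which the user supplies only XPath rules and a generic function does the extraction. It is not the rule format used by Liferea or any other reader; the `RULES` keys and the `extract_entries` name are hypothetical. It relies on the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it assumes the page is well-formed XHTML; real-world HTML would need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: one XPath that selects each entry, plus XPaths
# (relative to the entry) for the element holding its title and its link.
RULES = {
    "item": ".//div[@class='post']",
    "title": "h2/a",
    "link": "h2/a",
}


def extract_entries(xhtml, rules):
    """Apply the XPath rules to a well-formed XHTML string.

    Returns a list of {"title": ..., "link": ...} dicts, skipping entries
    where either rule matches nothing.
    """
    root = ET.fromstring(xhtml)
    entries = []
    for node in root.findall(rules["item"]):
        title_node = node.find(rules["title"])
        link_node = node.find(rules["link"])
        if title_node is None or link_node is None:
            continue
        entries.append({
            # itertext() also picks up text inside nested tags such as <b>
            "title": "".join(title_node.itertext()).strip(),
            "link": link_node.get("href", ""),
        })
    return entries
```

The point of this design is that supporting a new site means writing a new set of rules, not a new script, which is why no regular expressions are needed.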