150054 – Kget fails to download directory-style HTTP URLs which break when index.html is appended

Status:	RESOLVED FIXED

Alias:	None

Product:	kget
Classification:	Applications
Component:	general
Version:	0.8.5
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	KGet authors

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-09-21 09:55 UTC by Stephan Sokolow
Modified:	2007-11-11 00:27 UTC
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Stephan Sokolow 2007-09-21 09:55:57 UTC

Version:           0.8.5 (using KDE 3.5.7)
Installed from:    Compiled From Sources
OS:                Linux

When asked to download a URL which ends in a / (and therefore appears to be a directory), KGet silently appends index.html. This causes 404 errors on the many sites which use non-standard DirectoryIndex files, mod_rewrite and friends, or their non-Apache equivalents.

As of this posting, the problem appears with http://lwn.net/Articles/246381/

It used to happen on Fanfiction.net (requiring me to either use wget and {1..whatever} in the shell to save chapters, or read my fanfiction in Firefox), but they've now altered their configuration so that arbitrary text can follow the final slash without affecting the URL's meaning. (They use it to embed a sanitized version of the story or author name in the URL.)

Comment 1 Urs Wolfer 2007-09-22 16:51:14 UTC

I'm not sure I understood correctly. Would you like to download websites? Or files that have no filenames?

Comment 2 Stephan Sokolow 2007-09-23 03:07:31 UTC

Neither is truly correct, but "the files have no filenames" is reasonably close. 

Basically, if I feed http://www.foo.com/bar/ into KGet, it'll try to download http://www.foo.com/bar/index.html instead. If the site's index file is named index.php, index.shtml, main.asp, or whatever.foo, the download fails.

The same problem occurs if the URL isn't actually a file path. (For example, Ruby on Rails routes, Pylons routes, many mod_rewrite rules, and PATH_INFO tricks like http://www.foo.com/bar.php/queryID/)
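
To illustrate the behaviour being asked for, here is a minimal Python sketch. It is not KGet's code, and this report does not say how the eventual fix was implemented. The idea is to request the URL exactly as given and to use index.html only as a fallback name for the file saved locally. The names local_filename and plan_download are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_filename(url, fallback="index.html"):
    """Choose a name for the saved file without touching the URL itself."""
    # Decode percent-escapes, then take the last path segment.
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)  # empty string when the path ends in "/"
    # Fall back to a default name only for the local file.
    return name or fallback


def plan_download(url):
    """Return (URL to request, local file name); the URL is passed through unchanged."""
    return url, local_filename(url)


if __name__ == "__main__":
    for u in ("http://www.foo.com/bar/",
              "http://www.foo.com/bar.php/queryID/",
              "http://www.foo.com/files/archive.tar.gz"):
        print(plan_download(u))
```

With this split, http://www.foo.com/bar/ is still requested as http://www.foo.com/bar/, so the server decides which index file or route answers it, and only the saved copy is named index.html.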

Comment 3 Lukas Appelhans 2007-11-11 00:27:10 UTC

*** Bug has been marked as fixed ***.

Comment 4 Lukas Appelhans 2007-11-11 00:27:29 UTC

Fixed in KDE4-trunk