Summary: | Download name of link targets containing special characters get ignored | ||
---|---|---|---|
Product: | [Unmaintained] kio | Reporter: | squan |
Component: | http | Assignee: | webkit-devel |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | adawit, kde |
Priority: | NOR | ||
Version: | SVN | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | 4.6.2 | |
Sentry Crash Report: | |||
Attachments: |
screenshot of prompt...
proposed patch.. |
Description
squan
2010-12-25 13:38:08 UTC
First, if the issue is common in both khtml and kdewebkit, then it must originate in a common component like kio_http and should not be reported against kwebkitpart specifically. Secondly, without a sample link it would be impossible for us to reproduce the problem. As you have to obtain and provide to us the http header from the server. You can find the instruction on how to do that at http://www.konqueror.org/investigatebug/#http The link URLs look like this: http://csn01.save.tv/13225552_CB3D53E19774F97FDA5C710E044C5072_0/?m=dl The server is currently not very responsive, but this unrelated to the bug. For the given URL the target filename is without special characters but due to another (more recent) bug it will not be presented either (with KDE browsers). To see the correct target filename try e.g. firefox. > ...and should not be reported against kwebkitpart specifically ok, component may be kio_http or kurifilter (just uneducated guesses) (In reply to comment #2) > The link URLs look like this: > http://csn01.save.tv/13225552_CB3D53E19774F97FDA5C710E044C5072_0/?m=dl > The server is currently not very responsive, but this unrelated to the bug. I am presented with the dialog box shown in the attached screenshot. Is that the correct name you were exepecting ? Here is what the request and response look like from kio_http's prespective: HTTPProtocol::sendQuery: ============ Sending Header: HTTPProtocol::sendQuery: "GET /13225552_CB3D53E19774F97FDA5C710E044C5072_0/?m=dl HTTP/1.1" HTTPProtocol::sendQuery: "Host: csn01.save.tv" HTTPProtocol::sendQuery: "Connection: Keep-Alive" HTTPProtocol::sendQuery: "User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) konqueror/4.6.40 Safari/534.16" HTTPProtocol::sendQuery: "Pragma: no-cache" HTTPProtocol::sendQuery: "Cache-control: no-cache" HTTPProtocol::sendQuery: "Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" HTTPProtocol::sendQuery: "Accept-Encoding: x-gzip, x-deflate, gzip, deflate" HTTPProtocol::sendQuery: "Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5" HTTPProtocol::sendQuery: "Accept-Language: en-US,en;q=0.9" HTTPProtocol::readResponseHeader: ============ Received Status Response: HTTPProtocol::readResponseHeader: "HTTP/1.1 200 OK" HTTPProtocol::readResponseHeader: wasAuthError= false isAuthError= false sameAuthError= false HTTPProtocol::readResponseHeader: -- full response: "HTTP/1.1 200 OK Content-Type: application/octet-stream Connection: close Server: Save.TV/2.51 Content-Disposition: attachment; filename=Eine_Familie_namens_Beethoven_01-01-2011_1435_312315.avi Content-Length: 980521140 Accept-Ranges: bytes Last-Modified: Sun, 2 Jan 2011 22:24:22 +0100 Etag: "13225552_CB3D53E19774F97FDA5C710E044C5072_0"" > For the given URL the target filename is without special characters but due to > another (more recent) bug it will not be presented either (with KDE browsers). If you are talking about kdewebkit browsers only in the most recent KDE 4.6 RC1 release, then it is a known issue that has already been fixed in the 4.6 branch. > To see the correct target filename try e.g. firefox. I get the same thing in the following: Firefox, Chromium, Konqueror + kwebkitpart, Konqueror + khtml and reKonq. > > ...and should not be reported against kwebkitpart specifically > ok, component may be kio_http or kurifilter (just uneducated guesses) If there is a bug that affects both kdewekit and khtml based browsers, then it must be in the lower level class KIO related classes. KUriFilter has nothing to do with such issues at all. Anyhow, I still would need a link that returns the afforementioned german umaluts in order to try and duplicate the problem ; so we can find out where it might be... Created attachment 55494 [details]
screenshot of prompt...
> Is that the correct name you were exepecting? Yes, that's how it should be. > If you are talking about kdewebkit browsers only in the most recent > KDE 4.6 RC1 release, then it is a known issue that has > already been fixed in the 4.6 branch. Yes. Obviously with this fix it works correct in case of no umlauts. > I still would need a link that returns the afforementioned german umlauts Pardon, yesterday I had no such link, so I produced a new one over night: http://csk13.save.tv/12533752_5BC5E6940F545D5455104B3984D2AC91_0/?m=dl (In reply to comment #5) > > Is that the correct name you were exepecting? > Yes, that's how it should be. > > > If you are talking about kdewebkit browsers only in the most recent > > KDE 4.6 RC1 release, then it is a known issue that has > > already been fixed in the 4.6 branch. > Yes. Obviously with this fix it works correct in case of no umlauts. > > > I still would need a link that returns the afforementioned german umlauts > Pardon, yesterday I had no such link, so I produced a new one over night: > http://csk13.save.tv/12533752_5BC5E6940F545D5455104B3984D2AC91_0/?m=dl Here is the request/response for the link above: HTTPProtocol::sendQuery: ============ Sending Header: HTTPProtocol::sendQuery: "GET /12533752_5BC5E6940F545D5455104B3984D2AC91_0/?m=dl HTTP/1.1" HTTPProtocol::sendQuery: "Host: csk13.save.tv" HTTPProtocol::sendQuery: "Connection: Keep-Alive" HTTPProtocol::sendQuery: "User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) konqueror/4.6.40 Safari/534.16" HTTPProtocol::sendQuery: "Pragma: no-cache" HTTPProtocol::sendQuery: "Cache-control: no-cache" HTTPProtocol::sendQuery: "Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" HTTPProtocol::sendQuery: "Accept-Encoding: x-gzip, x-deflate, gzip, deflate" HTTPProtocol::sendQuery: "Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5" HTTPProtocol::sendQuery: "Accept-Language: en-US,en;q=0.9" HTTPProtocol::readResponseHeader: ============ Received Status Response: HTTPProtocol::readResponseHeader: "HTTP/1.1 200 OK" HTTPProtocol::readResponseHeader: wasAuthError= false isAuthError= false sameAuthError= false HTTPProtocol::readResponseHeader: -- full response: "HTTP/1.1 200 OK Content-Type: application/octet-stream Connection: close Server: Save.TV/2.51 Content-Disposition: attachment; filename=SpongeBob_Schwammkopf_Gedächtnisverlust_02-01-2011_2330_119407.avi Content-Length: 221861368 Accept-Ranges: bytes Last-Modified: Mon, 3 Jan 2011 08:39:23 +0100 Etag: "12533752_5BC5E6940F545D5455104B3984D2AC91_0"" The problem stems from the fact that you do not quote the filename in the content-disposition header as specified in section 19.5.1 of RFC 2616. However, quoting the filename is not strictly enforced and so kio_http should handle it correctly. Unfortunately a programming error prevents unquoted filenames with non-latin1 characters from being parsed correctly in kio_http and hence this bug. Reassigning to kio_http... I'll have a look if quoting is required or not. Created attachment 55657 [details]
proposed patch..
The browsers behave absolutely correct: they ignore wrong values. The server _must_ send everything that does not fall into US-ASCII as quoted string (and even strings that have a special set of US-ASCII characters). A header is a key=value field, where value may either be a "token" or a "quoted string". Of course there are rules for tokens: This comes from RfC 2616, section 2.2 (http://tools.ietf.org/html/rfc2616#section-2.2): CHAR = <any US-ASCII character (octets 0 - 127)> token = 1*<any CHAR except CTLs or separators> This RfC also says: Many HTTP/1.1 header field values consist of words separated by LWS or special characters. These special characters MUST be in a quoted string to be used within a parameter value (as defined in section 3.6). So the server just sends garbage and the browser just ignores it. There would be other ways of solving this problem by guessing, but guessing would fail sometimes and make the code much more complicated. And at the end that's were rules are for, to make it clear how this stuff has to look like. So please go and bug save.tv to fix their JavaScript. > The server _must_ send everything that does not fall into US-ASCII as quoted string > (and even strings that have a special set of US-ASCII characters). Ok, so that shows that save.tv server clearly violates web standards. But please don't expect save.tv to be a rare exception but take it as just an example. Firefox and chromium (e.g.) work around this RfC violation and properly extract the filename from the server response. Expect that all major browsers do like this. And so should KDE browsers. Just proving that the server isn't RfC compliant is not a sufficient justification to close this bug as INVALID. > So the server just sends garbage and the browser just ignores it. The attribute value in question is no garbage but just missing the quotes required by RfC 2216. > There would be other ways of solving this problem by guessing, > but guessing would fail sometimes and make the code much more complicated. Just disdarding the filename attribute of the server response severely hampers downloads because the user has to retrieve and enter it manually in each single case. May be that diffentiation of a token from a string missing quotes will be a bit messy, but I think fixing this use case for all KDE browsers is worth it. > So please go and bug save.tv to fix their JavaScript. Inspite major browsers coping with this RfC incompliance without problem I have not much hope to succeed with that. >> So the server just sends garbage and the browser just ignores it. > The attribute value in question is no garbage but just missing the quotes > required by RfC 2216. So it is invalid, which qualifies it as garbage. See http://greenbytes.de/tech/tc2231/#attwithasciifilenamenqws for basically the same example and how basically all browsers misbehave. Note the green line of Konqueror. You could ask Julian if this shouldn't be a expected fallback behaviour. >> So please go and bug save.tv to fix their JavaScript. > Inspite major browsers coping with this RfC incompliance without problem I have > not much hope to succeed with that. AFAICS they replaced spaces by underscores in the filenames, likely because that caused some trouble. If you tell them that quoting solves more of their problems maybe they will listen. Test it out. (In reply to comment #11) > >> So the server just sends garbage and the browser just ignores it. > > The attribute value in question is no garbage but just missing the quotes > > required by RfC 2216. > > So it is invalid, which qualifies it as garbage. I believe you are incorrect about this whole thing: First, the "Content-Disposition" header is not an HTTP standard. It is simply a derived one that relies on other standards for its definition as well as security implications. See RFC 2616 section 15.5. As such the sections of RFC 2616 in comment #9 are not technically applicable to this header. You would be better served by looking at RFCs 2183 and RFC 5987 if you have not already done so. Specially RFC 5987, which is not only more recent than RFC 2616, but it also has specific recommendations about the parameters in content-disposition, see section 3.2.1. Second, there is actually a prefect reason why file names with LWSP should be quoted. It avoids ambiguity for parsers that might rely on white spaces to create tokens. However, there is no reason or justification for following that for non-latin1 characters that are not special characters like control and LWSP characters. Third, and we currently violate the spec when the file name are quoted. There is no check for illegal characters such as control characters when parsing quoted file names. Anyhow, to me even if that was not the case and we did everything right, but all the other browsers implemented something differently, then the defacto standard is the one that should be chosen. There is absolutely no way you can adhere to RFCs 100% and get anything that is usable in the real world. A little pragmatism is necessary when implementing these standards. Believe I learned that the hard way as well after implementing several RFCs including 2616 for which I had to write many workarounds for brain-dead implementations... > See http://greenbytes.de/tech/tc2231/#attwithasciifilenamenqws for basically > the same example and how basically all browsers misbehave. Ahhh... that test is for a white space... How are white spaces and valid non-latin1 characters the same ? And why should they be treated the same ? > Note the green line of Konqueror. You could ask Julian if this shouldn't be a > expected fallback behaviour. And the patch I propsed does not break that test, but fixes the problem reported here. http://www.ietf.org/rfc/rfc5987.txt, first sentence on page 5: Thus, a parameter is either a regular parameter (reg-parameter), as previously defined in Section 3.6 of [RFC2616], or an extended parameter (ext-parameter). (In reply to comment #13) > http://www.ietf.org/rfc/rfc5987.txt, first sentence on page 5: > > Thus, a parameter is either a regular parameter (reg-parameter), as > previously defined in Section 3.6 of [RFC2616], or an extended > parameter (ext-parameter). I had a email exchange about this issue with Julian, the author of the content-disposition test site, and asked him to add a test case for this particular scenario. He was receptive to the idea and hopefully he would do so at some point... He also seems to support your position on the spec ; so I will defer on whether or not we should workaround this problem to you... I personally think that we should simply workaround this issue specially since we can do so easily without violating the other requirements of the spec and because this problem is no different than many of the other spec violations for which we already provide workarounds in kio_http. Thanks to Julian, the test cases are already there: http://greenbytes.de/tech/tc2231/#attfnbrokentokeniso http://greenbytes.de/tech/tc2231/#attfnbrokentokenutf I am going to apply the patch I proposed previously as a workaround for this issue. It absolutely makes no sense for us to rigidly adhere to the RFC when none of the other client apps do. BTW, this is no different than any of the other RFC violations we had implemented in kio_http to date, e.g. treating a 302 and 303 server responses the same way. Agreed. But I find the isPrint() conversion doubtful. Please also make sure all existing testcases work properly. (In reply to comment #17) > Agreed. But I find the isPrint() conversion doubtful. Please also make sure all > existing testcases work properly. Why do you think the isPrint() conversion is doubtful ? It correctly excludes every control character, which includes the vertical and horizontal spacers (\v, \t, \n, \r), while making it possible to allow unicode characters present in the string. On top of that all the test cases pass as well... > > Agreed. But I find the isPrint() conversion doubtful. Please also make
> > sure all existing testcases work properly.
>
> Why do you think the isPrint() conversion is doubtful ? It correctly
> excludes every control character, which includes the vertical and
> horizontal spacers (\v, \t, \n, \r), while making it possible to allow
> unicode characters present in the string.
>
> On top of that all the test cases pass as well...
Hm, that change seems to be one of the central points of this change, no? ;)
Ok, ignore me then. I'll review that one tomorrow again when I'm less
confused.
(In reply to comment #19) > > > Agreed. But I find the isPrint() conversion doubtful. Please also make > > > sure all existing testcases work properly. > > > > Why do you think the isPrint() conversion is doubtful ? It correctly > > excludes every control character, which includes the vertical and > > horizontal spacers (\v, \t, \n, \r), while making it possible to allow > > unicode characters present in the string. > > > > On top of that all the test cases pass as well... > > Hm, that change seems to be one of the central points of this change, no? ;) > Ok, ignore me then. I'll review that one tomorrow again when I'm less > confused. Yes, the use of isPrint() is esentially the fix and the unit test for it shows no problems: $ ./httpheaderdispositiontest ********* Start testing of HeaderDispositionTest ********* Config: Using QTest library 4.7.2, Qt 4.7.2 PASS : HeaderDispositionTest::initTestCase() PASS : HeaderDispositionTest::runAllTests() PASS : HeaderDispositionTest::cleanupTestCase() Totals: 3 passed, 0 failed, 0 skipped ********* Finished testing of HeaderDispositionTest ********* So in it goes. :) Git commit 0672bbd10827a95b81fc3358c29fb10cab12ec02 by Dawit Alemayehu. Committed on 06/03/2011 at 21:09. Pushed by adawit into branch 'KDE/4.6'. Allow unicode characters in unquoted filename parameters of the content-disposition header. BUG:261223 FIXED-IN:4.6.2 M +10 -5 kioslave/http/parsinghelpers.cpp http://commits.kde.org/kdelibs/0672bbd10827a95b81fc3358c29fb10cab12ec02 Git commit dec857b341e11987b55b2f80d15be34a5e9131d0 by Dawit Alemayehu. Committed on 06/03/2011 at 21:28. Pushed by adawit into branch 'master'. Allow unicode characters in unquoted filename parameters of the content-disposition header. BUG:261223 FIXED-IN:4.6.2 M +10 -5 kioslave/http/parsinghelpers.cpp http://commits.kde.org/kdelibs/dec857b341e11987b55b2f80d15be34a5e9131d0 |