Summary: | konqueror: Konqueror urldecodes URLs and remembers the decoded URL | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | Eckhart Wörner <ewoerner> |
Component: | general | Assignee: | Konqueror Developers <konq-bugs> |
Status: | RESOLVED NOT A BUG | ||
Severity: | normal | CC: | faure, kde, rakuco, thiago |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Debian testing | ||
OS: | Unspecified | ||
Latest Commit: | Version Fixed In: |
Description
Eckhart Wörner
2010-03-24 15:27:19 UTC
We follow the RFC to the letter on this point. Slashdot is broken. This seems to be a common bug: web servers violating the RFC.

RFC 3986, section 6.2.2.2. Percent-Encoding Normalization:

The percent-encoding mechanism (Section 2.1) is a frequent source of variance among otherwise identical URIs. In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3.

I confirm: we send a different HTTP GET request when typing %2B or + in the location bar, because KUrl/QUrl keeps it as is. On the other hand, I can't say whether that's a bug or not. (Surely '+' in a path is not ambiguous; '+' has a special meaning only in queries.)

Thiago: should QUrl encode '+' in paths? This bug has 3 possible outcomes, and I don't know enough to decide:
1) QUrl::setEncodedUrl(TolerantMode) should encode '+' in paths
2) KUrl::prettyUrl shouldn't make '+' pretty
3) Slashdot is indeed broken

I made a local patch (+ unit test) for 2), but it somewhat breaks the prettiness.

From qurl.cpp, which quotes RFC 3986:

#define ABNF_sub_delims "!$&'()*+,;="
#define ABNF_pchar ABNF_sub_delims ":@"
static const char pathExcludeChars[] = ABNF_pchar "/";

So + doesn't have to be encoded in path components. In other words, treating %2B differently from + in path components is a bug in the server. Note that the slash is special: %2F is not the same as /.

Finally, in the query, from the URI spec's point of view, %2B and + *are* the same. Any difference between them there comes from the HTML FORM convention, not from the RFC.

URLs should internally always keep their encoded forms. QUrl already does that, as shown by the fact that the original report says you can visit those pages.
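A minimal sketch of the character rules discussed above, written in Python rather than against QUrl/KUrl (it is not KDE or Qt code, and the helper names `normalize_percent_encoding` and `may_appear_unencoded_in_path` are hypothetical). It applies the RFC 3986 section 6.2.2.2 normalization, which decodes only unreserved characters, so %2B stays encoded because '+' is a sub-delim. It checks characters against RFC 3986 pchar plus '/', matching the sets in the qurl.cpp excerpt. It also uses `urllib.parse.unquote_plus` to show the HTML form convention, where a literal '+' in a query decodes to a space.

```python
import re
from urllib.parse import unquote, unquote_plus

# Character sets from RFC 3986 (Sections 2.2 and 2.3); names are hypothetical.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
SUB_DELIMS = frozenset("!$&'()*+,;=")
# RFC 3986 pchar (unreserved, sub-delims, ':' and '@') plus '/',
# like pathExcludeChars in the qurl.cpp excerpt.
PATH_LITERALS = UNRESERVED | SUB_DELIMS | frozenset(":@/")

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri):
    """Decode only %XX triplets that map to unreserved characters.

    Reserved characters such as '+' (%2B) or '/' (%2F) stay encoded;
    their hex digits are uppercased (RFC 3986 Section 6.2.2.1).
    """
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT_TRIPLET.sub(fix, uri)


def may_appear_unencoded_in_path(char):
    """True if `char` can be written literally in a URI path."""
    return char in PATH_LITERALS


if __name__ == "__main__":
    # %7e becomes '~'; %2b and %2F stay encoded (uppercased).
    print(normalize_percent_encoding("/foo%7ebar/%2b%2F"))  # /foo~bar/%2B%2F
    print(may_appear_unencoded_in_path("+"))  # True
    # Plain percent-decoding keeps '+'; form decoding turns it into a space.
    print(unquote("a+b%2Bc"))       # a+b+c
    print(unquote_plus("a+b%2Bc"))  # a b+c
```

The sketch only models the character rules; it does not reproduce how Konqueror, KUrl::prettyUrl, or any particular server behaves.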
Since I disagree with the reporter's assertion that the URL displayed is not valid, I would close this bug as INVALID.

Okay, thank you for having a look at this.