Bug 232008 - konqueror: Konqueror urldecodes URLs and remembers the decoded URL
Summary: konqueror: Konqueror urldecodes URLs and remembers the decoded URL
Status: RESOLVED NOT A BUG
Alias: None
Product: konqueror
Classification: Applications
Component: general
Version: unspecified
Platform: Debian testing Unspecified
Importance: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-24 15:27 UTC by Eckhart Wörner
Modified: 2010-04-17 14:10 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Eckhart Wörner 2010-03-24 15:27:19 UTC
Version:            (using KDE 4.4.1)
Installed from:    Debian testing/unstable Packages

This bug has been copied over from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471930 and has been verified to still exist in KDE SC 4.4.1

----

When you open a page whose URL contains characters that must be 
urlencoded, Konqueror will let you enter the URL properly encoded (with 
% escapes, etc.) and visit the page correctly.

However, it will decode the URL and display the decoded URL in the 
address bar (e.g. + will be changed to space, %2B will be changed to +, 
etc.). This often causes the "URL" in the address bar to not actually be 
a valid URL. For example, if you select the address bar and press 
return, you will receive an error, or at the very least not go to the 
same page whose URL you originally entered.

The decoded URL is also saved in the history, so that you can, for 
example, use the up and down arrow keys in the address bar to select a 
previously visited page, and not go there, because the URL that has been 
saved with it is not the right URL, but the urldecoded version of it.

This behavior annoys me. I can see the point of wanting to display the 
address in decoded form for some users in some situations (e.g. when 
using Konqueror as a file manager - which I never do, by the way). 
However, I would, at the very least, want to be able to turn off this 
functionality, so that the URLs I enter will not be mangled.

Steps to reproduce:
 - Visit any website with a URL that contains characters that need 
   escaping. For example:
   http://slashdot.org/~RAMMS%2BEIN/

 - Konqueror will correctly open the page, but mangle the URL. E.g.
   http://slashdot.org/~RAMMS+EIN/

 - If you try to open the same page again, e.g. by selecting the
   address bar and pressing return, you will not go to the same page
   you originally visited.

 - If you visit another page, then select the address bar and use the
   up arrow to navigate back to the original page, then press return
   to select it, you will get the mangled URL and you will not visit
   the page whose URL you originally entered.
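
The steps above can be illustrated with a minimal sketch in plain C++. It is not Konqueror's or KUrl's actual code; `decodeAllEscapes` and `hexValue` are hypothetical helpers that decode every %XX escape for display, which is enough to show why the displayed string no longer matches what was typed.

```cpp
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical "pretty" display form: decode every well-formed %XX escape.
std::string decodeAllEscapes(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = hexValue(s[i + 1]);
            int lo = hexValue(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out.push_back(s[i]);  // ordinary character or malformed escape
    }
    return out;
}
```

With this sketch, `decodeAllEscapes("/~RAMMS%2BEIN/")` yields `/~RAMMS+EIN/`. If that displayed string is sent back as typed, the request path contains a literal + instead of %2B, so a server that distinguishes the two sees a different path.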
Comment 1 Allan Sandfeld 2010-03-25 15:28:12 UTC
We follow the RFC to the letter on this point. Slashdot is broken. This would need to be a common bug in web servers before we start violating the RFC.

RFC 3986:

6.2.2.2.  Percent-Encoding Normalization


   The percent-encoding mechanism (Section 2.1) is a frequent source of
   variance among otherwise identical URIs.  In addition to the case
   normalization issue noted above, some URI producers percent-encode
   octets that do not require percent-encoding, resulting in URIs that
   are equivalent to their non-encoded counterparts.  These URIs should
   be normalized by decoding any percent-encoded octet that corresponds
   to an unreserved character, as described in Section 2.3.
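
As a rough illustration of the normalization quoted above, the following plain C++ sketch decodes only those escapes whose octet is an unreserved character (ALPHA, DIGIT, "-", ".", "_", "~" per RFC 3986 Section 2.3) and leaves all other escapes untouched. The names `isUnreserved`, `hexDigit` and `normalizeEscapes` are made up for this example and are not taken from KUrl or QUrl.

```cpp
#include <string>

// RFC 3986 Section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
static bool isUnreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX only when the octet is an unreserved character.
std::string normalizeEscapes(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = hexDigit(s[i + 1]);
            int lo = hexDigit(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                unsigned char octet = static_cast<unsigned char>(hi * 16 + lo);
                if (isUnreserved(octet))
                    out.push_back(static_cast<char>(octet));
                else
                    out.append(s, i, 3);  // keep the escape as written
                i += 2;
                continue;
            }
        }
        out.push_back(s[i]);
    }
    return out;
}
```

In this sketch `%7E` becomes `~`, while `%2B` stays as it is, because + belongs to the sub-delims set and not to the unreserved set.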
Comment 2 David Faure 2010-03-31 01:12:50 UTC
I confirm, we send a different HTTP GET when typing %2B or + in the location bar, because KUrl/QUrl keeps it as is. On the other hand I can't say if that's a bug or not. (Surely '+' in a path is not ambiguous, '+' has a special meaning only in queries)

Thiago: should QUrl encode '+' in paths?
This bug has 3 possible outcomes, I don't know enough to decide:
 1) QUrl::setEncodedUrl(TolerantMode) should encode '+' in paths
 2) KUrl::prettyUrl shouldn't make '+' pretty
 3) slashdot is indeed broken

I made a local patch (+unittest) for 2), but it breaks the prettiness somewhat.
Comment 3 Thiago Macieira 2010-03-31 08:26:32 UTC
From qurl.cpp, which reproduces the ABNF from RFC 3986:
#define ABNF_sub_delims         "!$&'()*+,;="
#define ABNF_pchar              ABNF_sub_delims ":@"
static const char pathExcludeChars[]     = ABNF_pchar "/";

So + doesn't have to be encoded in path components.
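
To make the effect of those character sets concrete, here is a small standalone sketch (plain C++, not the actual QUrl implementation) that percent-encodes a raw path and leaves every character from the sets above, plus the unreserved characters, unencoded. The names `kSubDelims`, `isUnreservedChar`, `mayAppearUnencodedInPath` and `encodePath` are assumptions for illustration only.

```cpp
#include <cstring>
#include <string>

// Same characters as ABNF_sub_delims in the excerpt above.
static const char kSubDelims[] = "!$&'()*+,;=";

static bool isUnreservedChar(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// True if c may stay unencoded in a path: unreserved, sub-delims, ":@", "/".
bool mayAppearUnencodedInPath(unsigned char c) {
    if (c == '\0') return false;  // strchr would match the terminator
    return isUnreservedChar(c) ||
           std::strchr(kSubDelims, c) != nullptr ||
           c == ':' || c == '@' || c == '/';
}

// Percent-encode a raw (not yet encoded) path, byte by byte.
std::string encodePath(const std::string& rawPath) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : rawPath) {
        if (mayAppearUnencodedInPath(c)) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 15]);
        }
    }
    return out;
}
```

Under these assumptions `encodePath("/~RAMMS+EIN/")` returns the input unchanged, whereas a space is written as `%20`.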

In other words, treating %2B differently from + in path components is a bug in the server. 

Note that the slash is special. %2F is not the same as /.

Finally, in the query, from the URI spec's point of view, %2B and + *are* the same. Treating them differently (+ as an encoded space) comes from the HTML FORM convention, which is not part of the RFC.
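
For comparison, the HTML form convention (application/x-www-form-urlencoded) can be sketched as follows: + is first read as a space, then %XX escapes are decoded. This is a minimal plain C++ illustration with hypothetical names (`fromHex`, `decodeFormValue`), not code from Qt or KDE.

```cpp
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int fromHex(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode one form-encoded query value: '+' means space, %XX is an octet.
std::string decodeFormValue(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out.push_back(' ');  // form convention, not RFC 3986
        } else if (s[i] == '%' && i + 2 < s.size() &&
                   fromHex(s[i + 1]) >= 0 && fromHex(s[i + 2]) >= 0) {
            out.push_back(static_cast<char>(fromHex(s[i + 1]) * 16 +
                                            fromHex(s[i + 2])));
            i += 2;  // skip the two hex digits
        } else {
            out.push_back(s[i]);
        }
    }
    return out;
}
```

Here `a+b` decodes to `a b` while `a%2Bb` decodes to `a+b`, which is why the two forms are not interchangeable once a server applies the form convention to a query string.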

URLs should internally always keep their encoded forms. QUrl already does that, which is proven by the fact that the original report says you can visit those pages.

Since I disagree with the reporter's assertion that the URL displayed is not valid, I would close this bug as INVALID.
Comment 4 Eckhart Wörner 2010-04-17 14:10:51 UTC
Okay, thank you for having a look at this.