Bug 89536

Summary: Broken handling of non-textual URLs in UTF-8 mode
Product: [Unmaintained] kdelibs
Reporter: Thiago Macieira <thiago>
Component: general
Assignee: Stephan Kulow <coolo>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Attachments: Page displaying the problem

Description Thiago Macieira 2004-09-15 06:10:18 UTC
Version:           unknown (using KDE 3.3.0, compiled sources)
Compiler:          gcc version 3.4.1
OS:                Linux (i686) release 2.6.6

This error has been detected on LANG=en_US.UTF-8. Non-UTF-8 locales probably do not see the problem.

Konqueror correctly loads any URL given in its %XX-encoded form, as expected. When displaying the page for such a URL, it converts the %XX escapes to the friendly (KURL::prettyURL) form, rendering them as the corresponding characters.

The problem arises when, in a UTF-8 locale, a link on a non-UTF-8 page has a URL whose escaped bytes are not valid UTF-8 text. The bytes get converted using Latin 1, which creates an invalid URL for the page being viewed. What's more, as per IRI (which Konqueror doesn't yet implement), URLs should always be UTF-8 encoded.
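To illustrate what "not a proper UTF-8 text" means here, below is a minimal Python sketch (not KDE code) of one possible display rule: unescape only when the escaped bytes decode as UTF-8, otherwise keep the %XX form. The function name `display_form` is hypothetical, and the sketch ignores reserved characters such as %2F that a real implementation would have to leave escaped.

```python
from urllib.parse import unquote_to_bytes


def display_form(path: str) -> str:
    """Return a friendly form only if the escaped bytes are valid UTF-8.

    Hypothetical helper for illustration; it is not what KURL::prettyURL does.
    """
    raw = unquote_to_bytes(path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the escapes so the URL still round-trips.
        return path


print(display_form("n%edvel.png"))     # Latin 1 byte ED, stays escaped
print(display_form("n%c3%advel.png"))  # valid UTF-8, shown unescaped
```

With such a rule, the URL shown for the Latin 1 link would stay "n%edvel.png", so reopening it would request the same resource.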

Example:
<html>
  <meta http-equiv="Content-Type" value="text/html; charset=iso-8859-1">
  <p><a href="n%edvel.png">test</a></p>
</html>
(I'll attach this)

When you click the link, it requests the correct page and shows "nível.png". If you now try to open the very same URL as shown, it will request "n%c3%advel.png" instead -- a different page.
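The mismatch can be reproduced outside Konqueror. The following Python sketch assumes the two steps are "unescape as Latin 1 for display" and "escape as UTF-8 when the shown text is reopened"; that is my reading of the behaviour, not something confirmed from the kdelibs sources.

```python
from urllib.parse import quote, unquote

original = "n%edvel.png"

# Step 1 (assumed): the escapes are decoded as Latin 1 for display.
shown = unquote(original, encoding="latin-1")

# Step 2 (assumed): the displayed text is re-escaped as UTF-8 when reopened.
reopened = quote(shown, encoding="utf-8")

print(shown)
print(reopened)
print(reopened.lower() == original.lower())
```

The last comparison is false: the reopened URL carries the two bytes C3 AD where the original had the single byte ED.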
Comment 1 Thiago Macieira 2004-09-15 06:10:58 UTC
Created attachment 7533 [details]
Page displaying the problem
Comment 2 Thiago Macieira 2004-09-15 06:17:53 UTC
Yet another problem, probably related:

The same file, when saved to disk and opened directly (as opposed to loading from network), or when charset=utf-8 in the <meta> header, displays even more bizarre URLs.

Apparently, "n%edvel.png" gets converted using Latin 1 into its Unicode form, then encoded into UTF-8. Therefore, we get:

written in page: n%EDvel.png
desired name: 6e ed 76 65 6c 2e 70 6e 67
name shown in Konqueror: nível.png
requested through network: n%c3%advel.png
opened locally: 6e c3 83 c2 ad 76 65 6c 2e 70 6e 67

Conclusion: ED got displayed as C3 AD and that opened C3 83 C2 AD.
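The byte sequences above are consistent with the Latin 1 to UTF-8 conversion being applied twice. A short Python sketch of that hypothesis (the variable names and the `hexdump` helper are mine, for illustration only):

```python
def hexdump(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


# The desired file name, with the single Latin 1 byte ED.
desired = bytes.fromhex("6e ed 76 65 6c 2e 70 6e 67")

# First pass: read the bytes as Latin 1, write them out as UTF-8.
once = desired.decode("latin-1").encode("utf-8")

# Second pass: the UTF-8 bytes are again read as Latin 1 and re-encoded.
twice = once.decode("latin-1").encode("utf-8")

print(hexdump(once))
print(hexdump(twice))
```

The first pass turns ED into C3 AD (the network request), and the second pass turns C3 AD into C3 83 C2 AD (the local open). Both outputs end in 67, the "g" of ".png", since the sketch starts from the full desired name.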
Comment 3 Thiago Macieira 2005-01-25 17:16:33 UTC
The problem with loading the wrong URL, as in comment #0, has disappeared. The wrong filename is still shown, though.
Comment 4 Thiago Macieira 2005-02-13 03:37:33 UTC
I'm guessing this is an IRI issue, so I'm marking it as a duplicate of a bug I assigned to myself, for KDE 4.

*** This bug has been marked as a duplicate of 55177 ***