Bug 89536 - Broken handling of non-textual URLs in UTF-8 mode
Summary: Broken handling of non-textual URLs in UTF-8 mode
Status: RESOLVED DUPLICATE of bug 55177
Alias: None
Product: kdelibs
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stephan Kulow
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-15 06:10 UTC by Thiago Macieira
Modified: 2005-02-13 03:37 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Page displaying the problem (128 bytes, text/html)
2004-09-15 06:10 UTC, Thiago Macieira

Description Thiago Macieira 2004-09-15 06:10:18 UTC
Version:           unknown (using KDE 3.3.0, compiled sources)
Compiler:          gcc version 3.4.1
OS:                Linux (i686) release 2.6.6

This error has been detected on LANG=en_US.UTF-8. Non-UTF-8 locales probably do not see the problem.

Konqueror correctly loads any URL given in its %XX-encoded form, as expected. When it displays the page for such a URL, it converts the address to its friendly (KURL::prettyURL) form, rendering each %XX escape as the corresponding character.

The problem arises when, in UTF-8 mode and coming from a non-UTF-8 page, the %XX-decoded URL is not valid UTF-8 text. The text is then converted using Latin 1, which creates an invalid URL for the page being viewed. What's more, per IRI (which Konqueror doesn't yet implement), URLs should always be UTF-8 encoded.
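To make "not valid UTF-8 text" concrete, here is a minimal Python sketch (not Konqueror's code; `is_utf8_text` is a hypothetical helper name) that percent-decodes a URL to raw bytes and checks whether those bytes decode as UTF-8. A check along these lines could decide whether a URL is safe to show in its friendly form.

```python
from urllib.parse import unquote_to_bytes


def is_utf8_text(percent_encoded: str) -> bool:
    """Return True if the %XX-decoded bytes form valid UTF-8 (hypothetical helper)."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xED starts a multi-byte UTF-8 sequence, but 'v' is not a continuation byte.
print(is_utf8_text("n%edvel.png"))
# C3 AD is the UTF-8 encoding of U+00ED.
print(is_utf8_text("n%c3%advel.png"))
```

The first call reports the Latin 1 URL from this report as non-UTF-8; the second accepts the UTF-8 form.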

Example:
<html>
  <meta http-equiv="Content-Type" value="text/html; charset=iso-8859-1">
  <p><a href="n%edvel.png">test</a></p>
</html>
(I'll attach this)

When you click the link, it requests the correct page and shows "nível.png". If you now try to open the very same URL as shown, it requests "n%c3%advel.png" instead -- a different page.
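The mismatch can be reproduced outside Konqueror. The sketch below is only an illustration in Python, assuming the displayed text comes from a Latin 1 decode and the re-opened URL from a UTF-8 encode, as described above; `roundtrip` is a hypothetical name.

```python
from urllib.parse import quote, unquote_to_bytes


def roundtrip(href: str):
    """Decode as Latin 1 for display, then re-encode the displayed text as UTF-8."""
    raw = unquote_to_bytes(href)            # bytes the link really names
    shown = raw.decode("latin-1")           # what the location bar displays
    reopened = quote(shown.encode("utf-8"))  # what gets requested when re-opened
    return raw, shown, reopened


raw, shown, reopened = roundtrip("n%edvel.png")
print(shown)
print(reopened)
# The re-opened URL names different bytes than the original link.
print(unquote_to_bytes(reopened) == raw)
```

Note that `quote` emits uppercase hex digits; percent-escapes are case-insensitive, so this is the same URL as the lowercase form quoted in the report.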
Comment 1 Thiago Macieira 2004-09-15 06:10:58 UTC
Created attachment 7533
Page displaying the problem
Comment 2 Thiago Macieira 2004-09-15 06:17:53 UTC
Yet another problem, probably related:

The same file displays even more bizarre URLs when it is saved to disk and opened directly (as opposed to being loaded from the network), or when the <meta> header says charset=utf-8.

Apparently, "n%edvel.png" gets converted using Latin 1 into its Unicode form, then encoded into UTF-8. Therefore, we get:

written in page: n%EDvel.png
desired name: 6e ed 76 65 6c 2e 70 6e 67
name shown in Konqueror: nível.png
requested through network: n%c3%advel.png
opened locally: 6e c3 83 c2 ad 76 65 6c 2e 70 6e

Conclusion: ED was displayed as C3 AD, and opening that requested C3 83 C2 AD.
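The byte sequences above are consistent with the Latin 1 to UTF-8 conversion being applied twice. This Python sketch checks that arithmetic only; it assumes nothing about where in kdelibs the conversions happen, and `latin1_to_utf8` is a hypothetical name.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Misread bytes as Latin 1 text, then encode that text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def hexdump(data: bytes) -> str:
    return " ".join(f"{b:02x}" for b in data)


desired = bytes.fromhex("6e ed 76 65 6c 2e 70 6e 67")  # "desired name" from the report
once = latin1_to_utf8(desired)   # ED becomes C3 AD
twice = latin1_to_utf8(once)     # C3 AD becomes C3 83 C2 AD

print(hexdump(once))
print(hexdump(twice))
```

One pass gives the bytes requested through the network; a second pass gives the C3 83 C2 AD sequence seen when the file is opened locally.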
Comment 3 Thiago Macieira 2005-01-25 17:16:33 UTC
The problem with loading the wrong URL, as in comment #0, has disappeared. The wrong filename is still shown, though.
Comment 4 Thiago Macieira 2005-02-13 03:37:33 UTC
I'm guessing this is an IRI issue, so I'm marking it as a duplicate of a bug I assigned to myself, for KDE 4.

*** This bug has been marked as a duplicate of 55177 ***