Bug 49523 - [testcase] [patch] Impossible to follow links to local pages if html locale differs
Status: RESOLVED LATER
Alias: None
Product: konqueror
Classification: Applications
Component: khtml part
Version: 3.5
Platform: Debian testing Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
Duplicates: 59150 82145 99777 116523
Reported: 2002-10-22 08:31 UTC by Nikita V. Youshchenko
Modified: 2008-05-04 19:28 UTC (History)
CC List: 6 users

Attachments
This patch seems to correct the Konqueror's behavior on my machine. I use UTF-8 locale. (1.81 KB, patch)
2006-07-24 10:10 UTC, Robert Kovacs
index.html file which helps to reproduce the bug (202 bytes, text/html)
2007-01-06 14:29 UTC, Dmitry Suzdalev
1.html file to be placed into the same directory as index.html (63 bytes, text/html)
2007-01-06 14:30 UTC, Dmitry Suzdalev
Please see this patch. (745 bytes, patch)
2007-05-22 10:47 UTC, stanv

Description Nikita V. Youshchenko 2002-10-22 08:31:18 UTC
Version:            (using KDE 3.0.4)
Installed from:    Debian testing/unstable Packages
Compiler:          gcc 2.95.4 
OS:          Linux

My system locale is ru_RU.KOI8-R.

If I create a directory with a Russian name and place some cross-linked HTML pages in windows-1251 encoding in it, Konqueror is unable to follow local links (such as <a href="x.html">).

When the pointer is over the link, Konqueror shows in the status bar the pathname it is going to open. In that pathname, the Russian directory name has been recoded from KOI8-R to windows-1251.

I've looked at the sources.
In khtml_part.cpp, in the method KHTMLPart::completeURL(), the following is written:

...
  if (d->m_decoder)
    return KURL(d->m_doc->completeURL(url), d->m_decoder->codec()->mibEnum());
...

I guess that in my case m_doc->completeURL() returns the full pathname of the link (including the Cyrillic-named directory in the correct encoding), and then it is recoded.
If I comment out those lines, the problem disappears. But I guess that is not the correct fix. Possibly the correct fix is to recode the relative URL, but not the base URL.
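
The effect described above can be illustrated outside Konqueror. The following is a minimal Python sketch, not code from khtml: the directory name "Тест" is a made-up example, while the two encodings are the ones named in this report (KOI8-R for the filesystem, windows-1251 for the pages). It only shows that recoding the directory name yields bytes that no longer match the name on disk.

```python
from urllib.parse import quote

# Hypothetical directory name; any Cyrillic name shows the same effect.
dir_name = "Тест"

fs_encoding = "koi8-r"    # system locale from the report (ru_RU.KOI8-R)
page_charset = "cp1251"   # charset of the HTML pages (windows-1251)

# Bytes of the directory name as they are stored on disk.
on_disk = dir_name.encode(fs_encoding)
# Bytes produced when the same name is recoded to the page's charset.
recoded = dir_name.encode(page_charset)

print(quote(on_disk))      # percent-encoded segment that exists on disk
print(quote(recoded))      # percent-encoded segment after recoding
print(on_disk == recoded)  # the two byte strings differ
```

A file lookup works on the raw bytes, so the recoded segment names a directory that does not exist.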
Comment 1 Stephan Kulow 2003-10-24 17:12:21 UTC
do you still have the problem with KDE 3.1 or even 3.2?
Comment 2 Nikita V. Youshchenko 2003-10-24 21:12:41 UTC
Unfortunately yes.
I can reproduce this on a current Debian Sid system, with KDE 3.1.
The bug is reproduced as follows:
- ensure that system locale is ru_RU.KOI8-R
- make a directory with a Russian name, e.g. "
Comment 3 Stephan Kulow 2003-10-25 09:26:41 UTC
Hmm, did you set the LANG in kdeinit's environment too?
Comment 4 Nikita V. Youshchenko 2003-10-25 09:34:02 UTC
Yes.
LANG is set in the environment of all apps.
It is absolutely necessary to get correct behaviour on a Russian system (e.g. to read Russian filenames from the filesystem).
Comment 5 Nikita V. Youshchenko 2003-10-25 09:38:21 UTC
I was a bit inaccurate
"
Comment 6 Nikita V. Youshchenko 2004-05-18 21:20:31 UTC
Just want to note that the bug is still there in KDE 3.2.2
Comment 7 Tommi Tervo 2005-10-28 12:57:37 UTC
*** Bug 99777 has been marked as a duplicate of this bug. ***
Comment 8 Robert Kovacs 2006-07-24 10:10:18 UTC
Created attachment 17109 [details]
This patch seems to correct the Konqueror's behavior on my machine. I use UTF-8 locale.

I found the same problem with KDE 3.5.3. My locale is UTF-8 and I have problems
with paths containing Hungarian accented letters. The attached patch seems
to solve this problem, although I don't know whether this is the right approach
to eliminate this bug...
Comment 9 J Appel 2006-09-11 23:07:36 UTC
Can you reproduce the bug with kde 3.5.4?
Comment 10 Nikita V. Youshchenko 2006-09-12 20:27:31 UTC
Yes, the bug can be reproduced in KDE 3.5.4 (Debian sid, konqueror package version 4:3.5.4-2, kdelibs 4:3.5.4-3)
Comment 11 J Appel 2006-09-12 20:52:54 UTC
Nikita, thanks for reporting. Confirming the bug.
Comment 12 Jens 2006-10-21 22:08:43 UTC
I was just about to report the same bug with German umlauts in the file name. I'm using (K)Ubuntu with KDE 3.5.5.

To reproduce:

Create a path with umlauts in the name, with the system charset set to UTF-8.
Create an HTML frameset with a couple of frames in that directory, with a "meta http-equiv charset=iso8859-1" header inside them.
Display the HTML file with Konqueror.

Result: Konqueror cannot display the frameset. Even though the file names contain no special characters, Konqueror tries to find the frame files by using the _full_ path of the file, but converted to the charset specified in the frameset.

This is obviously wrong ... and a major PITA, because you cannot save any downloaded web site directories in a directory whose path contains non-ASCII characters.

A quick fix with the above patch would really be appreciated!

Thank you :)

Jens
Comment 13 illogic-al 2007-01-06 14:13:24 UTC
*** Bug 116523 has been marked as a duplicate of this bug. ***
Comment 14 Dmitry Suzdalev 2007-01-06 14:27:39 UTC
Here's a more concrete testcase.

I assume that your locale is UTF-8.

Steps:

1. Create a directory with a Russian name (you may copy-paste it from here):
   mkdir СуперПуперПапка
   cd СуперПуперПапка

2. Put the attached index.html and 1.html into the created directory
3. Open index.html in Konqueror (index.html is a very simple html with charset=koi8-r)
4. Hover over the link, look at what the status bar shows (doh!) and try to follow it. No luck

If you edit index.html and change it to "charset=utf-8", everything starts to work - you can follow the link.

Hope this helps :).

Cheers.
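
The steps above can be scripted roughly as follows. This is a sketch in Python under stated assumptions: the file contents are guesses (the attachments themselves are not reproduced in this report), and the function name build_testcase is invented for illustration. It assumes a UTF-8 filesystem encoding, as the comment does.

```python
import os
import tempfile

# Assumed contents: the real attachments are not reproduced in this report.
INDEX_HTML = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">'
    '</head><body><a href="1.html">link</a></body></html>\n'
)
PAGE_HTML = "<html><body>It works.</body></html>\n"


def build_testcase(parent: str) -> str:
    """Create the test directory under `parent` and return its path."""
    target = os.path.join(parent, "СуперПуперПапка")
    os.makedirs(target, exist_ok=True)
    # Both files are written in KOI8-R to match the declared charset.
    with open(os.path.join(target, "index.html"), "w", encoding="koi8-r") as f:
        f.write(INDEX_HTML)
    with open(os.path.join(target, "1.html"), "w", encoding="koi8-r") as f:
        f.write(PAGE_HTML)
    return target


if __name__ == "__main__":
    # Prints the directory to open in Konqueror.
    print(build_testcase(tempfile.mkdtemp()))
```

Opening the printed directory's index.html and hovering over the link should then show the behaviour from step 4 on an affected version.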
Comment 15 Dmitry Suzdalev 2007-01-06 14:29:31 UTC
Created attachment 19124 [details]
index.html file which helps to reproduce the bug
Comment 16 Dmitry Suzdalev 2007-01-06 14:30:50 UTC
Created attachment 19125 [details]
1.html file to be placed into the same directory as index.html
Comment 17 illogic-al 2007-01-06 14:56:43 UTC
It seems Konqueror is recoding the whole URL to the encoding specified in the webpage. 
I placed the two testcase files in a directory named ЧтоНибудь
The encoding on my machine is utf-8 so this was the directory's encoding. 
The index.html containing the link to 1.html was koi8-r encoded. This forced the link to 1.html to become koi8-r encoded. 
Since the directory which contained 1.html was encoded in utf-8, the whole url being in koi8-r no longer pointed to the correct location. 
Konq is correct, but a little too correct. Apparently a fix exists in this thread. 
Encoding only the part of the URL that comes from the webpage, instead of the whole URL, seems to be a potential solution. 
Good luck with that ;-)
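
The "potential solution" mentioned above can be sketched like this. It is an illustration in Python, not the attached patch: the function names and the "/home/user" prefix are hypothetical, and urllib.parse.quote merely stands in for whatever Konqueror does internally.

```python
from urllib.parse import quote


def complete_url_whole(base_dir: str, rel: str, page_charset: str) -> str:
    """Behaviour described in this report: the whole URL gets the page charset."""
    return "file://" + quote(base_dir + rel, encoding=page_charset)


def complete_url_relative_only(base_dir: str, rel: str,
                               fs_encoding: str, page_charset: str) -> str:
    """Suggested approach: only the link taken from the page gets the page charset."""
    return ("file://" + quote(base_dir, encoding=fs_encoding)
            + quote(rel, encoding=page_charset))


# "/home/user" is a made-up prefix; the directory name is the one used above.
base = "/home/user/ЧтоНибудь/"
print(complete_url_whole(base, "1.html", "koi8-r"))
print(complete_url_relative_only(base, "1.html", "utf-8", "koi8-r"))
```

With a UTF-8 filesystem, only the second result keeps the directory part in the bytes that actually exist on disk.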
Comment 18 illogic-al 2007-01-06 15:58:12 UTC
*** Bug 82145 has been marked as a duplicate of this bug. ***
Comment 19 Andrew Muhametshin 2007-03-10 11:56:37 UTC
This problem is already more than four years old! Dear developers, please fix this bug!

FreeBSD-6.2, KDE-3.5.5
Comment 20 stanv 2007-05-22 10:47:57 UTC
Created attachment 20664 [details]
Please see this patch.

Hello.

Please see attached patch.

The main problem is in
kdelibs-3.5.6/kdecore/kurl.cpp
in the constructor:
KURL::KURL( const KURL& _u, const QString& _rel_url, int encoding_hint )

at line 599:
KURL tmp( url() + rUrl, encoding_hint);

url() returns a string that is not encoded in encoding_hint.

Have fun.
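
One possible reading of this comment, sketched in Python rather than with KURL itself: if the base URL is serialized with one encoding and the concatenated string is then parsed with a different encoding_hint, the base part decodes to a different path. The choice of UTF-8 for the serialized base and KOI8-R for the hint is an assumption for illustration; the comment only says that url() is not encoded in encoding_hint.

```python
from urllib.parse import quote, unquote

base = "/home/user/ЧтоНибудь/"   # hypothetical base path
rel = "1.html"                   # relative link taken from the page

# Assumption: the base is serialized with UTF-8 percent-escapes.
serialized = quote(base, encoding="utf-8") + rel

# Parsing the concatenated string with the page charset as the hint
# decodes the base's escapes with the wrong codec.
wrong = unquote(serialized, encoding="koi8-r", errors="replace")
# Parsing with the encoding that was used for serialization round-trips.
right = unquote(serialized, encoding="utf-8")

print(right == base + rel)   # the original path is recovered
print(wrong == base + rel)   # the path no longer matches
```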
Comment 21 Markus 2008-02-07 13:30:39 UTC
This bug is still there in KDE 3.5.8 (using Debian testing).
It only occurs when the charset in the HTML page is explicitly set and differs from the one used for the filesystem.
I recently mirrored the FFMPEG doxygen documentation from www.irisa.fr, which contains charset=iso8859-1. When I remove this or change it to utf-8, everything works as expected.
I am very much in favor of the fix that only recodes the relative URL, since it works as expected for me.
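
The condition described in this comment can be checked for mirrored files with a small helper. This is a sketch in Python; the function names are invented here, the regular expression is a simplification rather than a real HTML parser, and it only compares the declared charset with the filesystem encoding (it does not check whether the path contains non-ASCII characters).

```python
import codecs
import re
import sys

# Matches charset=... in a <meta> declaration; a simplification, not an HTML parser.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charset(html: bytes):
    """Return the charset named in the first 1024 bytes, or None."""
    match = _CHARSET_RE.search(html[:1024])
    return match.group(1).decode("ascii") if match else None


def same_codec(a: str, b: str) -> bool:
    """Compare two encoding names after alias normalization."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def is_affected(html: bytes, fs_encoding: str = sys.getfilesystemencoding()) -> bool:
    """True if the page declares a charset that differs from the filesystem's."""
    charset = declared_charset(html)
    if charset is None:
        return False  # no explicit charset: the condition does not apply
    try:
        return not same_codec(charset, fs_encoding)
    except LookupError:
        return True  # unknown charset name: flag it for manual inspection
```

For a file on disk, this could be called as is_affected(open(path, "rb").read()).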
Comment 22 George Goldberg 2008-04-06 03:51:40 UTC
The test case from comment #14 works for me in svn trunk r793457.
Comment 23 George Goldberg 2008-04-16 08:40:28 UTC
Bug still present in Konqueror 3.5.9.
Comment 24 Michael Leupold 2008-04-21 10:17:56 UTC
As this bug is fixed in recent releases it's being closed. Setting it to LATER because it does qualify for backporting to the 3.x branch.
Comment 25 Michael Leupold 2008-05-04 19:28:35 UTC
*** Bug 59150 has been marked as a duplicate of this bug. ***