Bug 65378 - Konqueror can't open other applications in UTF8 file mode.
Status: RESOLVED WAITINGFORINFO
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: David Faure
 
Reported: 2003-10-02 15:40 UTC by Pupeno
Modified: 2013-06-06 04:34 UTC

Attachments
Attempt at fixing the problem (5.36 KB, patch)
2003-11-18 00:10 UTC, Thiago Macieira
screenshot of the new side effect (27.38 KB, image/png)
2004-02-11 16:47 UTC, Egmont Koblinger

Description Pupeno 2003-10-02 15:40:28 UTC
Version:            (using KDE 3.1.3)
Installed from:    Gentoo Packages
Compiler:          gcc version 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r1, propolice) Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.3/specs Configured with: /var/tmp/portage/gcc-3.2.3-r1/work/gcc-3.2.3/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.2 --includedir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.3/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.2 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.2/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.2/info --enable-shared --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --with-system-zlib --enable-languages=c,c++,ada,f77,objc,java --enable-threads=posix --enable-long-long --disable-checking --enable-cstdio=stdio --enable-clocale=generic --enable-__cxa_atexit --enable-version-specific-runtime-libs --with-gxx-include-dir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.3/include/g++-v3 --with-local-prefix=/usr/local --enable-shared --enable-nls --without-included-gettext Thread model: posix 
OS:          Linux

I'm running in UTF8 file mode (by exporting the following environment variable: KDE_UTF8_FILENAMES) so the files are written and read in UTF8.
Everything goes OK until I try to open a file with an extended character in another application. For example,
I have the file:
/home/pupeno/pr
Comment 1 Thiago Macieira 2003-10-02 22:41:47 UTC
Is KDE_UTF8_FILENAMES being exported to all programs (i.e., before KDE is started)? 
Comment 2 Pupeno 2003-10-02 22:47:05 UTC
Yes... or at least, I think so... it is specified on /etc/profile as 
export KDE_UTF8_FILENAMES="1" 
Thanks 
Comment 3 Egmont Koblinger 2003-10-29 18:40:14 UTC
I just wanted to report the same problem, so at least now I confirm it.
KDE 3.1.4, KDE_UTF8_FILENAMES=1 exported before doing a 'startx kde'.

I have a text file so that a left click tries to open it with kwrite. No matter
if I left click on it, or choose any application by right clicking, it is not
opened. "ps ax" clearly shows that the filename gets converted from utf8 to my
locale, I can perfectly read the filename in the output of "ps ax" on an 8 bit
terminal, though I should see weird stuff here (utf8 printed as latin2).

No component of KDE should ever convert a filename when it is passed from one
application to another, filenames should be simply handled as a sequence of
bytes without interpreting them in any way. Conversion should only happen when
a filename is displayed on the screen or entered by the user. Furthermore, if
KDE_UTF8_FILENAMES is set, no conversion should ever happen at all.

Ps. Is there any particular reason why KDE is not yet defaulting to utf8
filenames? Taking a look at Gnome2, Samba3 and all the modern software, it is
clear that utf8 filenames are the only way to go. Locale-dependent filenames are a
brain-damaged, obsolete idea, as they are unable to satisfy the needs of i18n/l10n.
This is one of the few things where Gnome is clearly ahead of KDE. I hope KDE
will use utf8 filenames by default as soon as possible.
Comment 4 Thiago Macieira 2003-10-29 23:05:36 UTC
Ok, I'm now in a UTF-8 environment myself, having UTF-8 as the locale's encoding. So let's try this again:

KDE 3.2beta1 (HEAD 20031026)
LANG=pt_BR.UTF-8
LC_COLLATE=POSIX (I don't know if this matters)
KDE_UTF8_FILENAMES is unset

I've opened Konqueror and I can see these two subdirectories:
/tmp/kdetest/Curriculum Vitæ
/tmp/kdetest/R
Comment 5 Egmont Koblinger 2003-10-29 23:51:18 UTC
I have KDE_UTF8_FILENAMES set and LANG=hu_HU which is a non-utf8 (actually
iso-8859-2) locale. Please try this combination. This is the only combo that
works incorrectly for me. If I either unset KDE_UTF8_FILENAMES or set my LANG
to hu_HU.UTF-8, konqueror behaves as expected.

I've heard a lot about Red Hat's default utf8 settings but wasn't yet able to
try it. If I set LANG=hu_HU.UTF-8, most of the console applications go crazy
(even if I set my terminal to utf8 too), e.g. it's quite impossible to use mc
and joe. I don't know what Red Hat has done with these, but this is one thing
why I still have to use an 8 bit locale for a while. Despite this, I'd like
filenames to be in utf8 (and I know that I won't see them correctly in mc or ls,
I only want to see them correctly and consistently under Gnome2 and KDE3).
What I expect from KDE_UTF8_FILENAMES is that it forces utf8 filenames no
matter what my locale is. Actually it's clear that KDE_UTF8_FILENAMES has no
effect at all if I have a utf8 locale, since in this case there's no
difference between the current locale and utf8. So the role of
KDE_UTF8_FILENAMES is to force utf8 filenames even for old 8-bit locales.
(In the mean time, shame, mainstream glibc not only defaults to old 8-bit
locales, but doesn't even build the .utf8 locales by default, it needs to be
patched to create these locales. Red Hat is one of the most cutting edge distros as it has always been, but many other distros don't even ship .utf8 locales...)
Comment 6 Ken Deeter 2003-11-11 23:27:07 UTC
I think it's actually a good thing that KDE doesn't default to utf-8 filenames yet. In gnome, I'm just forced to set G_BROKEN_FILENAMES because all my filenames are already in another encoding. 

I believe Samba3 is different, in that it negotiates unicode 'on the wire' which means the stuff that goes across the wire is unicode, but what it writes to the local filesystem is in the local 8bit if desired.

You can't just say use utf8 (or utf16 or ucs2 or ucs4) for everything, because for everything but ASCII applications, you break backwards compatibility. This is why the majority of Japanese users at least still use euc-jp, because input servers, and other various programs are written with the old 8bit encodings in mind. If you're gonna go and fix all these, then people will have nothing to complain about, but how Gnome2 just assumes utf8 can sometimes be VERY inconvenient for users.

The best way to approach it is to make sure glibc and the system level libraries can support utf8 just as they can any other encoding. And programs should be written to be encoding agnostic. This way, when the next unicode comes along (as current unicode still has issues), we don't have to rewrite all our programs. The attitude that unicode will be the end-all of i18n issues worries many of us CJK users, because it's just plainly not true. It should be thought of as just another encoding that can handle all known written characters to date in some form or another.
Comment 7 Egmont Koblinger 2003-11-12 00:16:46 UTC
Sure there are pros and cons for utf8. I'm not aware of cjk related cons so it
wouldn't make sense for me to argue with you. IMHO the best KDE can do (and
tries to do) is let the user choose a behaviour. You can choose euc-jp or
whatever you want. I could use utf-8 if it wasn't so buggy.

This current bug report is about a misbehavior of konqueror that under certain
circumstances (when utf8 filenames are asked, but a non-utf8 locale is being
used) it _does_ perform some conversion on a filename before passing it to
another application as a command line argument. This is clearly a serious bug.
utf8 and other locales are all about a conversion between a sequence of bytes
and a human-readable representation. When an application passes a filename to
another application, neither the human mind nor rendering the filename on the
screen is an issue, so it must not convert it in any way; it mustn't care about
its interpretation, it should treat the filename as a pure sequence of bytes
and leave it unchanged, independently from any settings.
Comment 8 Thiago Macieira 2003-11-12 01:06:12 UTC
I agree with you. The problem is that, internally, all filenames are represented as Unicode codepoints in KDE: the API uses Unicode everywhere. That means conversion to and from 8-bit encodings must be performed when dealing with non-Unicode channels.

As for this bug report in itself, can you make a list of what 8-bit filenames are passed to the applications? It can be a bug in Konqueror's launching of those programs or it could be bugs in those applications reading the filenames.
Comment 9 Egmont Koblinger 2003-11-12 01:35:06 UTC
It must be konqueror's bug (or a bug in a core kde library that konqueror uses),
since this filename is encoded in my locale's encoding (latin2) in the command
line of kword or whatsoever, though it should be utf8 here.

Anyway I can't see the point in the design you described. IMHO a "to and from"
conversion-pair means a faulty design. I don't know if the locale can be changed
on the fly in kde apps, but if it can, then I open a file, so its name gets
converted from the current locale to utf8, then I change the locale in kcontrol,
and then save this file, its name is converted back from utf8 to the new locale,
and now it's stored with a different file name??? This is just an expectation,
but sounds weird. IMHO it'd be a much better design if filenames were kept in
memory as simply a sequence of bytes as returned by the kernel, and only
converted to/from utf8 for on-screen displaying purposes.
Comment 10 Thiago Macieira 2003-11-12 01:44:24 UTC
Yes, I think it is. I think Konqueror is aware of UTF-8 filenames and is decoding them properly. However, when launching other processes, I believe it isn't aware it's a filename and, therefore, uses the locale's 8-bit encoding to encode. That creates invalid filenames.

This is why I asked you to check what filenames those applications are being given.

Now, locales can't change on the fly. Therefore, a filename converted should always be reencodable to its original form. Unfortunately, Qt was designed with Win32 compatibility in mind in which Unicode filenames are used to call the API. Therefore, all strings in Qt which aren't directly connected to 8-bit channels are Unicode.

If you feel that's a design bug, please submit a patch changing the whole Qt and KDE code to Qt. I doubt it'll be included... It's one of those design decisions that, flawed or not, can no longer be reversed.
Comment 11 Egmont Koblinger 2003-11-12 01:51:14 UTC
> It's one of those design decisions that, flawed or not, can no longer be
> reversed.
Yes, I agree with it, this is the kind of design decision which cannot be
changed later. I don't even expect anyone to do it :-))
I'd only like that one particular konqueror conversion bug to be fixed... :)

Unfortunately I can only code in C, not C++, and kde is too big for me. I spent
an hour or so trying to catch this bug, with no success at all :-((
Comment 12 Thiago Macieira 2003-11-12 02:01:10 UTC
That's what us KDE developers are here for. We can't expect every bug report to be posted by a programmer who also posts a patch for fixing it. (Though we can dream, can't we?)

Anyways, please do the test for me: create two files with names containing non-ASCII characters in Konqueror and try to open them in different programs. Make the first filename be encodable in your locale and make the second contain at least one character outside your encoding (The Euro symbol € for instance).

Then look up those processes Konqueror started in the process table (ps) and paste their command-lines here. If I am right, you should be reading the correct filename for the first file (though that automatically means that the encoding is wrong) and you should be seeing a ? where the non-Latin2 character should be present.
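A minimal sketch of that prediction in plain C++, assuming a toy 8-bit encoder that stands in for the locale conversion under an ISO-8859-2 locale. The function names are hypothetical and nothing here is Qt or KDE code; the toy encoder only knows ASCII and á (byte 0xE1 in ISO-8859-2), which is enough for the two test names.

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
void appendUtf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// What the launched program should receive when filenames are UTF-8.
std::string encodeUtf8(const std::u32string& name) {
    std::string out;
    for (char32_t cp : name) appendUtf8(out, cp);
    return out;
}

// Toy stand-in (an assumption, not a real ISO-8859-2 table) for an 8-bit
// locale encoder: characters it cannot represent become '?'.
std::string encodeToyLatin2(const std::u32string& name) {
    std::string out;
    for (char32_t cp : name) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp == 0xE1) {
            out += '\xE1';  // á is byte 0xE1 in ISO-8859-2
        } else {
            out += '?';     // unrepresentable: the character is lost
        }
    }
    return out;
}
```

For the name test-aacute-á-euro-€, encodeToyLatin2 yields the single byte E1 for á and a literal ? for €, while encodeUtf8 yields the bytes C3 A1 and E2 82 AC. A file stored under the UTF-8 name cannot be found under the 8-bit one.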

By the way, what KDE version are you using?
Comment 13 Egmont Koblinger 2003-11-12 02:30:38 UTC
Let's see... freshly created account, log in on linux console, export
KDE_UTF8_FILENAMES=1, LANG is set to hu_HU, no LC_anything is set.
startx kde

Created two text files with some content in konqueror, test-aacute-á and
test-aacute-á-euro-€.
Verified on linux console, their names are properly encoded in utf8.

Right click -> open with kwrite.
In both cases an empty file is opened in kwrite.
ps ax shows these, copy-pasted from a latin2 console:
24327 ?        S      0:01 kdeinit: kwrite /tmp/test-aacute-á
25032 ?        S      0:01 kdeinit: kwrite /tmp/test-aacute-á-euro-?
the fact that I see the 'á' properly in latin2 console means they are not in utf8, which is bad. If I save them in kwrite, new files with the above names
(that is, latin2 encoded test-aacute-á and test-aacute-á-euro-?) appear. Their
names are shown as "test-aacute- " and "test-aacute- -euro-?" in konqueror.

The window title of kwrite is "test-aacute- " and "test-aacute- -euro-?", too.

If I correctly understand your words, this is what you expected.

I'm using KDE 3.1.4.
Comment 14 Thiago Macieira 2003-11-12 02:47:18 UTC
Yes, it is. It's exactly what I thought it would be.

Unfortunately, there's a good chance there is no easy fix for this. I'll look into it, but no promises...

To other developers: I have no idea where to start looking for this. The bug is that the filename in a QString argument to KRun or something is being encoded with QString::local8Bit instead of QFile::encodeName.
Comment 15 Thiago Macieira 2003-11-17 17:13:21 UTC
Ok, here's an update. I've located the problem, but I am in doubt on how to proceed.

When opening a file using KRun, KDE applications talk to KLauncher via DCOP, providing the URL list as a QStringList. Well, URLs are well-defined resource locators and are supposed to be universal, locale-independent. Therefore, I believe this is the correct form (see IRI IETF drafts).

However, klauncher starts the new process thus (KLauncher::createArgs in kdelibs/kinit/klauncher.cpp):
  QStringList params = KRun::processDesktopExec(*service, urls, false);

  for(QStringList::ConstIterator it = params.begin();
      it != params.end(); ++it)
  {
     request->arg_list.append((*it).local8Bit());
  }

See the local8Bit there? That's exactly the source of the problem.

The best solution here is to couple the macro-replacer routines with the encoder ones. When KRun::processDesktopExec() returns, the URLs have already been replaced and we've lost the relation to the encoding.

I am unsure how to proceed here. When %u or %f gets replaced with a filename, it must be encoded with QFile::encodeName, but when %u is the full URL, locale encoding should be used.

Note that either encoding might be lossy!
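
One way to picture "coupling the macro replacer with the encoder" is to carry a tag with every expanded argument and pick the byte encoding per argument. The sketch below is plain C++ under that assumption. ArgKind, LaunchArg, Encoder and encodeArgs are hypothetical names, not kdelibs API, and the two encoders are passed in rather than implemented.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical tag recorded while the Exec line is expanded.
enum class ArgKind { LocalFile, Other };

struct LaunchArg {
    std::u32string text;  // expanded argument, still in Unicode
    ArgKind kind;         // what the argument came from
};

// Any function turning Unicode text into the bytes handed to exec().
using Encoder = std::function<std::string(const std::u32string&)>;

// Encode each argument with the encoder that matches its kind, instead of
// applying the locale encoder to everything.
std::vector<std::string> encodeArgs(const std::vector<LaunchArg>& args,
                                    const Encoder& filenameEncoder,
                                    const Encoder& localeEncoder) {
    std::vector<std::string> out;
    out.reserve(args.size());
    for (const LaunchArg& arg : args) {
        const Encoder& enc =
            (arg.kind == ArgKind::LocalFile) ? filenameEncoder : localeEncoder;
        out.push_back(enc(arg.text));
    }
    return out;
}
```

The point of the design is only that the file-or-not decision is made where the placeholder is expanded, because it can no longer be recovered once the arguments are a flat list of strings.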
Comment 16 Egmont Koblinger 2003-11-17 17:48:16 UTC
If I understand this: replacing .local8Bit() with .utf8() in the above code
will fix konqueror's application launching behavior in the case when
KDE_UTF8_FILENAMES is set but a non-utf8 locale is being used, however, in the
mean time it might break other stuff (such as opening a URL). Am I right?

A (probably stupid) idea for a disgusting workaround:
When right-clicking on a filename in konqueror, if KDE_UTF8_FILENAMES is set
it could convert the filename from locale to utf8. This way utf8 filenames
get "double utf8", but when KRun later converts it back, it'll be proper utf8
again.
Feel free to ignore this idea if you agree that it's disgusting :-)
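
A sketch of why the "double utf8" idea round-trips, assuming for simplicity an ISO-8859-1 locale, where every byte maps to the code point of the same value (a real ISO-8859-2 locale would need its own table). decodeLatin1 and encodeLatin1 are hypothetical helpers, not KDE functions.

```cpp
#include <string>

// Interpret each byte as the ISO-8859-1 character with the same value.
// Applied to UTF-8 bytes, this yields the "double utf8" Unicode string.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out += static_cast<char32_t>(b);
    return out;
}

// Convert Unicode back to ISO-8859-1; anything above U+00FF becomes '?'.
std::string encodeLatin1(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    return out;
}
```

Decoding the UTF-8 bytes of a name with decodeLatin1 and then encoding the result with encodeLatin1 returns the original bytes unchanged, so the launched program would receive proper UTF-8. This only holds when the locale encoding assigns a character to every byte value.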
Comment 17 Egmont Koblinger 2003-11-17 19:53:26 UTC
I changed that one particular occurrence of .local8Bit() to .utf8().
The result is: accents that would fit my locale are handled correctly, they
are encoded in utf8 in the command line. So files that only contain such
characters are opened correctly in konqueror.
Characters that do not fit into my locale (such as the euro symbol) are still
not handled correctly, the command line has one question mark at that position
instead of the three-byte utf8 code.
Comment 18 Thiago Macieira 2003-11-18 00:10:34 UTC
Created attachment 3262
Attempt at fixing the problem

Your fix is incorrect. It makes every filename be encoded in UTF-8, which is ok
for KDE_UTF8_FILENAMES and UTF-8 locales, but not for all other setups.

Please test the attached patch. I'm on UTF-8 locale and it has worked for me.
Comment 19 Egmont Koblinger 2003-11-18 00:48:52 UTC
Which version of kde is this patch for, cvs 3.1 or 3.2 branch? Unfortunately
it doesn't apply to 3.1.4 and it's not possible to manually apply it, the code
has changed so much in krun.cpp.

If you could please create an equivalent patch for 3.1.4, I'd test it. For the
3.2 branch, you have to wait some weeks for me to have time to try kde 3.2beta,
at this moment I don't have time to do it :-(
Comment 20 Thiago Macieira 2003-11-18 01:20:27 UTC
I'll wait for your 3.2beta. We'll need some debugging anyways.
Comment 21 Egmont Koblinger 2004-02-11 16:40:00 UTC
The patch above seems to solve the current problem using kde 3.2.0.
However, I've already found a side effect.

Without this patch, the window title of the KDE Control Center is always okay,
either KDE_UTF8_FILENAMES set or unset.

With this patch, if KDE_UTF8_FILENAMES is turned on, the Hungarian translation
of "KDE Control Center" has incorrect accents; it looks like what you get when you cat
utf8-encoded text in a latin2 terminal. However, the first part of the window
title (the module name) is shown correctly.
Comment 22 Egmont Koblinger 2004-02-11 16:47:49 UTC
Created attachment 4640
screenshot of the new side effect
Comment 23 David Faure 2005-06-17 21:24:41 UTC
I have had this kind of problem when kdeinit was started with a different LANG or LC_ALL environment than the rest of KDE - due to LANG/LC_ALL being set too late (e.g. in .zshrc, and starting konqueror from konsole).

My ~/.kde/env/lang.sh says
# Added so that kdeinit sees the right lang vars (not sure why it's necessary though)
# Testcase: create a directory in konqueror with an accent in the name
source /etc/sysconfig/i18n
export `sed -e 's/=.*$//' /etc/sysconfig/i18n`

Well, that's mandriva-specific, but what I mean is that setting LANG or LC_ALL in ~/.kde/env/somefile.sh might help.
Comment 24 Thiago Macieira 2005-06-18 00:33:48 UTC
dfaure: that will happen because the string is transferred from Konqueror to klauncher in Unicode, but klauncher will convert to 8-bit. It's a bit different than the case here, but nonetheless an issue.

I don't think it's worth the trouble of fixing this one. I have a semi-working patch, but I haven't dared commit it.

I'm thinking of closing this as WONTFIX and calling for KDE_UTF8_FILENAMES to be dropped.
Comment 25 David Faure 2012-10-17 11:23:50 UTC
Is this bug still an issue with KDE 4.9?

Most of us are using utf8 for everything, by now...