Bug 125212

Summary: allow ioslaves to override the encoding if they know better
Product: [Frameworks and Libraries] frameworks-kio
Component: general
Reporter: Gilles Schintgen <gschintgen>
Assignee: David Faure <faure>
Status: CONFIRMED ---
Severity: wishlist
Priority: NOR
CC: bugs-kde, kdelibs-bugs, nate
Version: unspecified
Target Milestone: ---
Platform: Gentoo Packages
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Gilles Schintgen 2006-04-09 11:57:00 UTC
Version:            (using KDE 3.5.0)
Installed from:    Gentoo Packages

Nowadays all major Linux distributions use UTF-8 as their default (systemwide) encoding. (I don't know about *BSD.)

Unfortunately, kio still uses ISO-8859-1 as its default. This "breaks" the display of remote filenames for all (?) protocols. I tried sftp://, ftps:// (not yet in KDE), and fish://. They all interpret the remote UTF-8 filenames as ISO-8859-1. I always have to change the encoding to UTF-8 manually (Tools/Select Remote Charset). At first I didn't even know about this option, so I assumed that kio was broken.
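
To illustrate the symptom, here is a minimal Python sketch of what happens when UTF-8 filename bytes are decoded as ISO-8859-1. The filename is a made-up example, and the code only demonstrates the decoding mismatch; it is not how kio is implemented.

```python
# Hypothetical filename; any non-ASCII name shows the same effect.
name = "Käse.txt"

# The bytes a UTF-8 server would send for this name.
wire_bytes = name.encode("utf-8")

# What a client displays if it assumes ISO-8859-1.
as_latin1 = wire_bytes.decode("iso-8859-1")

# What the client displays once the remote charset is set to UTF-8.
as_utf8 = wire_bytes.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

The two bytes that encode "ä" in UTF-8 each become a separate Latin 1 character, so the mis-decoded name comes out one character longer than the original.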

Is there any reason at all to keep the ISO-8859-1 default value?

See also bug #105186 which is a result of this outdated default value.
Comment 1 Philip Rodrigues 2006-04-10 15:33:43 UTC
Does changing your $LANG have any effect?
Comment 2 Gilles Schintgen 2006-04-10 20:11:22 UTC
> Does changing your $LANG have any effect?

I don't know, but currently it's set to en_US.UTF-8. I'd be quite surprised if 
changing to some ISO encoding would set the default to UTF-8...
Comment 3 David Faure 2006-04-10 20:33:51 UTC
I remember seeing KRemoteEncoding in the code, so AFAIK it's possible to configure
the charset used for remote protocols. I forgot how though. Thiago?
Comment 4 Thiago Macieira 2006-04-10 21:07:09 UTC
kio_<protocol>rc:
Charset=utf-8

But since kioslaves don't read config files directly, the default can be set by KIO itself.

It's per-protocol. Default is Latin 1. I'd say WONTFIX in changing the default: many protocols support Unicode now and we should support those, not change the default.
Comment 5 Gilles Schintgen 2006-04-10 21:39:24 UTC
> It's per-protocol. Default is Latin 1. I'd say WONTFIX in changing the
> default: many protocols support Unicode now and we should support those,
> not change the default.

Then could at least the fish protocol's default be changed? In this case, 
Latin 1 is simply outdated. (Or is there some flag telling the client what 
encoding is used on the server? I don't think so, hence a reasonable default 
must be chosen.)
Also in the case of FTP it is recommended (RFC 2640) that servers send 
filenames encoded in UTF-8. Therefore the FTP default encoding should also be 
UTF-8.

I reiterate my question: is there a fundamental reason to have Latin 1 as 
default value?
Comment 6 Thiago Macieira 2006-04-10 23:30:07 UTC
Yes, because of legacy servers. The default will always be Latin 1 for old protocols not supporting Unicode.

fish, however, should be able to determine the remote encoding using the remote LANG variable. FTP can determine if the remote side supports UTF-8 by sending the FEAT command. SMB is already Unicode-only.

This is the latest ProFTPD available in Mandriva:
FEAT
211-Features:
 MDTM
 REST STREAM
 SIZE
 AUTH GSSAPI
 ADAT
 PBSZ
 PROT
 ENC
 MIC
 CONF
 CCC
211 End

As you can see, it is not Unicode-ready.
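
As a rough sketch of that check (in Python, not kio_ftp's actual code): RFC 2640 has UTF-8-capable servers list a UTF8 feature in the FEAT reply, and RFC 2389 puts each feature on a line that starts with a single space. The function names below are hypothetical.

```python
def parse_feat(reply):
    """Return the upper-cased feature names from an FTP FEAT reply."""
    features = set()
    for line in reply.splitlines():
        # Feature lines start with a space; the "211-..." and
        # "211 End" status lines do not.
        if line.startswith(" ") and line.strip():
            features.add(line.split()[0].upper())
    return features


def supports_utf8(reply):
    """True if the server advertises the UTF8 feature (RFC 2640)."""
    return "UTF8" in parse_feat(reply)


# The ProFTPD reply quoted above.
PROFTPD_REPLY = """211-Features:
 MDTM
 REST STREAM
 SIZE
 AUTH GSSAPI
 ADAT
 PBSZ
 PROT
 ENC
 MIC
 CONF
 CCC
211 End"""

print(supports_utf8(PROFTPD_REPLY))  # False: no UTF8 line in the reply
```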

Please file a wish for fish to detect the remote encoding on its own.
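
For the fish case, a minimal Python sketch of deriving a charset from a LANG value might look like this. It assumes the remote LANG string has already been obtained somehow (how kio_fish would do that is not shown), and the function name and the Latin 1 fallback are illustrative choices, not existing KIO behaviour.

```python
import codecs


def charset_from_lang(lang, fallback="iso-8859-1"):
    """Guess a filename charset from a POSIX-style locale string."""
    # Locale names look like language[_territory][.codeset][@modifier].
    without_modifier = lang.split("@", 1)[0]
    if "." not in without_modifier:
        return fallback  # e.g. "C" or "de_DE": no codeset given
    codeset = without_modifier.split(".", 1)[1]
    if not codeset:
        return fallback
    try:
        # Normalise to Python's codec name, e.g. "UTF-8" -> "utf-8".
        return codecs.lookup(codeset).name
    except LookupError:
        return fallback  # codeset not known to this Python build


print(charset_from_lang("en_US.UTF-8"))
print(charset_from_lang("C"))
```

Note that LANG only describes the remote user's locale; filenames created under a different locale could still be in another encoding.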
Comment 7 Gilles Schintgen 2006-04-11 11:26:51 UTC
> Yes, because of legacy servers. The default will always be Latin 1 for old
> protocols not supporting Unicode.

As far as I can tell it *doesn't* depend on protocols explicitly supporting 
UTF-8 or not. AFAIK most protocols (except newer ones) simply send filenames 
as they are found on disk. That is just how Linux treats filenames: they are 
byte streams, and it is left to the user to decide how they should be 
interpreted. Consequently the encoding used for storing filenames on the disk 
automatically becomes the default encoding for these protocols as well. And 
that has been UTF-8 for a few years now!

> fish, however, should be able to determine the remote encoding using the
> remote LANG variable.

I'll file a separate wish about it. However my remarks from above also apply. 

> FTP can determine if the remote side supports UTF-8 
> by sending the FEAT command.

Does kio_ftp implement RFC2640, i.e. UTF-8 detection? Or should I file another 
wish?

Thanks anyway for your explanations.
Comment 8 Thiago Macieira 2006-04-11 12:52:44 UTC
That isn't true for remote servers. Suppose you're connecting to a Chinese FTP server that isn't UTF-8. You'll see garbage anyways, unless you know that you should switch to a Chinese encoding -- and which encoding, since there are 5 in use.

So, no, the default will stay Latin 1 for remote protocols that don't tell us what their encoding is. The user is given the option to change encodings.

kio_ftp doesn't implement RFC 2640 yet. I think there's already a wish for it, but if there isn't feel free to open one. It's in my to-do list anyways.
Comment 9 Gilles Schintgen 2006-04-11 13:35:06 UTC
> That isn't true for remote servers. Suppose you're connecting to a Chinese
> FTP server that isn't UTF-8. You'll see garbage anyways, unless you know
> that you should switch to a Chinese encoding -- and which encoding, since
> there are 5 in use.

In that case, a default value of UTF-8 would be just as "correct" as Latin1.

> So, no, the default will stay Latin 1 for remote protocols that don't tell
> us what their encoding is. The user is given the option to change
> encodings.

Latin1 is useful for compatibility with old servers. Agreed.
UTF-8 is an excellent choice in unknown situations. It's used nearly 
everywhere. UTF-8 is also necessary to be compatible with *current* 
distributions.

In my eyes, compatibility with modern distributions is more important than 
compatibility with old distributions.

Would you accept if I reopen this wish to give other users time to comment or 
vote? I'm really bothered by the fact that if I set up a tiny home network 
with some modern distribution there will be encoding problems. This is 
definitely not user-friendly.
Or another idea: why not default to the local system's encoding? This would 
have even better chances of doing the right thing in your example of a 
Chinese FTP server.

> kio_ftp doesn't implement RFC 2640 yet. I think there's already a wish for
> it, but if there isn't feel free to open one. It's in my to-do list
> anyways.

I didn't find one, so I filed wish #125355.
Comment 10 Thiago Macieira 2006-04-11 16:01:50 UTC
UTF-8 cannot decode everything. So if you end up with UTF-8-decoded garbage, it may not produce correct results.

For this reason, the default encoding must not be a multibyte or stateful encoding.
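
The difference can be seen in a small Python sketch (illustrative only, not KIO code): ISO-8859-1 assigns a character to every byte value, so decoding never fails and the original bytes can be recovered, whereas UTF-8 rejects byte sequences that are not well-formed.

```python
# Every possible byte value, standing in for a name in an unknown encoding.
all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character, so this never fails
# and re-encoding gives back the original bytes.
latin1_text = all_bytes.decode("iso-8859-1")
round_trip_ok = latin1_text.encode("iso-8859-1") == all_bytes

# A lone 0xE9 ("é" in Latin 1) is not a valid UTF-8 sequence.
try:
    b"caf\xe9".decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(round_trip_ok, utf8_failed)
```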

The local system encoding is used for the local system only. Besides, even if we changed the support in KDE, you'd STILL have the same problem with non-KDE programs and even KDE programs that don't use kio_ftp (there's one FTP client implementing FTP on its own).

PS: we don't have to reopen to let people vote or post comments.
Comment 11 Gilles Schintgen 2006-04-11 18:40:09 UTC
> UTF-8 cannot decode everything. So if you end up with UTF-8-decoded
> garbage, it may not produce correct results.
>
> For this reason, the default encoding must not be multibyte or stateful
> encoding.

Yes, there would be the question of how to cope with invalid UTF-8.
Having an 8-bit encoding as default is indeed less complex.

> PS: we don't have to reopen to let people vote or post comments.

Ok, I'll leave it at that.
I'm looking forward to RFC2640 and SSH encoding detection being implemented.

Thanks anyway for your patience.
Comment 12 Thiago Macieira 2006-04-13 22:10:29 UTC
Since I see you're asking for encoding detection and referring to this bug report, I am hijacking it.

Currently, even if the ioslave detects the encoding, the Tools | Select Remote Charset menu in Konqueror won't change.
Comment 13 Gilles Schintgen 2006-04-14 11:53:23 UTC
> Since I see you're asking for encoding detection and referring to this bug
> report, I am hijacking it.
>
> Currently, even if the ioslave detects the encoding, the Tools | Select
> Remote Charset menu in Konqueror won't change.

Anything that hides the encoding mess is welcome.
Thanks
Comment 14 bugs-kde 2006-08-16 01:10:29 UTC
So is this bug now about remote UTF-8 detection by kio_fish by analysing the remote LANG environment variable?

If not, has anyone filed an appropriate wish?
Comment 15 bugs-kde 2006-11-10 15:04:00 UTC
Gilles Schintgen has opened bug 125351 for remote UTF-8 detection by kio_fish.