Version: (using KDE 3.5.0)
Installed from: Gentoo Packages

Nowadays all major Linux distributions use UTF-8 as their default (system-wide) encoding. (I don't know about *BSD.) Unfortunately, kio still uses ISO-8859-1 as the default value. This "breaks" the display of remote filenames for all (?) protocols. I tried sftp://, ftps:// (not yet in KDE), and fish://. They all interpret the remote UTF-8 filenames as ISO-8859-1. I always have to change the default encoding to UTF-8 manually (Tools/Select Remote Charset). At first I didn't even know about this option, so I assumed that kio was broken. Is there any reason at all to keep ISO-8859-1 as the default? See also bug #105186, which is a consequence of this outdated default.
Does changing your $LANG have any effect?
> Does changing your $LANG have any effect? I don't know, but currently it's set to en_US.UTF-8. I'd be quite surprised if changing to some ISO encoding would set the default to UTF-8...
I remember seeing KRemoteEncoding in the code, so AFAIK it's possible to configure the charset used for remote protocols. I've forgotten how, though. Thiago?
Set Charset=utf-8 in kio_<protocol>rc. But since kioslaves don't read config files directly, the default can be set by KIO itself. It's per-protocol. The default is Latin 1. I'd say WONTFIX on changing the default: many protocols support Unicode now, and we should support those rather than change the default.
> It's per-protocol. Default is Latin 1. I'd say WONTFIX in changing the > default: many protocols support Unicode now and we should support those, > not change the default. Then could at least the fish protocol's default be changed? In this case, Latin 1 is simply outdated. (Or is there some flag telling the client what encoding is used on the server? I don't think so, hence a reasonable default must be chosen.) Also in the case of FTP it is recommended (RFC 2640) that servers send filenames encoded in UTF-8. Therefore the FTP default encoding should also be UTF-8. I reiterate my question: is there a fundamental reason to have Latin 1 as default value?
Yes, because of legacy servers. The default will always be Latin 1 for old protocols that don't support Unicode. fish, however, should be able to determine the remote encoding from the remote LANG variable. FTP can determine whether the remote side supports UTF-8 by sending the FEAT command. SMB is already Unicode-only. This is the reply from the latest ProFTPD available in Mandriva:

FEAT
211-Features:
 MDTM
 REST STREAM
 SIZE
 AUTH GSSAPI
 ADAT
 PBSZ
 PROT
 ENC
 MIC
 CONF
 CCC
211 End

As you can see, it doesn't list UTF8, so it is not Unicode-ready. Please file a wish for fish to detect the remote encoding on its own.
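A minimal sketch of the FEAT check, written in Python for brevity rather than as kio_ftp code. It assumes the reply text has already been read (for example with Python's ftplib, where `FTP.sendcmd("FEAT")` returns the multi-line reply); the helper name `feat_advertises_utf8` is hypothetical. Per RFC 2389, each feature line of a FEAT reply begins with a single space, and an RFC 2640 server announces UTF-8 support with a `UTF8` line.

```python
def feat_advertises_utf8(feat_reply: str) -> bool:
    """Hypothetical helper: True if a multi-line FEAT reply lists UTF8 (RFC 2640)."""
    for line in feat_reply.splitlines():
        tokens = line.split()
        # Feature lines begin with a space; the "211-..." and "211 End" lines do not.
        if line.startswith(" ") and tokens and tokens[0].upper() == "UTF8":
            return True
    return False


# The ProFTPD reply quoted in the comment, lines joined by newlines.
PROFTPD_REPLY = (
    "211-Features:\n MDTM\n REST STREAM\n SIZE\n AUTH GSSAPI\n ADAT\n"
    " PBSZ\n PROT\n ENC\n MIC\n CONF\n CCC\n211 End"
)

print(feat_advertises_utf8(PROFTPD_REPLY))                   # False
print(feat_advertises_utf8("211-Features:\n UTF8\n211 End"))  # True
```

A server that doesn't advertise UTF8 tells the client nothing about its charset, which is why a fallback default is still needed in that case.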
> Yes, because of legacy servers. The default will always be Latin 1 for old > protocols not supporting Unicode. As far as I can tell, it *doesn't* depend on whether protocols explicitly support UTF-8. AFAIK most protocols (except newer ones) simply send filenames as they are stored on disk. This mirrors how Linux treats filenames: they are byte streams, and the user decides how to interpret them. Consequently, the encoding used for storing filenames on disk automatically becomes the default encoding for these protocols as well. And that has been UTF-8 for a few years now! > fish, however, should be able to determine the remote encoding using the > remote LANG variable. I'll file a separate wish about it. However, my remarks above also apply. > FTP can determine if the remote side supports UTF-8 > by sending the FEAT command. Does kio_ftp implement RFC 2640, i.e. UTF-8 detection? Or should I file another wish? Thanks anyway for your explanations.
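To illustrate the point about filenames being byte streams: the same bytes on disk display differently depending on which charset the client assumes. A small Python sketch; the filename `café.txt` is only an example.

```python
# The filename as stored on a modern Linux disk: UTF-8 bytes.
raw_name = "café.txt".encode("utf-8")  # b'caf\xc3\xa9.txt'

# Remote charset set to UTF-8: displayed correctly.
as_utf8 = raw_name.decode("utf-8")

# Remote charset left at the ISO-8859-1 default: the two-byte "é"
# turns into two separate characters.
as_latin1 = raw_name.decode("iso-8859-1")

print(as_utf8)    # café.txt
print(as_latin1)  # cafÃ©.txt
```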
That isn't true for remote servers. Suppose you're connecting to a Chinese FTP server that isn't UTF-8. You'll see garbage anyway, unless you know that you should switch to a Chinese encoding -- and which one, since there are five in use. So, no, the default will stay Latin 1 for remote protocols that don't tell us what their encoding is. The user is given the option to change encodings. kio_ftp doesn't implement RFC 2640 yet. I think there's already a wish for it, but if there isn't, feel free to open one. It's on my to-do list anyway.
> That isn't true for remote servers. Suppose you're connecting to a Chinese > FTP server that isn't UTF-8. You'll see garbage anyways, unless you know > that you should switch to a Chinese encoding -- and which encoding, since > there are 5 in use. In that case, a default value of UTF-8 would be just as "correct" as Latin1. > So, no, the default will stay Latin 1 for remote protocols that don't tell > us what their encoding is. The user is given the option to change > encodings. Latin1 is useful for compatibility with old servers. Agreed. But UTF-8 is an excellent choice in unknown situations; it's used nearly everywhere. UTF-8 is also necessary for compatibility with *current* distributions. In my eyes, compatibility with modern distributions is more important than compatibility with old ones. Would you mind if I reopened this wish to give other users time to comment or vote? It really bothers me that if I set up a small home network with a modern distribution, there will be encoding problems. This is definitely not user-friendly. Or another idea: why not default to the local system's encoding? This would have an even better chance of doing the right thing in your example of a Chinese FTP server. > kio_ftp doesn't implement RFC 2640 yet. I think there's already a wish for > it, but if there isn't feel free to open one. It's in my to-do list > anyways. I didn't find one, so I filed wish #125355.
Not every byte sequence is valid UTF-8. So if you decode data that isn't UTF-8 as UTF-8, you end up with garbage that may not produce correct results. For this reason, the default encoding must not be a multibyte or stateful encoding. The local system encoding is used for the local system only. Besides, even if we changed this in KDE, you'd STILL have the same problem with non-KDE programs and even with KDE programs that don't use kio_ftp (there's one FTP client that implements FTP on its own). PS: we don't have to reopen the report to let people vote or post comments.
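A short Python sketch of why an 8-bit default is the safer fallback: ISO-8859-1 maps each of the 256 byte values to exactly one character, so any filename decodes and re-encodes to its original bytes, while a UTF-8 decoder rejects some byte sequences and, if forced to substitute replacement characters, loses the original bytes. The sample filename is illustrative.

```python
every_byte = bytes(range(256))

# ISO-8859-1 assigns one character to each of the 256 byte values,
# so decoding never fails and re-encoding restores the original bytes.
latin1_roundtrip_ok = every_byte.decode("iso-8859-1").encode("iso-8859-1") == every_byte


def utf8_decodes(data: bytes) -> bool:
    """Return True if data is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" as stored on a legacy ISO-8859-1 server: the lone 0xE9 is not valid UTF-8.
legacy_name = "café".encode("iso-8859-1")  # b'caf\xe9'

# Forcing a UTF-8 decode with replacement characters loses the original byte,
# so the name can no longer be re-encoded to what the server expects.
lossy = legacy_name.decode("utf-8", errors="replace")  # 'caf\ufffd'
lossy_roundtrip_ok = lossy.encode("utf-8") == legacy_name

print(latin1_roundtrip_ok)        # True
print(utf8_decodes(every_byte))   # False
print(utf8_decodes(legacy_name))  # False
print(lossy_roundtrip_ok)         # False
```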
> UTF-8 cannot decode everything. So if you end up with UTF-8-decoded > garbage, it may not produce correct results. > > For this reason, the default encoding must not be multibyte or stateful > encoding. Yes, there would be the question of how to cope with invalid UTF-8. Having an 8-bit encoding as the default is indeed less complex. > PS: we don't have to reopen to let people vote or post comments. OK, I'll leave it at that. I'm looking forward to RFC 2640 and SSH encoding detection being implemented. Thanks anyway for your patience.
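One possible way to cope with invalid UTF-8, sketched in Python: try UTF-8 first and fall back to an 8-bit charset per filename. This is a hypothetical approach, not what KIO does, and the helper `decode_remote_name` is made up for illustration. It returns the charset it used so that the name can be re-encoded identically when it is sent back to the server.

```python
from typing import Tuple


def decode_remote_name(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Hypothetical helper: decode as UTF-8 if valid, else use an 8-bit fallback.

    Returns (name, charset) so the caller can re-encode the name with the
    same charset when sending it back to the server.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # With the default ISO-8859-1 fallback this step cannot fail.
        return raw.decode(fallback), fallback


print(decode_remote_name("café".encode("utf-8")))  # ('café', 'utf-8')
print(decode_remote_name(b"caf\xe9"))              # ('café', 'iso-8859-1')
```

The catch: an ISO-8859-1 name whose bytes happen to form valid UTF-8 (for example the two characters "Ã©", bytes C3 A9) would be shown as "é", so this is a heuristic rather than real detection.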
Since I see you're asking for encoding detection and referring to this bug report, I am hijacking it. Currently, even if the ioslave detects the encoding, the Tools | Select Remote Charset menu in Konqueror won't change.
> Since I see you're asking for encoding detection and referring to this bug > report, I am hijacking it. > > Currently, even if the ioslave detects the encoding, the Tools | Select > Remote Charset menu in Konqueror won't change. Anything that hides the encoding mess is welcome. Thanks.
So is this bug now about remote UTF-8 detection by kio_fish by analysing the remote LANG environment variable? If not, has anyone filed an appropriate wish?
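A sketch of what such detection could look like once the remote environment is known, in Python. How fish would obtain the variables (for example by running `locale` or `echo $LANG` in the remote shell) is an assumption, and `charset_from_locale` is a hypothetical helper. It follows the POSIX precedence for the character-type category (LC_ALL, then LC_CTYPE, then LANG) and returns None when no codeset is given, so the caller can keep the Latin 1 default.

```python
import codecs
from typing import Mapping, Optional


def charset_from_locale(env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: derive a charset name from remote locale variables."""
    # POSIX precedence for LC_CTYPE: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            break
    else:
        return None  # nothing set: keep the protocol's default

    # Locale names look like language_TERRITORY.codeset@modifier, e.g. "en_US.UTF-8".
    value = value.split("@", 1)[0]
    if "." not in value:
        return None  # e.g. "C" or "POSIX": no explicit codeset
    codeset = value.split(".", 1)[1]
    try:
        # Normalize spellings such as "UTF-8" and "utf8" to one codec name.
        return codecs.lookup(codeset).name
    except LookupError:
        return None  # unknown codeset: keep the protocol's default


print(charset_from_locale({"LANG": "en_US.UTF-8"}))  # utf-8
print(charset_from_locale({"LANG": "C"}))            # None
```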
Gilles Schintgen has opened bug 125351 for remote UTF-8 detection by kio_fish.