Bug 105186 - utf-8 filenames are decoded in latin1
Summary: utf-8 filenames are decoded in latin1
Status: RESOLVED WORKSFORME
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: fish
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Jörg Walter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-06 13:54 UTC by Noam Raphael
Modified: 2007-11-10 18:34 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Description Noam Raphael 2005-05-06 13:54:42 UTC
Version:           unknown (using KDE 3.3.2,  (3.1))
Compiler:          gcc version 3.3.5 (Debian 1:3.3.5-8)
OS:                Linux (i686) release 2.6.7-1-386

This is probably similar to bug 59431, but since that one was closed long ago, and I have no problems with copying, I am opening a new one.

I'm using fish to move files between my two computers. On both of them there are files with Hebrew letters in their names, encoded in utf8. KDE works fine with these and displays the filenames perfectly. However, if I view a directory on the other computer using fish, the filenames are gibberish. I checked: after I copied the files to the local system, their names were the original names encoded in utf8, then decoded as latin1, then encoded again as utf8. I guess the last step is due to my system's locale.
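The round trip described above can be reproduced in a few lines. This is a minimal sketch, assuming Python's "latin-1" codec (a byte-for-codepoint mapping) behaves like the Latin-1 decoding applied to the remote listing; the filename and the helper repair_double_encoded are hypothetical, chosen for illustration only.

```python
# Reproduce the utf8 -> latin1 -> utf8 round trip described in the report.
original = "שלום.txt"  # hypothetical Hebrew filename on the remote host

remote_bytes = original.encode("utf-8")      # bytes stored on the remote disk
misdecoded = remote_bytes.decode("latin-1")  # what a Latin-1 listing shows
local_bytes = misdecoded.encode("utf-8")     # what lands on the local UTF-8 disk


def repair_double_encoded(name: str) -> str:
    """Undo exactly one UTF-8 -> Latin-1 -> UTF-8 round trip.

    Only valid for names known to be mangled this way: a correctly
    decoded Hebrew name raises UnicodeEncodeError, because Hebrew
    letters do not exist in Latin-1.
    """
    return name.encode("latin-1").decode("utf-8")


print(repr(misdecoded))  # gibberish, each Hebrew letter became two characters
print(repair_double_encoded(local_bytes.decode("utf-8")) == original)  # True
```

Because Latin-1 assigns a character to every byte value, the wrong decode never fails loudly; it silently doubles each non-ASCII letter, which is why the copied names are gibberish rather than an error.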

Thank you,
Noam Raphael
Comment 1 Edward Trager 2005-08-23 17:25:47 UTC
I discovered this bug independently from Noam recently while testing SuSE 10 Beta 2.  As SuSE, Redhat, Ubuntu and other distributions are now defaulting to UTF-8 locales, the FISH protocol should also handle UTF-8 correctly.  As SSH and SFTP seem to have no problem handling UTF-8 file names, I would expect that fixing FISH to handle UTF-8 correctly should require a minimal amount of work.  I've put 20 of my votes on this bug.  I hope there will be a resolution to this soon.  Thanks!

- Ed Trager
  Bioinformatics
  Kellogg Eye Center
Comment 2 Thiago Macieira 2005-08-24 05:25:23 UTC
Tools, Select Remote Charset, utf8
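What the "Select Remote Charset" setting changes can be sketched as choosing the codec used to turn raw filename bytes from the remote side into display names. This is a hypothetical model, not KIO code: decode_listing and its remote_charset parameter are illustrative names, and the "latin-1" default mirrors the default discussed in this thread.

```python
from typing import Iterable, List


def decode_listing(raw_names: Iterable[bytes], remote_charset: str = "latin-1") -> List[str]:
    """Decode raw filename bytes from a remote listing (hypothetical helper).

    remote_charset stands in for the "Select Remote Charset" menu choice.
    errors="replace" keeps the listing usable if a name is not valid in
    the chosen charset, instead of failing for the whole directory.
    """
    return [raw.decode(remote_charset, errors="replace") for raw in raw_names]


raw = ["שלום.txt".encode("utf-8"), b"readme.txt"]
print(decode_listing(raw))                          # Hebrew name comes out garbled
print(decode_listing(raw, remote_charset="utf-8"))  # original names restored
```

Pure ASCII names such as readme.txt look the same under either choice, which is why the problem only shows up with non-ASCII filenames.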
Comment 3 Edward Trager 2005-08-24 13:51:31 UTC
Hi, Thiago,

Which is the default -- UTF-8 or Latin-1 ?  If Latin-1 is the default,
that needs to be changed.  The right answer in the modern Linux world
is to have UTF-8 be the default, and then people can click "Tools -->
Select Remote Charset" if they need to select a non-UTF-8 charset. 
Since SuSE, Redhat, Ubuntu, et alia are setting the default locale to
a UTF-8 locale, it makes no sense to have the fish ioslave still stuck
in Latin-1 mode by default.

Comment 4 Thiago Macieira 2005-08-24 13:57:35 UTC
Latin 1 is the default and it won't be changed. That is the case for ALL kioslaves, not only fish.
Comment 5 Edward Trager 2005-08-24 16:23:59 UTC
Maybe Latin 1 was the right answer a few years ago, but now that Linux
is a global operating system and vendors are defaulting to UTF-8,
clearly KDE needs to convert over to UTF-8.
Comment 6 Gilles Schintgen 2006-04-09 11:58:49 UTC
I filed a wish concerning this outdated default value. See wish #125212.
This issue also affects sftp:// and ftps://. It's quite annoying.
Comment 7 Thiago Macieira 2006-04-10 23:54:59 UTC
Comment #4 is still valid: Latin 1 is the default and will continue to be so for protocols that don't support Unicode.
Comment 8 Mark 2006-04-27 17:47:18 UTC
Hmm... it looks like we really need a (non-KDE-specific) standard for determining host- or user-specific filename and content encodings?
Comment 9 Thiago Macieira 2006-04-29 17:06:11 UTC
That's easy: convince everyone to use UTF-8 and we're done.
Comment 10 Aldoo 2007-11-10 18:34:35 UTC
And how does that fix this bug?

At the very least, users should have a way to choose their preferred encoding.
If I understand correctly, the scp protocol (or whatever is behind fish) does not specify any encoding. Then one of these two propositions must be true:
- the standard assumes that file names are in Latin 1; if so, the ssh server should do some conversion when it runs on a UTF-8 host (which is obviously not the case, and would not be a good idea anyway, since data loss could occur)
- the standard does not say anything, in which case it is wrong for the kioslave to assume that file names are in Latin 1. If so, KDE should provide some way to find out the right encoding (automatic detection, a UI setting, or something like checking the LANG variable of the remote host in a remote ssh shell).
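The detection ideas in the second point could be combined roughly as follows. This is a minimal sketch under stated assumptions: the remote LANG value is taken to have been obtained separately (for example by running echo "$LANG" over the same ssh connection), and charset_from_locale and guess_filename_encoding are hypothetical helpers, not part of KIO. The UTF-8 check is a heuristic: some Latin-1 byte sequences happen to be valid UTF-8, and a login shell's LANG need not match how the files were actually named.

```python
import codecs
from typing import Iterable, Optional


def charset_from_locale(lang: str) -> Optional[str]:
    """Return the codeset of a POSIX locale name, e.g. 'he_IL.UTF-8@euro' -> 'UTF-8'."""
    if "." not in lang:
        return None  # "C", "POSIX" or "en_US" name no explicit codeset
    codeset = lang.split(".", 1)[1].split("@", 1)[0]
    return codeset or None


def guess_filename_encoding(raw_names: Iterable[bytes], remote_lang: Optional[str] = None) -> str:
    """Guess the charset of remote filenames (hypothetical helper)."""
    # 1. Prefer the codeset named by the remote LANG value, if Python knows it.
    if remote_lang:
        codeset = charset_from_locale(remote_lang)
        if codeset:
            try:
                return codecs.lookup(codeset).name
            except LookupError:
                pass  # unknown codeset name: fall through to the heuristic
    # 2. Heuristic: if every name decodes as strict UTF-8, assume UTF-8.
    try:
        for raw in raw_names:
            raw.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Latin-1 maps every byte value, so it works as a last resort.
        return codecs.lookup("latin-1").name
    return codecs.lookup("utf-8").name


names = ["שלום.txt".encode("utf-8"), b"readme.txt"]
print(guess_filename_encoding(names))                             # utf-8
print(guess_filename_encoding([b"caf\xe9.txt"]))                  # Latin-1 fallback
print(guess_filename_encoding([b"caf\xe9.txt"], "he_IL.UTF-8"))   # remote LANG wins
```

Returning codecs.lookup(...).name normalizes spellings such as "UTF-8", "utf8" and "utf-8" to one canonical name, so the result can be passed straight to bytes.decode.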

Either way, something is broken and has to be fixed.
As it is, synchronising two remote machines in everyday use is really painful, since I use UTF-8 everywhere.