Bug 204768

Summary: Dolphin-Konqueror plugin to solve encoding problems marked as WONTFIX
Product: [Applications] dolphin Reporter: Ignacio Serantes <kde>
Component: general    Assignee: Dolphin Bug Assignee <dolphin-bugs-null>
Status: RESOLVED FIXED    
Severity: wishlist CC: alex.danila.web, bugs, bugseforuns, cannewilson, cfeck, comet.friend, danielstefanmader+kde, diegocg, hellnest.fuah, hpj, jtamate, kdebug, L.Bonnaud, martinstolpe, max.schettler, meven, mhlavink, mmtsales, mss, ptselios, pzlocki, Regnaron, roland.leissa, ruben.caro.estevez, shlomif, toddrme2178, victorjss, zayed.alsaidi
Priority: HI    
Version: 16.12.2   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Unspecified   
Latest Commit:    Version Fixed In:
Sentry Crash Report:

Description Ignacio Serantes 2009-08-22 15:18:13 UTC
Version:           Dolphin/Konqueror plugin to solve encoding problems (using Devel)
Installed from:    Compiled sources

Because Thiago Macieira is unable to fix https://bugs.kde.org/show_bug.cgi?id=165044, a plugin system should be implemented to fix files with encoding bugs.

Currently these files are common in non-English languages. Linux and other desktops, such as KDE 3, can handle them, but with KDE 4 you need to fix them with other tools because KDE 4 can't handle them.
Comment 1 Eduardo Habkost 2009-08-22 15:48:51 UTC
*** This bug has been confirmed by popular vote. ***
Comment 2 Alex Dănilă 2009-09-04 19:57:39 UTC
Hi, I believe an achievable middle ground is better:
1. Konqueror/Dolphin should be able to rename the file. No other program needs this capability.
2. A small program for automatic renaming should be provided. It should help users choose the right encoding and then rename recursively. For example, the user scrolls through a drop-down list of original encodings and sees the resulting new names, so they can decide which one is correct.
3. Any program unable to cope with bad names should suggest using the tool, although I don't know if this is easily achieved.
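As a rough illustration of point 2, the preview step could decode the raw bytes of a name under several candidate encodings and let the user pick the one that reads correctly. This is a sketch under stated assumptions: it assumes a POSIX system with iconv(3), and decodeAs, previewNames and the candidate list are hypothetical names chosen for illustration, not part of any KDE tool.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: decode a raw filename from `fromEncoding` to UTF-8 with iconv.
// Returns std::nullopt if the encoding is unsupported or the bytes are invalid in it.
std::optional<std::string> decodeAs(const std::string& raw, const char* fromEncoding) {
    assert(fromEncoding != nullptr);
    iconv_t cd = iconv_open("UTF-8", fromEncoding);
    if (cd == (iconv_t)-1) return std::nullopt;  // encoding not known to this libc

    std::string in = raw;                       // iconv() wants a mutable char*
    std::string out(raw.size() * 4 + 4, '\0');  // generous: <= 4 UTF-8 bytes per input byte
    char* inPtr = in.data();
    char* outPtr = out.data();
    std::size_t inLeft = in.size();
    std::size_t outLeft = out.size();

    const std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) return std::nullopt;  // e.g. EILSEQ: invalid byte
    out.resize(out.size() - outLeft);  // keep only the bytes iconv actually wrote
    return out;
}

// Hypothetical preview: one (encoding, decoded name) pair per candidate that decodes cleanly.
// The candidate list is an assumption; a real tool would offer every encoding iconv knows.
std::vector<std::pair<std::string, std::string>> previewNames(const std::string& raw) {
    static const char* const candidates[] = {"ISO-8859-1", "CP1252", "CP1251", "KOI8-R", "SHIFT_JIS"};
    std::vector<std::pair<std::string, std::string>> result;
    for (const char* enc : candidates)
        if (auto name = decodeAs(raw, enc)) result.emplace_back(enc, *name);
    return result;
}
```

Single-byte encodings such as ISO-8859-1 accept every byte, so they always produce some candidate; that is exactly why the final choice has to be left to the user. The set of encoding names iconv_open() accepts depends on the C library.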
Comment 3 Roland Leißa 2009-09-08 12:56:11 UTC
*** Bug 194361 has been marked as a duplicate of this bug. ***
Comment 4 dcg 2009-09-19 15:55:59 UTC
It's certainly possible to handle broken encodings well if paths are treated as what they really are (streams of bytes separated by the '/' character), just like many command-line tools do. That is why cp -av will never miss a file and will always print its name (printing the byte string if the encoding is broken), no matter how broken the encoding is. Dolphin, on the other hand, can't copy some of my files because it can't see them. Now, if Qt can't do anything else, there's not a lot we can do. Files with broken encodings exist (I just got a few GB of them from an XP box, and Amarok can't see most of them). But more importantly, files with broken encodings will _continue_ to exist forever; it's selfish to think otherwise. So KDE is condemned to have unhappy people hitting this bug. Sigh.
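The byte-oriented approach described in this comment can be sketched in a few lines: treat the name as raw bytes, keep valid UTF-8 as is, and print everything else as an octal escape (similar in spirit to what ls -b does), so that no file ever disappears from view. utf8SeqLen and escapeName are hypothetical helpers, not KDE or Qt APIs; the UTF-8 ranges follow the Unicode well-formedness table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// True if s[k] exists and lies in [lo, hi].
static bool byteIn(const std::string& s, std::size_t k, unsigned lo, unsigned hi) {
    if (k >= s.size()) return false;
    const unsigned b = static_cast<unsigned char>(s[k]);
    return b >= lo && b <= hi;
}

// Length of a well-formed UTF-8 sequence starting at s[i], or 0 if it is not valid UTF-8.
std::size_t utf8SeqLen(const std::string& s, std::size_t i) {
    const unsigned c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) return 1;
    std::size_t len = 0;
    unsigned lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; if (c == 0xE0) lo = 0xA0; if (c == 0xED) hi = 0x9F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; if (c == 0xF0) lo = 0x90; if (c == 0xF4) hi = 0x8F; }
    else return 0;  // 0x80..0xC1 and 0xF5..0xFF never start a valid sequence
    if (!byteIn(s, i + 1, lo, hi)) return 0;
    for (std::size_t k = 2; k < len; ++k)
        if (!byteIn(s, i + k, 0x80, 0xBF)) return 0;
    return len;
}

// Hypothetical display helper: printable ASCII and valid multi-byte UTF-8 are kept;
// control characters, backslashes and invalid bytes become \ooo escapes.
std::string escapeName(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size();) {
        const unsigned char c = static_cast<unsigned char>(raw[i]);
        const std::size_t n = utf8SeqLen(raw, i);
        if (n > 1 || (n == 1 && c >= 0x20 && c != 0x7F && c != '\\')) {
            out.append(raw, i, n);  // copy the character unchanged
            i += n;
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;  // e.g. byte 0xFF -> "\377"
            ++i;
        }
    }
    return out;
}
```

Because backslashes are escaped too, the displayed string maps back to exactly one byte sequence, so the name stays unambiguous even when it cannot be decoded.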

IMO the worst thing about the current behaviour is the wrong messages: instead of telling you that the encoding is wrong, Dolphin tells you it can't find the file, even though you can see the file in the window (with a name containing weird characters, but hey, it's clearly _there_, I'm seeing the icon). It confuses users. Maybe it would be possible to fall back to the native UNIX APIs when such an error occurs (as Thiago suggests here http://lists.kde.org/?l=kde-core-devel&m=122025063320264&w=2) to implement a simplistic "renamer". It is not portable, but ...
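A minimal sketch of such a UNIX-only renamer, assuming the legacy names are ISO-8859-1 (Latin-1), where every byte maps to the code point of the same value. It hands the original bytes straight to rename(2) via std::rename, so no decoding step can lose them. latin1ToUtf8 and renameLatin1ToUtf8 are hypothetical names; a real tool would let the user choose the source encoding.

```cpp
#include <cassert>
#include <cstdio>    // std::rename
#include <string>
#include <unistd.h>  // access(), F_OK

// Assumes the raw name is ISO-8859-1: byte 0xXX is code point U+00XX,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical "simplistic renamer": works purely on bytes, never on decoded strings.
// Returns the name the file has afterwards, or an empty string if nothing was renamed.
std::string renameLatin1ToUtf8(const std::string& dir, const std::string& rawName) {
    const std::string newName = latin1ToUtf8(rawName);
    if (newName == rawName) return rawName;  // pure ASCII: already fine
    const std::string oldPath = dir + "/" + rawName;
    const std::string newPath = dir + "/" + newName;
    if (::access(newPath.c_str(), F_OK) == 0) return {};  // never clobber an existing file
    if (std::rename(oldPath.c_str(), newPath.c_str()) != 0) return {};
    return newName;
}
```

The access() check followed by rename() is not atomic; on Linux, renameat2() with RENAME_NOREPLACE would close that window.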
Comment 5 dcg 2009-09-19 16:04:48 UTC
BTW, there's a wonderful GUI tool in the "utf8-migration-tool" package of Debian/Ubuntu, which converts all the misencoded files in your home directory to UTF-8. A good help if you find yourself in a case like mine, with GB of files that Dolphin can't handle.
Comment 6 jorortega 2009-12-11 22:49:52 UTC
This tool is good for those who have data in non-UTF-8 format on read/write media.

This tool is pretty much useless for read-only media (CD, DVD, tape, etc.). And having to use a command-line tool pretty much defeats the entire purpose of a GUI.

What is really needed is that when KDE finds a file with a legacy encoding, it simply shows the associated garbage, and in a drop-down menu in the header of Konqueror/Dolphin (or whatever KDE is using at the moment) the user chooses a locale, since the user knows where they live and/or the original language of the file. It can be Japanese, Chinese, German, etc. The point is that the user should _not_ have to care about internal locales. We as users just need the damned thing to open (ironically, one of the open source "advantages" was to prevent "proprietary format lock-in", i.e. to prevent your data from being tied to a specific piece of software and becoming impossible to open in the future... how laughable this is, considering the actual results). And the system should _NOT_ produce legacy encodings. If you open a read-only file, modify it, and save it somewhere else, KDE should transparently save it under the UTF-8 equivalent (once the origin of the legacy file is known, converting to UTF-8 is trivial; the problem is always determining the origin).

As an ex-KDE user (I moved the entire business to GNOME), I just recommend switching to GNOME. If you want a good distro with a nice and well-known default interface, I recommend Linux Mint to everyone, and you can forget about all these kinds of problems. In the near future, LXDE may be useful too.
Comment 7 Rubén 2011-02-06 16:42:02 UTC
I use convmv (http://freshmeat.net/projects/convmv/) to repair encoding of filenames, while I wait this to be fixed.
Comment 8 Daniel Mader 2011-03-19 14:55:09 UTC
I just happened to fix hundreds of files on my girlfriends laptop which she received from a colleague. She is a teacher an Windows is still popular there. It is outrageous and incredibly ignorant that this bug is still present after YEARS! It's a superb reason not to adopt the KDE desktop for any business desktop, it's simply unacceptable to have this mess in 2011.
Comment 9 Peter Tselios 2011-06-07 10:57:01 UTC
(In reply to comment #6)

I don't know about GNOME, as I am not a GNOME user, but I am not so sure that a filesystem or an application can determine the encoding of a filename. If it can, then yes, files should be read in that encoding. But I am afraid it is not a KDE issue: if you open a console, you will see the same problem. I believe it's a filesystem-related issue, but KDE could hide it.
Comment 10 zloty 2011-08-13 12:31:15 UTC
I have been a GNOME user for years, and when GNOME 3 arrived I switched my desktop to KDE 4. Now I cannot use some of my files because the encoding of their names is broken: Dolphin doesn't see them and cannot handle them. When I use GTK or console apps, all files and directories with broken names are visible and I can work with them (renaming, deleting, etc.). To change the names I use PCManFM, but that's a prosthesis, not a solution. I think the problem is in KDE 4, not elsewhere.
Comment 11 Christoph Feck 2012-04-29 12:44:54 UTC
*** Bug 299026 has been marked as a duplicate of this bug. ***
Comment 12 Christoph Feck 2012-05-18 11:50:19 UTC
*** Bug 300237 has been marked as a duplicate of this bug. ***
Comment 13 Christoph Feck 2012-07-02 15:32:43 UTC
*** Bug 260774 has been marked as a duplicate of this bug. ***
Comment 14 Jeroen van Meeuwen (Kolab Systems) 2012-08-24 16:19:38 UTC
Resetting assignee to default as per bug #305719
Comment 15 Christoph Feck 2015-11-12 22:41:47 UTC
*** Bug 355048 has been marked as a duplicate of this bug. ***
Comment 16 Nate Graham 2018-04-17 15:23:43 UTC
*** Bug 173097 has been marked as a duplicate of this bug. ***
Comment 17 Patrick Silva 2018-04-20 11:48:48 UTC
*** Bug 359991 has been marked as a duplicate of this bug. ***
Comment 18 Christoph Feck 2018-08-04 09:39:41 UTC
*** Bug 397119 has been marked as a duplicate of this bug. ***
Comment 19 Christoph Feck 2019-01-10 00:32:33 UTC
Is this still needed? I added code to https://bugreports.qt.io/browse/QTBUG-59402 that could be integrated into Dolphin.
Comment 20 Hans-Peter Jansen 2019-01-10 11:12:05 UTC
Given that this issue costs "users", I strongly vote for fixing it sooner rather than later...
Comment 21 Christoph Feck 2019-05-21 11:10:41 UTC
Git commit 6738a8b2f71c527f30a624b0b560f79d992715d3 by Christoph Feck.
Committed on 21/05/2019 at 11:05.
Pushed by cfeck into branch 'master'.

[kioslave/file] Add a codec for legacy filenames

UNIX filenames can contain any bytes (except \0 and /).
Qt's QFile::decodeName() calls QString::fromLocal8Bit(), assuming that all
filesystems use the system's locale encoding. For filenames that have been
created with a different encoding, and have not yet been converted (e.g. using
convmv), this creates non-reversible U+FFFD (REPLACEMENT CHARACTER)
code points in the filenames.

For example, some old-style archives might not contain any information about
the encoding of the filenames, and even today archivers extract them without
trying to convert to the locale's encoding.

While full support for those filenames is not needed, Dolphin should at least
be able to delete, rename, and move those files. Since all actual (local) file
handling is done inside the file kioslave, patching Dolphin will not help.

This code is a near verbatim copy of the code we had in kdelibs, written by
Szókovács Róbert. Only minor adaptions to Qt5 were done. It decodes invalid
bytes as U+10FExx from Plane 16 (Supplementary Private Use Area-B) to be able
to encode them later.
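The difference between the lossy decode described above and the legacy codec can be sketched without Qt. This is not the kdelibs/KIO code (which works on QString); it only assumes the scheme stated in the commit message: each byte that is not valid in the locale encoding becomes U+10FE00 plus the byte value, so encoding can restore it. A UTF-8 locale is assumed, std::u32string stands in for QString, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes one well-formed UTF-8 sequence at s[i] into cp; returns its length, or 0 if invalid.
std::size_t decodeUtf8At(const std::string& s, std::size_t i, char32_t& cp) {
    auto byteAt = [&](std::size_t k) -> unsigned {
        // 0x100 marks "past the end" and fails every range check below
        return i + k < s.size() ? static_cast<unsigned char>(s[i + k]) : 0x100u;
    };
    const unsigned c = byteAt(0);
    if (c < 0x80) { cp = c; return 1; }
    std::size_t len = 0;
    unsigned lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; if (c == 0xE0) lo = 0xA0; if (c == 0xED) hi = 0x9F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; if (c == 0xF0) lo = 0x90; if (c == 0xF4) hi = 0x8F; }
    else return 0;
    if (byteAt(1) < lo || byteAt(1) > hi) return 0;
    for (std::size_t k = 2; k < len; ++k)
        if (byteAt(k) < 0x80 || byteAt(k) > 0xBF) return 0;
    cp = c & (len == 2 ? 0x1Fu : len == 3 ? 0x0Fu : 0x07u);
    for (std::size_t k = 1; k < len; ++k) cp = (cp << 6) | (byteAt(k) & 0x3Fu);
    return len;
}

void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) { out += static_cast<char>(cp); return; }
    if (cp < 0x800) { out += static_cast<char>(0xC0 | (cp >> 6)); }
    else if (cp < 0x10000) { out += static_cast<char>(0xE0 | (cp >> 12));
                             out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); }
    else { out += static_cast<char>(0xF0 | (cp >> 18));
           out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
           out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); }
    out += static_cast<char>(0x80 | (cp & 0x3F));  // final continuation byte
}

// Shared loop: valid UTF-8 becomes code points; onInvalid chooses the replacement for a bad byte.
template <typename OnInvalid>
std::u32string decodeWith(const std::string& bytes, OnInvalid onInvalid) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        char32_t cp = 0;
        if (std::size_t n = decodeUtf8At(bytes, i, cp)) { out += cp; i += n; }
        else { out += onInvalid(static_cast<unsigned char>(bytes[i])); ++i; }
    }
    return out;
}

// Approximates a plain decode: every bad byte becomes U+FFFD, so the original is lost.
std::u32string decodeLossy(const std::string& bytes) {
    return decodeWith(bytes, [](unsigned char) { return char32_t{0xFFFD}; });
}

// Legacy-codec idea: bad byte 0xXX becomes U+10FEXX (Supplementary Private Use Area-B).
std::u32string decodeLegacy(const std::string& bytes) {
    return decodeWith(bytes, [](unsigned char b) { return static_cast<char32_t>(0x10FE00u | b); });
}

// Inverse of decodeLegacy: U+10FE00..U+10FEFF turn back into the original raw byte.
std::string encodeLegacy(const std::u32string& name) {
    std::string out;
    for (char32_t cp : name) {
        if (cp >= 0x10FE00 && cp <= 0x10FEFF) out += static_cast<char>(cp & 0xFF);
        else appendUtf8(out, cp);
    }
    return out;
}
```

With decodeLossy, the names "a\xff" and "a\xfe" collapse to the same string, so there is no way to encode back to the file that actually exists on disk; decodeLegacy keeps them distinct and encodeLegacy restores the exact bytes. This sketch does not handle a name that already contains a genuine U+10FExx character encoded as valid UTF-8.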

Dolphin could detect filenames with those characters, and either mark them
(by color or overlay icon), or even automatically offer to rename them.
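Following that suggestion, detection only needs to look for code points in the U+10FE00..U+10FEFF range. A hypothetical sketch, again using std::u32string in place of QString (in QString's UTF-16 these code points would appear as surrogate pairs); hasLegacyBytes is an illustrative name, not an existing Dolphin function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical check on a name decoded by the legacy codec: any code point in
// U+10FE00..U+10FEFF stands for a raw byte that was invalid in the locale encoding,
// so the file is a candidate for a colour/overlay mark or a rename offer.
bool hasLegacyBytes(const std::u32string& name) {
    return std::any_of(name.begin(), name.end(),
                       [](char32_t cp) { return cp >= 0x10FE00 && cp <= 0x10FEFF; });
}
```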
Related: bug 165044

TEST PLAN

touch "/tmp/test-"$'\377'".txt"
dolphin /tmp

Copying and deleting a test file worked with this code, failed without.

Reviewers: dfaure, Frameworks, Dolphin

Reviewed by: dfaure

Differential Revision: https://phabricator.kde.org/D18161

M  +1    -1    src/ioslaves/file/CMakeLists.txt
M  +8    -0    src/ioslaves/file/file.cpp
A  +174  -0    src/ioslaves/file/legacycodec.cpp     [License: LGPL (v2+)]
A  +66   -0    src/ioslaves/file/legacycodec.h     [License: LGPL (v2+)]

https://commits.kde.org/kio/6738a8b2f71c527f30a624b0b560f79d992715d3