Bug 161673 - Cyrillic file names on usb storage devices are displayed as '?????????'
Summary: Cyrillic file names on usb storage devices are displayed as '?????????'
Status: RESOLVED FIXED
Alias: None
Product: kdelibs
Classification: Unclassified
Component: general
Version: 4.0
Platform: Ubuntu Packages Linux
Importance: HI normal with 60 votes
Target Milestone: ---
Assignee: kdelibs bugs
URL:
Keywords:
Duplicates: 133444 161588
Depends on:
Blocks:
 
Reported: 2008-05-05 22:16 UTC by Eldar Insafutdinov
Modified: 2009-02-28 17:35 UTC
CC List: 12 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
test file (37 bytes, text/plain)
2008-07-17 10:25 UTC, Kristjan Ugrin
fix mount for BSD (1.03 KB, patch)
2008-10-05 21:41 UTC, Max
fix mount for BSD again (3.75 KB, patch)
2008-10-11 23:00 UTC, Max

Description Eldar Insafutdinov 2008-05-05 22:16:38 UTC
Version:            (using KDE 4.0.3)
Installed from:    Ubuntu Packages
OS:                Linux

If I plug in a USB hard drive and open it for the first time with dolphin-kde4, I have this problem: file and folder names in Cyrillic are displayed like '????????'. If I then close dolphin-kde4 and open the drive with Dolphin from KDE 3, the '???????' remain.
But if I unplug the device, plug it in again and open it for the first time with Dolphin from KDE 3, everything is okay; I can then launch dolphin-kde4 and it also shows Russian file names correctly. So I think the problem occurs when KDE 4 mounts the device, because in KDE 3 I don't have this problem. I am not the only one in the Russian community who is troubled by this issue. It would be very nice of you to solve this problem, because I think it's the only one that keeps me away from KDE 4 :(
Comment 1 Pehota 2008-05-08 08:52:36 UTC
I confirm
Comment 2 Pehota 2008-05-08 08:53:58 UTC
using Ubuntu 8.04+KDE 4.0.73
Comment 3 Nikolay 2008-05-11 11:39:39 UTC
Confirm
Comment 4 earfin 2008-05-11 11:46:41 UTC
confirm 
Comment 5 Eldar Insafutdinov 2008-07-17 09:24:53 UTC
AAAAA, it's almost the release of KDE 4.1. Yesterday I downloaded the 4.0.98 RC1 packages for Ubuntu 8.04, and the bug is still not fixed :(
Comment 6 Kristjan Ugrin 2008-07-17 10:25:20 UTC
Created attachment 26195
test file

Steps to reproduce:
1. download the test file
2. mount an external USB key (vfat) and try to copy the file to this device
3. (it should fail)
4. eject the USB key
5. plug it in again
6. try to copy to it again; it should work the second time
Comment 7 Kristjan Ugrin 2008-07-17 10:29:23 UTC
Because the attachment gets b0rked here, use this link (right-click on download -> save file as):
http://files.myopera.com/kriko/files/Эльдар%20Инсафутдинов
Comment 8 George 2008-07-17 10:45:15 UTC
mount -o codepage=855,iocharset=utf8,uid=linux,gid=users /dev/sdb1 /mnt/ - works

In my kernel config I have:
linux:/home/linux # grep "FAT_DEFAULT" /boot/config
CONFIG_FAT_DEFAULT_CODEPAGE=855
CONFIG_FAT_DEFAULT_IOCHARSET="utf8"

so files with Cyrillic letters on FAT32 devices are displayed without problems.
You can recompile the kernel with these parameters.
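As a rough illustration of where these defaults come from, the Python sketch below reads `CONFIG_FAT_DEFAULT_*` lines from kernel configuration text and turns them into an explicit `-o` string like the one in the mount command above. It assumes the config is available as plain text (the `/boot/config` path used above differs between distributions); the helper names `fat_defaults` and `vfat_mount_options` are hypothetical.

```python
import re

# Matches lines such as CONFIG_FAT_DEFAULT_CODEPAGE=855 or
# CONFIG_FAT_DEFAULT_IOCHARSET="utf8" in a kernel .config file.
# Commented-out lines ("# CONFIG_... is not set") do not match.
_FAT_DEFAULT = re.compile(r'^CONFIG_FAT_DEFAULT_(\w+)=(.*)$')


def fat_defaults(config_text):
    """Return the FAT default options found in kernel config text (hypothetical helper)."""
    defaults = {}
    for line in config_text.splitlines():
        match = _FAT_DEFAULT.match(line.strip())
        if match:
            key, value = match.groups()
            # String options are quoted in .config; numeric ones are not.
            defaults[key.lower()] = value.strip('"')
    return defaults


def vfat_mount_options(defaults, uid, gid):
    """Build an explicit -o string equivalent to the kernel defaults."""
    return "codepage={},iocharset={},uid={},gid={}".format(
        defaults["codepage"], defaults["iocharset"], uid, gid)


if __name__ == "__main__":
    sample = 'CONFIG_FAT_DEFAULT_CODEPAGE=855\nCONFIG_FAT_DEFAULT_IOCHARSET="utf8"\n'
    print(vfat_mount_options(fat_defaults(sample), "linux", "users"))
```

Passing the options explicitly, as in the mount command above, has the same effect without recompiling the kernel.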
Comment 9 Eldar Insafutdinov 2008-07-30 09:50:51 UTC
Why recompile the kernel if both KDE 3.5 and GNOME mount correctly on the same kernel?
Comment 10 Dario Andres 2008-08-26 14:17:38 UTC
This seems to be related to bug 165400 / bug 165044
Comment 11 Nick Shaforostoff 2008-09-07 16:01:29 UTC
fixed in kde 4.1 and 4.x

(it seems that automatic bug closing hasn't been restored yet)
Comment 12 Kevin Kofler 2008-09-08 17:55:59 UTC
There are some things I don't understand in that patch:
* Why is this using 12xx codepage numbers? 12xx codepages are Window$ codepages, but FAT uses DO$ codepages (which have 8xx numbers or 437) for short file names and UTF-16 for long file names.
* Why is this assigning Western European languages to CP 1255? That's the Hebrew codepage. The correct Window$ codepage for Western Europe is CP 1252. Typo? (And shouldn't this use 850 or 437 instead?)
* Why is iocharset=utf8 getting removed for English (and other unlisted languages)? Even if the default codepage= setting is OK for English, the iocharset should still be UTF-8!
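To make the distinction concrete, here is a small Python sketch using Python's standard codecs (not the kernel's NLS tables, which are only assumed to agree) that encodes one Cyrillic name with a DOS codepage (866), a Windows codepage (1251) and UTF-16LE, the encoding used for VFAT long names. The sample name is arbitrary.

```python
name = "ПРИВЕТ"

dos_short = name.encode("cp866")      # DOS/OEM Cyrillic codepage (8xx family)
windows = name.encode("cp1251")       # Windows Cyrillic codepage (12xx family)
long_name = name.encode("utf-16-le")  # VFAT long names are stored as UTF-16

print("cp866 :", dos_short.hex())
print("cp1251:", windows.hex())
print("utf-16:", long_name.hex())

# Reading DOS-encoded bytes through the Windows table yields a different string,
# which is the kind of mismatch a wrong codepage= causes for short names.
misread = dos_short.decode("cp1251")
print("misread:", misread)
```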
Comment 13 Nick Shaforostoff 2008-09-08 18:21:57 UTC
1. because it works on my system (all Russian blog posts on this topic use cp1251).
2. thank you for noticing this! i've just committed the fix.
3. i did that to not change behaviour for English environments (English users don't complain, so i don't touch what already works: simple safety rule).
Comment 14 Kevin Kofler 2008-09-08 18:39:23 UTC
1. Yet http://bugs.kde.org/show_bug.cgi?id=161673#c8 says 855.
2. OK, but see above.
3. But it's inconsistent: if iocharset=utf8 is needed for some locales, it's needed for all; there can be accents in English file names too. That said, Fedora patches the kernel so UTF-8 is the default anyway; I'm not sure what other distros do.
Comment 15 Kevin Kofler 2008-09-12 22:16:23 UTC
I just tested it with one of my FAT32 partitions (created with Window$ Me), on Fedora 8 with the distribution kernel (2.6.25.14-69.fc8). If I pass codepage=1252 as an option, I get:
Unable to load NLS charset cp1252
FAT: codepage cp1252 not found
so cp1252 isn't even supported by the kernel! So your patch BREAKS automounting of VFAT partitions COMPLETELY at least on Fedora 8. If on the other hand, I use codepage=850 or codepage=437, it works, and both my personal experience and Wikipedia confirm that 8xx or 437 are the correct codepages to use. (437 is the old DO$ codepage for US and Western Europe, 850 the new Western European DO$ codepage (not sure about the US), the other 8xx ones are the other DO$ codepages, e.g. 855 for Cyrillic.)
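A quick way to see which codepages a given kernel can load is to look at its NLS modules. The Python sketch below assumes the common `/lib/modules/<release>/kernel/fs/nls` layout and `nls_cpNNN.ko` naming; tables compiled directly into the kernel will not show up, and the function names are hypothetical.

```python
import os
import re

# Module file names look like nls_cp437.ko, possibly compressed (nls_cp437.ko.xz).
_NLS_MODULE = re.compile(r'^nls_(cp\d+)\.ko(\.\w+)?$')


def codepages_from_names(file_names):
    """Return the set of 'cpNNN' codepages among NLS module file names."""
    found = set()
    for file_name in file_names:
        match = _NLS_MODULE.match(file_name)
        if match:
            found.add(match.group(1))
    return found


def available_codepages(release=None):
    """List codepages shipped as modules for the running kernel (Linux only).

    Assumes the /lib/modules/<release>/kernel/fs/nls layout; NLS tables
    built into the kernel image are not visible this way.
    """
    release = release or os.uname().release
    nls_dir = os.path.join("/lib/modules", release, "kernel", "fs", "nls")
    if not os.path.isdir(nls_dir):
        return set()
    return codepages_from_names(os.listdir(nls_dir))


if __name__ == "__main__":
    cps = available_codepages()
    for wanted in ("cp437", "cp850", "cp866", "cp1252"):
        print(wanted, "available" if wanted in cps else "not found as a module")
```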
Comment 16 Nick Shaforostoff 2008-09-12 22:19:08 UTC
thank you for test-based reasoning. i will commit changes in a moment.
Comment 17 Kevin Kofler 2008-09-12 22:21:26 UTC
We probably want 850 for the Western European languages.
Comment 18 Pavel Zheltobryukhov 2008-09-19 15:42:37 UTC
*** Bug 170636 has been marked as a duplicate of this bug. ***
Comment 19 Max 2008-09-20 10:56:46 UTC
The way this bug is being fixed (I mean the latest changes in trunk) is a no-op for non-Linux users. For example, FreeBSD mount doesn't have iocharset and codepage options (not even uid; you should use -u instead), but there are -L and -D options for mounting FAT volumes [1] and -C for CD, NTFS and UDF [2].
Maybe it would be better to allow users to specify mount options if they are not satisfied with the default options?

[1] http://www.freebsd.org/cgi/man.cgi?query=mount_msdosfs
[2] http://www.freebsd.org/cgi/man.cgi?query=mount_cd9660
http://www.freebsd.org/cgi/man.cgi?query=mount_ntfs
http://www.freebsd.org/cgi/man.cgi?query=mount_udf
Comment 20 Nick Shaforostoff 2008-09-20 12:52:25 UTC
holy moly!

i understand that you BSD guys love to do everything manually, but no, with KDE you will be forced to see machine doing work for you ;)

i'll add a check for these options a bit later today
Comment 21 Danila Sentiabov 2008-09-21 20:41:07 UTC
I'm not a BSD guy, but I also think that there should be a way to specify mount options manually. If you don't want to make GUI cluttered with options, it definitely won't hurt to provide a way to specify options in some config file.

Imagine that you have some FAT removable drives in cp 866 and some in cp 850. How will KDE know which codepage to use in each case?

Another example. I, personally, never use anything except for en_US.UTF-8 in my locale settings. But all my FAT removable media is in 866 and I expect to see it working properly.

But it's more than that. What if I'm not happy with other mount options in their default state? For example, I always use the "shortname=mixed" option for FAT media. With KDE 3.5 I can simply set this option through the GUI or edit mediamanagerrc manually (which is much more painful, considering all those multi-character media UIDs and the absence of a way to modify mount defaults). With KDE 4.x I'll have to recompile my kernel to make up for the "work" that my machine is "doing for me" without my request? :-)
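Per-media settings of the kind described here boil down to a two-level lookup: filesystem defaults first, then remembered overrides for one specific volume. The Python sketch below illustrates that merge; the option values, the volume UUID and the dictionaries themselves are hypothetical and are not KDE's configuration format.

```python
# Hypothetical defaults and per-volume overrides; neither the keys nor
# the UUID below come from KDE, they only illustrate the lookup order.
DEFAULT_OPTIONS = {
    "vfat": {"iocharset": "utf8", "shortname": "lower"},
}

PER_VOLUME_OPTIONS = {
    "1234-ABCD": {"codepage": "866", "shortname": "mixed"},
}


def mount_options(fs_type, volume_uuid):
    """Merge filesystem defaults with any remembered per-volume overrides."""
    options = dict(DEFAULT_OPTIONS.get(fs_type, {}))
    options.update(PER_VOLUME_OPTIONS.get(volume_uuid, {}))
    # Sort keys for a stable, readable -o string.
    return ",".join("{}={}".format(k, v) for k, v in sorted(options.items()))


if __name__ == "__main__":
    print(mount_options("vfat", "1234-ABCD"))  # remembered volume
    print(mount_options("vfat", "FFFF-0000"))  # unknown volume: defaults only
```

The override wins key by key, so a user only has to remember the options that differ for a particular stick.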

I don't mean to produce a flame or make some ignorant complaints here, really. You are doing a great job, and I'm sure that everyone appreciates it. But it's sad that KDE now lacks configurability (at least in some areas), which was always its main advantage over its competitors. The main reason I stuck with KDE instead of Gnome some years ago was Gnome's "simple approach" (I'd say "dumb" or even "idiotic"). Now it seems that "simplification" is plaguing everything, doing more harm than good :-)

Summarizing my thoughts. I think that "doing all the chores" without user intervention is good. But there should be a way to stop the machine manually if your shirt got caught in gear wheels :-)

Sorry for long comment.
Comment 22 Nick Shaforostoff 2008-09-26 11:06:42 UTC
for mounting drives with different codepages you can still use usual mount.
in future I might add autodetection (as in browsers), but this is low-priority for now. you can give us your helping hand.
Comment 23 Danila Sentiabov 2008-09-26 11:36:51 UTC
(In reply to comment #22)
> for mounting drives with different codepages you can still use usual mount.
Oh, sure, I can do it. I can even create a bunch of .sh files on desktop to mount my media. Welcome back to the 90s with KDE 4! :-)

> in future I might add autodetection (as in browsers), but this is low-priority
> for now.
Even in browsers, encoding autodetection doesn't work well. When it comes to file names, where the amount of available text is too small to analyze, it would be much worse.
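The point about short file names can be illustrated directly: the bytes of a six-letter Cyrillic name read back as plausible accented Latin text under the Western DOS codepages, so a naive "does this look like letters?" check accepts every candidate. A Python sketch using the standard codecs (the sample name and the check are illustrative assumptions, not a real detector):

```python
# Bytes of a Cyrillic short name as written under DOS codepage 866.
raw = "ПРИВЕТ".encode("cp866")

# Read the same bytes back under several candidate codepages.
candidates = {cp: raw.decode(cp) for cp in ("cp866", "cp437", "cp850")}

for cp, text in candidates.items():
    # A naive "is it all letters?" check accepts every reading.
    print(cp, text, text.isalpha())
```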

Today I've found another case which requires human intervention and manual mount settings: NTFS volumes. Windows users rarely use the "Safely remove" option, so most of the NTFS volumes that I have had in my hands contained a "dirty" journal file requiring the "force" option to mount.

Is it really that hard, or that awfully wrong to add an option to set mount options manually, maybe in some hidden configuration file?

> you can give us your helping hand.
I'd be glad to help, but I barely know C++ and all I've ever coded were WinAPI/MSVC applications. I fear that I'd be useless as code contributor.
Comment 24 Nick Shaforostoff 2008-09-26 12:55:41 UTC
I think of this ui:
in context menu of mounted media (e.g. on Places panel of Dolphin) add a menu option 'remount with custom options'

as I said, we have other issues to work on; for example, sonnet spellcheck doesn't work for Cyrillic texts
Comment 25 Kevin Kofler 2008-09-26 15:55:27 UTC
Even if you use a spellchecker which uses UTF-8 throughout, like Hunspell? (It's OT in this report though, if you have a bug ID for that bug, we can discuss it there.)
Comment 26 Danila Sentiabov 2008-09-26 16:03:33 UTC
(In reply to comment #24)
> I think of this ui:
> in context menu of mounted media (e.g. on Places panel of Dolphin) add a menu
> option 'remount with custom options'
Combined with a checkbox "remember these options for this media", it would be a godsend for me and many other users! :-)
Comment 27 Grissiom 2008-09-27 03:42:50 UTC
*** This bug has been confirmed by popular vote. ***
Comment 28 Grissiom 2008-09-27 03:45:18 UTC
The problem is still NOT fixed on my box (KDE 4.1.1 Slackware packages). The codepage of the disk should be cp936, and all the non-English characters are garbled.
Comment 29 Nick Shaforostoff 2008-09-27 10:10:47 UTC
of course. use 4.1.2 that is due to be released in a few days.
Comment 30 Grissiom 2008-09-27 10:16:09 UTC
(In reply to comment #29)
> of course. use 4.1.2 that is due to be released in a few days.
> 

Oh... thanks, really. I'm really looking forward to using 4.1.2 ;)
Comment 31 Kevin Kofler 2008-09-27 10:38:31 UTC
I'm not convinced the change from CP 850 to CP 437 (by mitchell, CCed) was a good idea:

<< Use CP 437 instead of 850.  From Wikipedia:

"Code page 850 is a code page that was used in western Europe, under systems such as DOS. It was also sometimes used on English DOS systems although CP437 was generally the default on those. It was largely replaced first with windows-1252 (often mislabeled as ISO-8859-1) and later by UCS-2 and finally UTF-16 (while the NT line was natively unicode from the start issues of development tool support and compatibility with windows 9x kept most applications on the 8 bit code pages). According to Microsoft, it is obsolete and unsupported." >>

Relying on Wikipedia as a primary source of information is a bad idea, especially with such an unsourced statement. (I just added the {{fact}}, i.e. "[citation needed]", template.)

The history as I recall it (and I do have experience with old M$ systems) is that CP 437 was for all intents and purposes replaced with CP 850 for all of Western Europe. This caused a lot of problems, and so it was often manually reconfigured to 437. However, you have to consider what these problems were: they were problems with displaying some line-drawing characters, which had been replaced with accented characters from ISO 8859-1 (but in different positions). Now what is more likely to be used in file names: line-drawing characters or accented characters? So I expect fewer problems from using 850 for file systems created with 437 than the opposite. (That said, CP 437 also has Greek characters which are not in 850, so that may be a problem.)

<< In my own experience accessing a device shared via Samba from both Linux and Windows systems, I need to set the Linux system to cp437, not 850. >>

Have you actually tried 850? Chances are it doesn't make any difference, unless you're using Greek letters or line-drawing characters in your file names. It does make a difference for those European users using the additional accented characters in CP 850.

<< And using "obsolete and unsupported" codepages is not generally a good idea. >>

Of course CP 850 is obsolete, but so is CP 437 and all the other "OEM codepages".

That said, there is indeed a newer replacement for 850, but it's not 437, but 858, which is 850 with the Euro sign instead of the dotless i.

<< That being said, it might be better to find out why 1252 didn't work, since it should.  Then again, I believe I had tried 1252 before eventually getting to cp437 and had issues with it too. >>

And this shows that you don't have enough experience with M$ codepages to make such a change. CP 1252 obviously doesn't work because it's a Window$ codepage, not an "OEM codepage" (i.e. DO$ codepage) and FAT uses "OEM codepages".
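The byte-level differences discussed in this comment can be checked with Python's built-in codecs, which are assumed here to match the OEM tables in question: 0xE0 is a Greek letter in CP 437 but an accented capital in CP 850, 0xB5 swaps a line-drawing character for an accented letter, and 0xD5 is the dotless i in CP 850 but the Euro sign in CP 858. The helper `show` is just for illustration.

```python
def show(byte, *codepages):
    """Print what a single byte means in each of the given codepages."""
    raw = bytes([byte])
    print(hex(byte), {cp: raw.decode(cp) for cp in codepages})


show(0xE0, "cp437", "cp850")  # Greek alpha vs. accented capital O
show(0xB5, "cp437", "cp850")  # line-drawing character vs. accented capital A
show(0xD5, "cp850", "cp858")  # dotless i vs. Euro sign

# High-half bytes that are box-drawing (U+2500..U+257F) in CP 437
# but letters in CP 850.
swapped = [b for b in range(0x80, 0x100)
           if "\u2500" <= bytes([b]).decode("cp437") <= "\u257f"
           and bytes([b]).decode("cp850").isalpha()]
print(len(swapped), "positions trade line drawing for letters")
```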
Comment 32 Kevin Kofler 2008-09-27 10:40:40 UTC
(CP 1252 was designed to be compatible with (i.e. to use the same codepoints as) ISO 8859-1, not with the legacy OEM codepages.)
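A short Python check of this claim, using the standard codecs: in the range 0xA0 to 0xFF, CP 1252 decodes exactly like ISO 8859-1 (`latin-1`), while the OEM codepage 850 lays out that range differently, so the same accented letter lands on different bytes.

```python
high_half = bytes(range(0xA0, 0x100))

# In 0xA0-0xFF CP 1252 assigns the same characters as ISO 8859-1 ...
same_as_latin1 = high_half.decode("cp1252") == high_half.decode("latin-1")

# ... whereas the OEM codepage 850 lays out that range differently.
same_as_cp850 = high_half.decode("cp1252") == high_half.decode("cp850")

print(same_as_latin1, same_as_cp850)

# So the same accented letter ends up on different bytes.
print("é".encode("cp1252"), "é".encode("cp850"))
```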
Comment 33 Nick Shaforostoff 2008-09-27 16:30:47 UTC
please test kde 4.1.2 and report if you're not satisfied with current situation.
Comment 34 Jeff Mitchell 2008-09-27 16:37:10 UTC
Could you $top u$ing dollar $igns everywhere?  It looks $upremely $tupid.
Comment 35 Kevin Kofler 2008-09-27 18:59:05 UTC
I consider this part of freedom of expression.
Comment 36 Jeff Mitchell 2008-09-27 19:13:32 UTC
You're right.  Here's my freedom of expression:  It makes you look like a moron.

Also, it makes what you are saying much more annoying to read.
Comment 37 Andreas Pakulat 2008-09-28 10:52:25 UTC
Could you two please discuss that via private mail? Such messages on a public bug-reporting website don't gain either of you anything except a bad public image, which might also reflect on KDE in general, because this is KDE's bug-reporting website. We're not in kindergarten here.
Comment 38 Kevin Kofler 2008-09-28 12:16:32 UTC
*** Bug 161588 has been marked as a duplicate of this bug. ***
Comment 39 Pavel Zheltobryukhov 2008-09-28 18:24:02 UTC
*** Bug 133444 has been marked as a duplicate of this bug. ***
Comment 40 Manuel Schmid 2008-10-04 18:24:31 UTC
I don't know if this is really fixed, as I'm still experiencing problems when (auto-)mounting USB-drives with KDE 4.1.2 (Kubuntu Hardy packages). When I mount the drives with Nautilus, the filenames are correct.
Comment 41 Nick Shaforostoff 2008-10-04 18:49:34 UTC
http://bugs.kde.org/show_bug.cgi?id=161588#c18

please tell us what's your locale (echo $LANG $LOCALE $LANGUAGE)
Comment 42 Pavel Zheltobryukhov 2008-10-04 19:11:48 UTC
I've got KDE 4.1.2 on my OpenSUSE 11.0, and I see that my WD My Book is mounted as

/dev/sdb1 on /media/MY BOOK type vfat (rw,nosuid,nodev,uid=1001,codepage=866,iocharset=utf8)

But the KDE3 mount options are (were?) (see bug #170636, closed by myself)

/dev/sdb1 on /media/disk type vfat (rw,nosuid,nodev,noatime,flush,uid=1001,utf8,shortname=lower)

There are no 'codepage' options there, and all works fine.
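To see exactly how the two mount lines differ, one can split the option strings into key/value pairs and compare them. A minimal Python sketch (the parser is a simplification that treats bare flags such as `utf8` as `True`; the strings are copied from the two mount lines above):

```python
def parse_options(opts):
    """Split a mount option string into a dict; bare flags map to True."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


kde4 = parse_options("rw,nosuid,nodev,uid=1001,codepage=866,iocharset=utf8")
kde3 = parse_options("rw,nosuid,nodev,noatime,flush,uid=1001,utf8,shortname=lower")

# Options present in one mount line but not in the other.
only_kde4 = {k: kde4[k] for k in kde4.keys() - kde3.keys()}
only_kde3 = {k: kde3[k] for k in kde3.keys() - kde4.keys()}
print("only in KDE4:", only_kde4)
print("only in KDE3:", only_kde3)
```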

Moreover, I've got a warning in dmesg output

FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!

Did we get a new crutch? Can removable media in KDE4 be mounted 'as-it-is-done-in-KDE3'?

Comment 43 Nick Shaforostoff 2008-10-04 19:42:11 UTC
yes, specifying utf8 works. i'll change the code in a moment
Comment 44 Nick Shaforostoff 2008-10-04 19:43:41 UTC
what i'd like to know is whether same applies to BSD systems
Comment 45 Pavel Zheltobryukhov 2008-10-04 20:15:44 UTC
In the BSD case we still need a feature to specify mount options manually!

Don't take me for a pain in the neck: I use FreeBSD at work and Linux at home, and I think that 'mount options' can be a reasonable solution.

Comment 46 Max 2008-10-04 20:53:53 UTC
(In reply to comment #44)
> what i'd like to know is whether same applies to BSD systems
> 
mounting FAT volumes in 4.1.2 works as expected with UTF-8 locales. However, the problem still persists for CD/DVD drives and NTFS volumes.

Again, it would be nice to have possibility to set default options (global and per drive) for mounting. Many users will be grateful for that.
Thanks in advance from one of them :)
Comment 47 Nick Shaforostoff 2008-10-04 21:28:56 UTC
do you use ntfs or ntfs-3g?
Comment 48 Max 2008-10-05 17:00:01 UTC
(In reply to comment #47)
> do you use ntfs or ntfs-3g?
Both. Mostly ntfs, because I rarely need write access.
Comment 49 Pino Toscano 2008-10-05 18:35:46 UTC
@Max:
please do not mass-remove CC'ed people, thanks.
Comment 50 Nick Shaforostoff 2008-10-05 19:56:55 UTC
Max: use ntfs-3g
Comment 51 Max 2008-10-05 21:41:25 UTC
Created attachment 27705
fix mount for BSD

bsd specific:
use '-u' instead of uid
add charset conversion for ntfs, udf, iso9660 filesystems
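Based on the options listed in comment 19 (`-u` for the owner, `-L` for msdosfs, `-C` for cd9660, ntfs and udf), building a FreeBSD mount command might look roughly like the Python sketch below. It is not the attached patch: the function name, the choice to derive the `-C` charset from the locale suffix, and the set of helpers assumed to accept `-u` are all assumptions.

```python
# Flags from comment 19; the mapping below is a sketch, not KDE's code.
CHARSET_FLAG = {"msdosfs": "-L", "cd9660": "-C", "ntfs": "-C", "udf": "-C"}
# Helpers assumed to accept an owner via -u (there is no uid= option).
ACCEPTS_UID = {"msdosfs", "ntfs"}


def bsd_mount_argv(fs_type, device, mount_point, uid=None, locale_name=None):
    """Build a hypothetical mount_<fs_type> command line for FreeBSD."""
    argv = ["mount_" + fs_type]
    if uid is not None and fs_type in ACCEPTS_UID:
        argv += ["-u", str(uid)]
    flag = CHARSET_FLAG.get(fs_type)
    if flag == "-L" and locale_name:
        # msdosfs takes a full locale name such as ru_RU.UTF-8.
        argv += ["-L", locale_name]
    elif flag == "-C" and locale_name and "." in locale_name:
        # The others take a charset; take it from the locale suffix (an assumption).
        argv += ["-C", locale_name.split(".", 1)[1]]
    return argv + [device, mount_point]


if __name__ == "__main__":
    print(bsd_mount_argv("msdosfs", "/dev/da0s1", "/mnt", uid=1001,
                         locale_name="ru_RU.UTF-8"))
    print(bsd_mount_argv("cd9660", "/dev/cd0", "/cdrom", uid=1001,
                         locale_name="ru_RU.UTF-8"))
```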
Comment 52 Max 2008-10-07 11:50:39 UTC
btw, this bug is marked as resolved/fixed. Should I open a new one for mounting CD/DVD on BSD, or can we continue here?
Comment 53 Nick Shaforostoff 2008-10-07 14:39:47 UTC
but you've already submitted the patch.
and i applied it: http://websvn.kde.org/branches/KDE/4.1/kdelibs/solid/solid/backends/hal/halstorageaccess.cpp?view=markup
Comment 54 Max 2008-10-11 23:00:01 UTC
Created attachment 27819
fix mount for BSD again

Nick, I have sent you a private mail with a new patch, but have still had no response from you, so I'm posting the patch here.
This patch removes dos codepage support for freebsd (as you have done for others in r868600) and enables charset conversion for non-default locales only.
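"Non-default locale" needs a concrete test. One plausible reading, sketched in Python below, resolves the character-type locale with the usual POSIX precedence (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and treats an empty value, `C` and `POSIX` as the default; the patch itself may decide differently, and the function names are hypothetical.

```python
import os

# Locales assumed to mean "no conversion needed".
DEFAULT_LOCALES = {"", "C", "POSIX"}


def effective_ctype_locale(env=None):
    """Resolve the character-type locale using POSIX precedence.

    LC_ALL overrides LC_CTYPE, which overrides LANG; empty values are skipped.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return ""


def charset_conversion_wanted(env=None):
    """Only ask mount for charset conversion when a non-default locale is active."""
    return effective_ctype_locale(env) not in DEFAULT_LOCALES


if __name__ == "__main__":
    print(effective_ctype_locale(), charset_conversion_wanted())
```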
Comment 55 Nick Shaforostoff 2008-10-12 07:10:06 UTC
thanks. i'm not always available for this, but be sure your work will go to kde 4.1.3 
Comment 56 Nick Shaforostoff 2008-10-12 18:26:35 UTC
committed to branch and trunk, please test
Comment 57 Danila Sentiabov 2008-11-30 12:42:51 UTC
I'm sorry for bringing this topic up, but 4.2 is near.

As far as I can see in 4.2 Beta 1, there is still no way to set mount options for each removable device manually (as it was in KDE 3.5 or it is currently in Gnome).

There is no way to set mount options via HAL either, because this functionality was deprecated long ago (http://bugs.kde.org/show_bug.cgi?id=133456#c9).
So, there is no way to customize mount options for removable devices at all. And there are quite a few cases where it's needed.

Will something change in 4.2 release?
Comment 58 Manuel Schmid 2009-01-13 19:20:46 UTC
For me this bug is still not fixed in KDE 4.2 Beta2 (Kubuntu Intrepid packages). If I plug in my USB SD card reader, all file names are lower case, while they are upper case if they are mounted with Nautilus.
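The upper/lower case difference may be related to the `shortname=` mount option (the KDE 3 mount line in comment 42 uses `shortname=lower`). Roughly following the Linux vfat documentation, the display part of each mode can be approximated as below; this Python sketch ignores the case flags Windows NT can store in directory entries, so treat it as an approximation rather than the driver's behaviour.

```python
def displayed_short_name(stored, mode):
    """Approximate how Linux vfat shows a stored 8.3 name for each shortname= mode.

    Simplified: the case flags that Windows NT may store are ignored.
    """
    if mode == "lower":
        return stored.lower()   # forced to lower case on display
    if mode == "win95":
        return stored.upper()   # forced to upper case on display
    if mode in ("winnt", "mixed"):
        return stored           # shown as stored
    raise ValueError("unknown shortname mode: " + mode)


if __name__ == "__main__":
    for mode in ("lower", "win95", "winnt", "mixed"):
        print(mode, displayed_short_name("DCIM", mode))
```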
Comment 59 Nick Shaforostoff 2009-02-14 19:56:39 UTC
@Manuel Schmid: does http://websvn.kde.org/?view=rev&revision=918776 fix your problem?
Comment 60 Danila Sentiabov 2009-02-14 23:15:14 UTC
(In reply to comment #24)
> I think of this ui:
> in context menu of mounted media (e.g. on Places panel of Dolphin) add a menu
> option 'remount with custom options'
Sorry to bring this topic up, but 4.2 is out with no way to customize mount options.
Any chance to see this functionality in 4.3? Or are there no plans to do it at all?
Comment 61 Nick Shaforostoff 2009-02-14 23:30:36 UTC
this may be considered a junior job. try to do it yourself.
Comment 62 Manuel Schmid 2009-02-15 20:09:39 UTC
(In reply to comment #59)
> @Manuel Schmid: does http://websvn.kde.org/?view=rev&revision=918776 fix your
> problem?
> 

I don’t know yet, as I only have the 4.2 release packages and don’t want to switch to unstable packages on my primary machine. It looks as if this might solve my problem though.
Comment 63 Andrey Borzenkov 2009-02-28 10:15:52 UTC
(In reply to comment #42)
> I've got KDE 4.1.2 on my OpenSUSE 11.0, and I see that my WD My Book is mounted
> as
> 
> /dev/sdb1 on /media/MY BOOK type vfat
> (rw,nosuid,nodev,uid=1001,codepage=866,iocharset=utf8)
> 
> But the KDE3 mount options are (were?) (see bug #170636, closed by myself)
> 

And what makes you believe KDE3 got it right? Bug 133456 suggests quite the contrary.

> /dev/sdb1 on /media/disk type vfat
> (rw,nosuid,nodev,noatime,flush,uid=1001,utf8,shortname=lower)
> 
> There are no 'codepage' options there, and all works fine.
> 

What "all" exactly? Can you compare two file names which differ only in upper/lower case as identical? And that for *every* character in Cyrillic alphabet? Can you work with short 8.3 Cyrillic names on pure FAT (i.e. no UNICODE names)? And all that with defaults in kernel (codepage=437, iocharset=iso8859-1)?
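The case-comparison question can be illustrated outside the kernel: byte-oriented folding that only knows ASCII folds `.TXT` but leaves Cyrillic letters untouched, while Unicode-aware folding matches the two names. This Python sketch shows the concept only; it is not how the FAT driver implements comparison, and the file names are made up.

```python
upper = "ПРИВЕТ.TXT"
lower = "привет.txt"

# Byte-oriented, ASCII-only folding (what you get without a proper
# Cyrillic case table) only folds the ".TXT" part.
ascii_fold_equal = upper.encode("utf-8").lower() == lower.encode("utf-8")

# Unicode-aware folding treats the two names as the same file name.
unicode_fold_equal = upper.casefold() == lower.casefold()

print(ascii_fold_equal, unicode_fold_equal)
```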

> Moreover, I've got a warning in dmesg output
> 
> FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will
> be case sensitive!
> 

What makes you believe that not seeing the warning means "the problem has been fixed"? Using the utf8 flag is actually even more broken than using iocharset=utf8. With the latter you just get case-sensitive comparison. With the former, file names are compared using WHATEVER charset HAPPENS TO BE SET RIGHT NOW. In other words, you are most likely comparing UTF-8 names using the (default) iso8859-1 NLS table.

> Did we get a new crutch? Can removable media in KDE4 be mounted
> 'as-it-is-done-in-KDE3'?

I.e. let's leave it as broken as it was in KDE3?
Comment 64 Pavel Zheltobryukhov 2009-02-28 17:35:23 UTC
(In reply to comment #63)

> What "all" exactly? 

"all" means "works for me". If you want to investigate this problem in depth, you can do it.