Bug 144566 - "Force input charset" fundamentally broken
Alias: None
Product: k3b
Classification: Unclassified
Component: Data Project
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Sebastian Trueg
Depends on:
Reported: 2007-04-23 15:14 UTC by Egmont Koblinger
Modified: 2007-04-30 13:44 UTC
Description Egmont Koblinger 2007-04-23 15:14:29 UTC
Version:           1.0.1 (using KDE 3.5.3)
Installed from:    Compiled From Sources
OS:                Linux

The "Force input charset" option in its current form is fundamentally broken. It is as unusable as it was presumably easy to implement (just pass an extra command-line option to mkisofs). The whole design completely forgets what the notion of "charset" actually means.

Basically there are two possible approaches.

The first is to assume that all filenames are always encoded in one particular character set, one that is set consistently at least throughout KDE, or better still throughout the whole operating system. In this case the user always sees accented filenames correctly everywhere. Hence the current "Force input charset" option only offers the user a way to make K3B misbehave instead of behaving normally, should he want that. There's absolutely no use in having such an option. Whether the software should work correctly or not is not something the user should be asked about. The software should work correctly, period. Whether the correct behavior requires an "-input-charset" or similar option behind the scenes is not something the user should see; K3B should automatically do it right.

The other possible assumption is that the user might face different encodings. Let's see what happens currently. You select the files to be put on the CD, but you see wrongly rendered accented characters, since the actual encoding is not the one K3B (or KDE) uses to display the filename. (And if your system is set to UTF-8 you cannot even add Latin1 filenames to your project: K3B claims they are invalid filenames. Why? Maybe I'll tell K3B to expect Latin1.) So while choosing the contents of the CD, you have to work with faulty filenames appearing inside the K3B window. And at the beginning of the writing procedure you can choose a charset, but how would you know which charset is the right one?
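
To illustrate why a Latin1 filename gets rejected on a UTF-8 system, here is a minimal Python sketch. It is not K3B's actual check; the function name is_valid_utf8 and the sample filename are made up for this example. It only shows that the same name is a different byte sequence in each encoding, and that the Latin1 bytes do not form valid UTF-8.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8.

    Hypothetical helper for illustration only, not K3B code.
    """
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same human-readable name, stored as two different byte sequences.
utf8_name = "árvíz.txt".encode("utf-8")
latin1_name = "árvíz.txt".encode("latin-1")

print(utf8_name, is_valid_utf8(utf8_name))
print(latin1_name, is_valid_utf8(latin1_name))
```

A program that accepts only names passing such a check has to refuse the Latin1 file, which is the behavior described above.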

The basic problem is that K3B ignores that the meaning of a charset is how to _display_ the byte sequences, hence K3B itself should also use this setting to display the filenames. This way I'd see the filenames correctly while working with them, and would also get feedback on whether I've set the right charset.

So the user should be able to select the character set before or while adding files to the project. File names that were already added to the project should be kept as the same sequence of bytes (so that you won't hit "No such file or directory" errors), but the way they are displayed on the screen should be updated according to the new charset, so the user would see whether he has chosen the right charset. Once the user has adjusted the character set and he _sees_ the filenames correctly on the display, the rest is solely K3B's job: it should encode them properly on the image without asking further questions.
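
The proposal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design for K3B's code: the function display_name and the sample filename are hypothetical. The stored bytes never change; only the decoding used for display follows the selected charset.

```python
def display_name(raw_name: bytes, charset: str) -> str:
    """Decode the stored bytes for display only.

    Hypothetical helper: undecodable bytes are shown as U+FFFD so the
    user can see at a glance that the selected charset is wrong.
    """
    return raw_name.decode(charset, errors="replace")


# Filename bytes exactly as they exist on disk (Latin1-encoded here).
raw = "café.txt".encode("latin-1")

# A wrong charset gives visibly broken output, the right one reads fine.
print(display_name(raw, "utf-8"))
print(display_name(raw, "latin-1"))

# The bytes used to open the file stay the same regardless of the setting.
print(raw)
```

Because only the display string depends on the charset, switching the setting cannot cause "No such file or directory" errors for files already in the project.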
Comment 1 Sebastian Trueg 2007-04-25 12:09:58 UTC
The "enforce charset" option was introduced way back when K3b had no charset check yet. So yes, it is probably outdated and pretty useless as such ATM.

However, I will not start implementing support for different filename encodings in K3b. Do you have any idea what a mess that is? And how do you propose that the GUI would handle that? Ask for each file which charset is to be used? Or have an additional charset parameter in K3b although this should be set system-wide? Because in the end, proper filename encoding handling is the job of the operating system or, more specifically, the filesystem. Internally K3b (since it is Qt/KDE) only uses UTF8 anyway. The problems start when passing the filenames on to mkisofs/genisoimage. This is where the local encoding is used, and only that. So on a properly configured system no problems should arise.

I also do not really see why you would have different filename encodings in your system.

Again, the current force charset option is useless and I will probably remove it.
Comment 2 Egmont Koblinger 2007-04-25 14:45:23 UTC
> So on a properly configured system no problems should arise.
I agree.

> I also do not really see why you would have different filename encodings in your system.
Possible reasons include data downloaded from a remote server with wget/rsync/..., or a mounted Rock Ridge CD that you burnt ages ago with legacy filename charsets. However, I agree that it's not k3b's job to fix those situations; rather, convmv should be used, the kernel should be able to convert the charset of Rock Ridge ISOs, etc.

> I will probably remove it.
Sounds fine, thanks.
Comment 3 Sebastian Trueg 2007-04-30 13:44:38 UTC
SVN commit 659528 by trueg:

Removed the "force input charset" option from K3b. Now that K3b checks the filename encoding
this option can only be used to break stuff and, thus, is useless.

BUG: 144566 

 M  +0 -14     libk3b/projects/datacd/k3bdatadoc.cpp  
 M  +0 -3      libk3b/projects/datacd/k3bisoimager.cpp  
 M  +0 -10     libk3b/projects/datacd/k3bisooptions.cpp  
 M  +0 -7      libk3b/projects/datacd/k3bisooptions.h  
 M  +12 -40    src/projects/base_k3badvanceddataimagesettings.ui  
 M  +28 -80    src/projects/k3bdataadvancedimagesettingswidget.cpp  
 M  +7 -9      src/projects/k3bdataimagesettingswidget.cpp