Bug 207174

Summary: Kate and KWrite should automatically write proper BOM for textfiles
Product: [Applications] kate Reporter: Shriramana Sharma <samjnaa>
Component: general Assignee: KWrite Developers <kwrite-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: ehamberg, jowenn
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Shriramana Sharma 2009-09-12 13:55:35 UTC
Version:            (using KDE 4.3.1)
OS:                Linux
Installed from:    Ubuntu Packages

Nowadays there are a lot of multilingual text files going around and hence Kate/KWrite should support them fully. For this, it is important that they write the proper BOM when saving text files. Else interoperability of the files created by Kate/KWrite is affected.

I checked how Kate/KWrite saved Devanagari text in UTF-8 and UTF-16 encodings and there was no BOM. (I verified this using Okteta.)
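The same check can be done without a hex editor. Here is a minimal Python sketch that looks at the first bytes of a file; the helper names `detect_bom` and `detect_bom_in_file` are made up for illustration and are not part of Kate or Okteta.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM that `data` starts with, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_bom_in_file(path):
    """Read the first four bytes of `path` and report any BOM found."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling `detect_bom_in_file` on a file saved without a BOM returns `None`, which matches what the hex view showed here.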

For any text file that contains non-ASCII characters, Kate/KWrite should write a proper BOM.

I am marking this a bug instead of a wishlist because the current behaviour causes interoperability problems.
Comment 1 Milian Wolff 2009-09-12 14:37:45 UTC
Please don't. Or if that really is required for some people, make it optional and off-by-default.

At least PHP, and I bet other languages or file types, can easily choke on BOMs. If Kate always inserted BOMs, it would become useless for me.

Also, I seem to remember that at least UTF-8 BOMs are optional and not required. Hence this is not a bug.
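To illustrate why a leading BOM can trip up a consumer that does not expect one, here is a small Python sketch. It is not PHP itself; it only shows that decoding as plain UTF-8 leaves the BOM in the text as an extra U+FEFF character, while Python's `utf-8-sig` codec strips it. The sample content is invented.

```python
import codecs

# A file as an editor would write it with a UTF-8 BOM prepended.
raw = codecs.BOM_UTF8 + "<?php echo 'हिन्दी';".encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as the first character.
plain = raw.decode("utf-8")

# The utf-8-sig codec removes a leading BOM if one is present.
stripped = raw.decode("utf-8-sig")

print(repr(plain[0]))
print(stripped.startswith("<?php"))
```

A tool that expects the file to begin with `<?php` sees something else first unless it strips the BOM itself.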
Comment 2 Erlend Hamberg 2009-09-12 14:42:17 UTC
milian: really‽ even notepad prepends a BOM on unicode files.
Comment 3 Joseph Wenninger 2009-09-12 15:45:52 UTC
Automatically and always writing BOM markers sucks, especially with template languages, e.g. with Django, Zope, ...

I got bitten often enough by stupid windows editors just inserting a BOM.
If we add the feature, it has to be easy to enable/disable.

As far as I know, BOMs are not required for Unicode compliance.
Comment 4 Joseph Wenninger 2009-09-12 15:56:24 UTC
In addition, it appears that for UTF-8 the BOM is optional, while for UTF-16 and UTF-32 it is required.
Comment 5 Shriramana Sharma 2009-09-12 16:02:51 UTC
That means Kate is not Unicode-compliant for UTF-16, since I tried saving as UTF-16 and Kate did not write a BOM. The same is true for KWrite.

So the request is now for an option to prepend a BOM. Some more points:

1. This option should be added not in the Configure Kate or Configure Editor menu dialog, but in the Save As window.

2. This option should be set to off in the default distribution.

3. It must be decided whether Kate/KWrite should remember my last choice or always revert to the default state of off, making the user select it manually every time they want it. One option is to remember the on state only for a file that was recently (in the current session) saved with the BOM turned on.

4. When a file with a BOM is loaded, the option should automatically be turned on. Care must be taken that BOMs are not duplicated at the head of the file, however.

5. If merely "Save" is done and not "Save As", then a BOM should be prepended if it existed when the file was loaded.
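Points 4 and 5 amount to a round trip: remember whether a BOM was present at load time, strip it from the text, and write it back exactly once on save. A minimal Python sketch for the UTF-8 case follows; `load_text` and `save_text` are hypothetical names and say nothing about how Kate is actually implemented.

```python
import codecs


def load_text(data: bytes):
    """Decode UTF-8 bytes; return (text, had_bom)."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    # utf-8-sig strips one leading BOM if present, so the BOM never
    # ends up inside the editable text.
    text = data.decode("utf-8-sig")
    return text, had_bom


def save_text(text: str, write_bom: bool) -> bytes:
    """Encode text as UTF-8, prepending a single BOM only if requested."""
    body = text.encode("utf-8")
    return codecs.BOM_UTF8 + body if write_bom else body


original = codecs.BOM_UTF8 + "देवनागरी\n".encode("utf-8")
text, had_bom = load_text(original)
assert save_text(text, had_bom) == original  # one BOM, not two
```

Because the BOM is kept as a flag rather than as a character in the buffer, saving repeatedly cannot stack up several BOMs at the head of the file.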
Comment 6 Joseph Wenninger 2009-09-12 16:11:08 UTC
Adding to the file dialog is not that good, especially for applications embedding the kate part.

I think the easiest/best way would be:
*) Detecting the existence of a BOM when opening a UTF-8/16 file and keeping that setting.
*) An option for the default for new files (BOM on/off) in the config dialog (Open/Save section) (defaulting to off as system default)
*) An option in the tools menu, just like the "End of line" option in the tools menu.
Comment 7 Dominik Haumann 2009-09-12 16:39:31 UTC
Another idea is to add a document variable (modeline) for that, e.g. prevent-bom=true/false. This could be in the moderc for some files, e.g. php and others. And in theory, it could even be in the highlighting information.
Comment 8 Joseph Wenninger 2009-09-12 17:26:33 UTC
I'm working on it
Comment 9 Joseph Wenninger 2009-09-12 18:20:13 UTC
Okay, it's not committed yet, but the result will be:
*) If the user enabled/disabled the byte order marker explicitly in the tools menu, this setting is honoured, otherwise
*) If bom or byte-order-marker is set in the file mode config line, the boolean value will be used for saving. (This variable is ignored if it is within the document since, as it appears, the mode overwrites local settings before saving.)
*) If the variable is not specified, if there was a byte order marker at load time, the byte order marker is kept.
*) For new files, if the filetype used for saving has the variable not set and the user didn't explicitly set the option in the menu, the default set in the main open/save configuration is used.

Does that sound reasonable?
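The precedence listed above can be sketched as a small function. This is an illustrative Python sketch, not the actual katepart code: the function name and parameters are hypothetical, and it assumes that `None` means "not set" and that a new file has no load-time BOM state.

```python
from typing import Optional


def should_write_bom(menu_choice: Optional[bool],
                     mode_variable: Optional[bool],
                     had_bom_at_load: Optional[bool],
                     global_default: bool) -> bool:
    """Decide whether to write a BOM on save, highest priority first."""
    # 1. An explicit choice in the tools menu always wins.
    if menu_choice is not None:
        return menu_choice
    # 2. Otherwise use the bom / byte-order-marker variable of the file mode.
    if mode_variable is not None:
        return mode_variable
    # 3. Otherwise keep whatever the file had when it was loaded.
    if had_bom_at_load is not None:
        return had_bom_at_load
    # 4. New file with nothing set: fall back to the open/save default.
    return global_default


# A PHP-like mode with "bom off" overrides a BOM found at load time.
print(should_write_bom(None, False, True, True))
```

Each level is consulted only if every level above it is unset, which is why the checks are written as early returns.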
Comment 10 Shriramana Sharma 2009-09-12 18:55:41 UTC
Hello. I'm pleasantly surprised to see the speed at which this bug is being fixed. Is there a record for the fastest bugfix? This is certainly the fastest *I* have seen. 

Anyway, can anyone give me a brief idea about this modeline thing? I have read: http://kate-editor.org/article/katepart_modelines but perhaps it's just me but I didn't get what exactly I would have to type into my text file to turn BOM on or off. So can anyone please give me (via direct email, if more appropriate) examples for C/C++ and Python?

Thanks.
Comment 11 Joseph Wenninger 2009-09-12 21:43:16 UTC
I have implemented it, and it appears to work. I'm going to commit it as soon as svn is up again.
BOMs can now be turned on/off for utf-8 and utf-16. I don't see utf-32 in the encoding selection combo.

To force BOMs off, for instance for Python, even though they are generally turned on, you can change the Variable line for Python in Settings -> Open/Save -> Modes & Filetypes from:

kate: presave-postdialog python-encoding
to
kate: presave-postdialog python-encoding; bom off

If you want to force byte order markers on for a specific file type, even though they are generally turned off, this can be done with "bom on". A synonym is "byte-order-marker on/off".
Comment 12 Joseph Wenninger 2009-09-13 11:36:08 UTC
SVN commit 1022826 by jowenn:

*)New RPMSPEC file from Tim Fechtner
*)Configuration option for writing BOM (byte order markers) for UTF-8/16
BUG: 206142
BUG: 207174


 M  +2 -1      data/katepartsimpleui.rc  
 M  +2 -1      data/katepartui.rc  
 M  +4 -1      dialogs/katedialogs.cpp  
 M  +15 -0     dialogs/opensaveconfigwidget.ui  
 M  +74 -11    document/katebuffer.cpp  
 M  +18 -1     document/katedocument.cpp  
 M  +7 -1      document/katedocument.h  
 M  +46 -7     syntax/data/rpmspec.xml  
 M  +25 -1     utils/kateconfig.cpp  
 M  +5 -0      utils/kateconfig.h  
 M  +19 -1     view/kateview.cpp  
 M  +2 -0      view/kateview.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1022826
Comment 13 Dominik Haumann 2009-09-13 13:35:24 UTC
You can also put it into a comment to enable it in a single file. Just add "kate: bom on;" somewhere in a comment within the first or last 10 lines of your file. But it will only work with KDE >= 4.4.
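As an illustration, a Python file carrying the modeline might look like the following. Only the `kate: bom on;` part comes from this comment; the rest of the file is a made-up example.

```python
# -*- coding: utf-8 -*-
# kate: bom on;

# The modeline above sits in an ordinary comment within the first
# 10 lines, so Python itself ignores it.
GREETING = "नमस्ते"
print(GREETING)
```

In a C or C++ file the same text would go inside a `//` or `/* ... */` comment instead.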
Comment 14 Shriramana Sharma 2009-09-13 13:57:12 UTC
<applauds> That's the fastest bugfix I've ever seen, guys! Keep it up!