Version: (using KDE 4.3.4)
Installed from: Debian testing/unstable Packages

If I save a file as UTF-8 in kate, it cannot be opened correctly by some Windows programs, including Notepad and video players. After I opened this file in Notepad++ (an open source editor for Windows), manually specifying its encoding, and saved the text in Notepad (not as ANSI, but as UTF-8), it immediately became readable by Notepad, the video players, and kate (kate still treats it as UTF-8).

If I open the second version of the file in okteta, I see EF BB BF at the beginning of the file (the byte order mark, see Wikipedia). But that is not actually the only difference, so I'll attach both versions of the file.

Wish: please make kate encode files so that they are readable by both Windows and Linux apps (i.e. make kate produce win-utf.txt instead of kate-utf.txt).
Created attachment 43216 [details] file produced by kate right now
Created attachment 43217 [details] file with exactly the same text, but re-encoded by Notepad. Note that it is readable by both sides. Also, the BOM will allow us to autodetect the UTF-8 encoding 100% of the time.
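For anyone who wants to check which variant of a file they have, here is a minimal Python sketch that tests for the UTF-8 BOM (bytes EF BB BF) and prepends it when missing. The helper names `has_utf8_bom` and `add_utf8_bom` and the sample text are made up for illustration. It only covers the BOM, not any other difference between the two attachments.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already there."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


# Hypothetical sample standing in for the contents of a BOM-less file.
without_bom = "Привет, мир".encode("utf-8")
with_bom = add_utf8_bom(without_bom)
```

Note that the presence of a BOM is a strong hint that a file is UTF-8, but its absence says nothing either way.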
hey Nick! Please try the same in KDE 4.4 or, better yet, 4.5 - esp. the latter saw quite some improvements in encoding support. Furthermore, a BOM should never be inserted by default: many scripting languages have problems with it (e.g. PHP), and afaik so do some XML parsers.
there is one positive change from 4.3 to 4.4: in 4.4, if I open any of the *.txt files and save it under a different name, the result is equal to a binary copy (i.e. everything is preserved). In 4.3 kwrite removes the BOM. But if I open win-utf8.txt and copy-paste its contents into a new kwrite window, then save it, it creates a file equal to kate-utf8.txt. I would like to have an option to explicitly specify how files are encoded, i.e. Windows-friendly or simple (command-line friendly). It would also be cool to have it automatically select Windows-friendly for certain types of files, for example .srt files.
Tools -> Add Byte Order Mark (BOM) ?? I don't know if there is a corresponding modeline.
I thought about extending the file save dialog. It already allows us to specify the encoding, so why not extend it with another option?
What if we add a BOM to a file only if it has the .txt extension? This way no XML files will be harmed, and we'll get nice interoperability with OS X and Windows.
Extending the file dialog is not so easy, as the QFileDialog API in Qt5 must support that first somehow. You can already add a BOM now by specifying it in the filetype through a variable: http://docs.kde.org/stable/en/applications/kate/config-variables.html#variable-byte-order-marker Besides that, are you proposing to add a BOM by default to Unicode-encoded files?
"are you proposing to add a BOM by default to unicode encoded files?" yes, but only for those that get saved with .txt extension (so we get around the mentioned use-case of broken xml processors). that is its purpose, after all. http://en.wikipedia.org/wiki/Byte_order_mark
Wikipedia says: "The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8.[ http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf ] The presence of the UTF-8 BOM may cause interoperability problems with existing software that could otherwise handle UTF-8[...]"
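To illustrate the interoperability problem on the reading side, here is a small Python sketch. A plain "utf-8" decode keeps the BOM as a leading U+FEFF character, which is what trips up tools that do not expect it, while Python's "utf-8-sig" codec drops a leading BOM if there is one. The helper name `decode_utf8_any` and the sample bytes are made up for illustration.

```python
def decode_utf8_any(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # "utf-8-sig" behaves like "utf-8" when the data has no BOM.
    return data.decode("utf-8-sig")


# Hypothetical sample contents, with and without a BOM.
with_bom = b"\xef\xbb\xbf<?php echo 1;"
without_bom = b"<?php echo 1;"

# A plain UTF-8 decode leaves the BOM in the text as U+FEFF.
naive = with_bom.decode("utf-8")

# The tolerant decode gives the same text for both variants.
clean_a = decode_utf8_any(with_bom)
clean_b = decode_utf8_any(without_bom)
```

A reader written this way accepts both kinds of file, but that does not help with existing software that only does the plain decode.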
Sorry, per default we won't add BOMs, that only leads to problems.