Bug 440359 - Warning message when opening a file containing just a UTF-8 BOM
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: part
Version: 21.04.3
Platform: Other Linux
Importance: NOR minor
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-28 15:20 UTC by Chris Spiegel
Modified: 2021-08-12 08:13 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
A file consisting of a UTF-8 BOM (3 bytes, application/octet-stream)
2021-07-28 15:20 UTC, Chris Spiegel

Description Chris Spiegel 2021-07-28 15:20:44 UTC
Created attachment 140372
A file consisting of a UTF-8 BOM

Opening a file which contains only a UTF-8 BOM (i.e. U+FEFF, "Zero width no-break space") causes the following message to appear:

The file /path/to/BOM was opened with UTF-8 encoding but contained invalid characters.
It is set to read-only mode, as saving might destroy its content.
Either reopen the file with the correct encoding chosen or enable the read-write mode again in the tools menu to be able to edit it.

Such a file can be created with:

printf "\xef\xbb\xbf" > BOM

I've also attached the file here.

If there are other characters following the BOM there is no error message.
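
For reference, a BOM-only file is valid UTF-8 with zero characters of content. The following is a minimal Python sketch of that check, not KTextEditor's code: the helper name describe() is hypothetical, and it only illustrates that strict UTF-8 decoding succeeds on the 3-byte file as well as on a BOM followed by text.

```python
import codecs

def describe(data: bytes) -> str:
    """Classify a byte string: is it valid UTF-8, and does it start with a BOM?"""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Strip the BOM (if present) and strictly decode whatever is left.
    payload = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        text = payload.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "invalid UTF-8"
    return f"valid UTF-8, bom={has_bom}, chars={len(text)}"

bom_only = b"\xef\xbb\xbf"  # same bytes as: printf "\xef\xbb\xbf" > BOM
print(describe(bom_only))             # valid UTF-8, bom=True, chars=0
print(describe(bom_only + b"hello"))  # valid UTF-8, bom=True, chars=5
```

Under this reading, the empty payload after the BOM is not an encoding error, so no "invalid characters" warning would be expected.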

SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.22.3
KDE Frameworks Version: 5.84.0
Qt Version: Latest tagged 5.15 from KDE's repository
Comment 1 Bug Janitor Service 2021-08-02 11:54:04 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/ktexteditor/-/merge_requests/180
Comment 2 Jan Paul Batrina 2021-08-12 08:13:13 UTC
Git commit 5ce06dacda2f57e03c5d513eba75fadda63505ca by Jan Paul Batrina.
Committed on 02/08/2021 at 11:44.
Pushed by cullmann into branch 'master'.

Do not show encoding error when file only contains BOM

failedToConvertOnce shouldn't be set to true when
the BOM was processed.

After this commit, opening files with the following
hex content should open in the corresponding encodings:
E9		- Latin-1/ISO-8859-15
EF BB BF	- UTF-8
EF BB BF E9 FF	- Latin-1/ISO-8859-15

Then forcing the Latin-1 files above to be opened
as UTF-8 should show the "invalid encoding" error
message properly.
Related: bug 272579

M  +11   -1    src/buffer/katetextloader.h

https://invent.kde.org/frameworks/ktexteditor/commit/5ce06dacda2f57e03c5d513eba75fadda63505ca
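
The three hex cases in the commit message, plus the "force UTF-8" case, can be approximated with a small Python sketch. This is an assumption-laden illustration, not the logic in katetextloader.h: guess_encoding() is a hypothetical helper, it tries UTF-8 first and falls back to Latin-1, and it uses Python's "latin-1" codec as a stand-in for "Latin-1/ISO-8859-15".

```python
import codecs
from typing import Optional, Tuple

def guess_encoding(data: bytes, forced: Optional[str] = None) -> Tuple[str, bool]:
    """Return (encoding, had_errors) for data.

    Hypothetical helper: without `forced`, try UTF-8 and then Latin-1.
    With `forced`, only that encoding is tried and errors are reported.
    """
    candidates = [forced] if forced else ["utf-8", "latin-1"]
    for enc in candidates:
        payload = data
        # A leading UTF-8 BOM is consumed, not counted as a decoding failure.
        if enc == "utf-8" and data.startswith(codecs.BOM_UTF8):
            payload = data[len(codecs.BOM_UTF8):]
        try:
            payload.decode(enc, errors="strict")
            return enc, False
        except UnicodeDecodeError:
            continue
    # Every candidate failed: report the last one tried, with errors.
    return candidates[-1], True

for hex_bytes in ("E9", "EFBBBF", "EFBBBFE9FF"):
    print(hex_bytes, guess_encoding(bytes.fromhex(hex_bytes)))

# Forcing UTF-8 on a Latin-1 file should flag the invalid encoding.
print(guess_encoding(bytes.fromhex("E9"), forced="utf-8"))
```

In this sketch the BOM-only input resolves to UTF-8 without errors, the two inputs containing E9 fall back to Latin-1, and forcing UTF-8 on them sets the error flag, which matches the behaviour the commit message describes.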