Bug 487594 - KWrite automatic encoding detection error
Summary: KWrite automatic encoding detection error
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: encoding (other bugs)
Version First Reported In: 24.02.2
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
 
Reported: 2024-05-26 16:04 UTC by Red
Modified: 2024-08-08 17:48 UTC


Attachments
Incorrect display of GBK encoding (125.20 KB, image/png)
2024-05-26 16:04 UTC, Red
Test File (26 bytes, application/octet-stream)
2024-05-26 16:21 UTC, Red

Description Red 2024-05-26 16:04:05 UTC
Created attachment 169853
Incorrect display of GBK encoding

SUMMARY
KWrite correctly detects that the text is encoded in GBK and shows GBK in the bottom right corner, but it does not actually decode the file as GBK. Instead, it falls back to a different character encoding. I have to select GBK manually from the menu to display the text correctly.

STEPS TO REPRODUCE
1. Create a file encoded in GBK or Shift_JIS
2. Open the file
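
For step 1, a test file can be generated with a short script. This is only a minimal sketch and not the attached test file: the file name `gbk_sample.txt` and the sample text are assumptions chosen for illustration. It also prints what the same bytes look like when decoded with a wrong fallback (ISO-8859-15 is used here as an assumed example).

```python
from pathlib import Path

# Hypothetical sample text and file name; the attached test file may differ.
SAMPLE_TEXT = "你好，世界\n"
SAMPLE_PATH = Path("gbk_sample.txt")


def write_sample(path: Path = SAMPLE_PATH, encoding: str = "gbk") -> bytes:
    """Write SAMPLE_TEXT to path in the given legacy encoding; return the raw bytes."""
    raw = SAMPLE_TEXT.encode(encoding)
    path.write_bytes(raw)
    return raw


def decode_with(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, replacing undecodable sequences with U+FFFD."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    raw = write_sample()
    print(decode_with(raw, "gbk"))         # the intended text
    print(decode_with(raw, "iso8859-15"))  # mojibake, as a wrong fallback would show
```

Opening the resulting file in KWrite should then show whether the detected encoding is actually applied.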

EXPECTED RESULT
Display GBK text correctly. I cannot reproduce this issue on Kubuntu with Plasma 5.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Fedora 40
KDE Plasma Version: 6.0.5
KDE Frameworks Version: 6.2
Qt Version: 6.7
Comment 1 Red 2024-05-26 16:21:50 UTC
Created attachment 169856
Test File
Comment 2 Christoph Cullmann 2024-06-15 17:31:07 UTC
Hmm, can't get that to happen with Kate or KWrite and Frameworks 6.3.
Opens with proper encoding, if fallback is correct, else for me opens as latin15, which is not correct but correctly shown in the status bar.
Comment 3 Bug Janitor Service 2024-06-30 03:47:17 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 4 Red 2024-06-30 12:35:17 UTC
(In reply to Christoph Cullmann from comment #2)
> Hmm, can't get that to happen with Kate or KWrite and Frameworks 6.3.
> Opens with proper encoding, if fallback is correct, else for me opens as
> latin15, which is not correct but correctly shown in the status bar.

I can still reproduce this issue on Frameworks 6.3. Even though it shows GBK in the bottom right corner, KWrite falls back to the Fallback encoding option. On Kubuntu, it switches to the detected text encoding.
Comment 5 Christoph Cullmann 2024-08-08 17:46:27 UTC
Git commit 49ceb6ce1f0c602907880ec6bb09ca570b30596f by Christoph Cullmann.
Committed on 08/08/2024 at 17:46.
Pushed by cullmann into branch 'master'.

improve encoding detection

decoderForHtml does UTF-8 fallback, that did kill a lot of the probing

M  +12   -19   src/buffer/katetextloader.h

https://invent.kde.org/frameworks/ktexteditor/-/commit/49ceb6ce1f0c602907880ec6bb09ca570b30596f
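
The general failure mode named in the commit message can be sketched as follows. This is an illustration only, not the code in katetextloader.h: all function names are hypothetical, and the probes are deliberately simplistic. It assumes a detection chain that stops at the first probe returning a result, so a probe that always answers "utf-8" prevents every later probe from running.

```python
from typing import Callable, Optional, Sequence

Probe = Callable[[bytes], Optional[str]]


def html_probe_with_utf8_fallback(raw: bytes) -> Optional[str]:
    """Hypothetical probe: finds no charset declaration but still answers 'utf-8'."""
    return "utf-8"


def html_probe_strict(raw: bytes) -> Optional[str]:
    """Hypothetical probe: finds no charset declaration and reports nothing."""
    return None


def gbk_probe(raw: bytes) -> Optional[str]:
    """Hypothetical probe: accept GBK only if the bytes decode without errors."""
    try:
        raw.decode("gbk")
    except UnicodeDecodeError:
        return None
    return "gbk"


def detect(raw: bytes, probes: Sequence[Probe], fallback: str) -> str:
    """Return the first probe result, or the configured fallback if none match."""
    for probe in probes:
        encoding = probe(raw)
        if encoding is not None:
            return encoding
    return fallback


if __name__ == "__main__":
    raw = "你好".encode("gbk")
    # The always-answering probe short-circuits the chain.
    print(detect(raw, [html_probe_with_utf8_fallback, gbk_probe], "iso8859-15"))
    # With a probe that can decline, the GBK probe gets its turn.
    print(detect(raw, [html_probe_strict, gbk_probe], "iso8859-15"))
```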
Comment 6 Christoph Cullmann 2024-08-08 17:48:54 UTC
You are right, there was an error; not sure why it didn't show up for me at first.