Created attachment 145855 [details]
Observed result -- some gibberish characters

SUMMARY
By default, "Auto Detect Unicode" only works for UTF-8 files that have a BOM. The Help should state this more precisely and tell people to either change the encoding manually or add a BOM.

PS: I haven't had time to test with UTF-16 or UTF-32, so I don't know how those behave.

STEPS TO REPRODUCE
1. In Options > Regional Settings, make sure "Auto Detect Unicode" is checked.
2. Use files that are UTF-8 but have no BOM.
3. Compare the files.

OBSERVED RESULT
Characters outside 7-bit ASCII are displayed incorrectly. See my attached image (kdiff3-bomless-utf3-observed-result.png), in which every ONE of those characters is displayed as TWO characters, a sign that the UTF-8 text files are not detected correctly.

EXPECTED RESULT
Correct characters are displayed. This is shown in my other attached image (kdiff3-bomless-utf3-expected-result.png), which is what we get IF we specify UTF-8 instead of relying on the "Auto Detect Unicode" option.

SOFTWARE/OS VERSIONS
Windows: Windows 11 (but this is irrelevant, IMO)
KDE Frameworks Version: 5.88.0
Qt Version:

ADDITIONAL INFORMATION
This bug was previously reported in:
https://sourceforge.net/p/kdiff3/discussion/197499/thread/78e8dcc2/?limit=25#0a95
and in:
https://sourceforge.net/p/kdiff3/bugs/197/
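The "one character shown as two" symptom can be illustrated with a small sketch. It assumes the fallback codec is a single-byte one such as Latin-1, which the report does not state, and the function names are hypothetical. In UTF-8, characters from U+0080 to U+07FF take two bytes, so a codec that maps each byte to one character shows two characters where UTF-8 decoding shows one.

```cpp
#include <cstddef>
#include <string>

// Number of characters seen when every byte is treated as one character,
// as a single-byte codec such as Latin-1 would do (assumed fallback).
std::size_t countAsSingleByte(const std::string& bytes)
{
    return bytes.size();
}

// Number of characters seen when the bytes are decoded as UTF-8.
// Continuation bytes (bit pattern 10xxxxxx) belong to the previous
// character and are not counted. Assumes well-formed UTF-8 input.
std::size_t countAsUtf8(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, "é" (U+00E9) is stored in UTF-8 as the two bytes 0xC3 0xA9. `countAsUtf8("\xC3\xA9")` gives 1, while `countAsSingleByte("\xC3\xA9")` gives 2, which matches the doubled characters in the screenshot.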
Created attachment 145856 [details]
If UTF-8 is manually chosen, no more gibberish character
Git commit fc59f1005f41940ca8b62d152b63b4cdf822a5c3 by Michael Reeves.
Committed on 25/01/2022 at 18:38.
Pushed by mreeves into branch 'master'.

Document "Auto Detect Unicode".
FIXED-IN:1.9.70

M  +2    -0    doc/en/index.docbook

https://invent.kde.org/sdk/kdiff3/commit/fc59f1005f41940ca8b62d152b63b4cdf822a5c3
Git commit b96f5d7d36bccddea5a1bfa500a0d7436c2dbf1e by Michael Reeves.
Committed on 24/01/2022 at 23:51.
Pushed by mreeves into branch 'master'.

fix: Attempt to autodetect non-BOM UTF-8

This is not foolproof and can't be, but it's better than not checking at all.
Basically, anything that can be a UTF-8 file will be interpreted as such by default when auto detection is used.

M  +15   -1    src/SourceData.cpp
M  +1    -0    src/SourceData.h

https://invent.kde.org/sdk/kdiff3/commit/b96f5d7d36bccddea5a1bfa500a0d7436c2dbf1e
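The idea in this commit message, treating anything that can be UTF-8 as UTF-8, can be sketched as a validity check over the raw bytes. This is only an illustration in standard C++, not the code added to src/SourceData.cpp; the function name `looksLikeUtf8` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes form well-formed UTF-8.
// Pure ASCII also passes, since ASCII is a subset of UTF-8.
bool looksLikeUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {          // single-byte (ASCII) character
            ++i;
            continue;
        }
        std::size_t extra = 0;      // number of continuation bytes expected
        unsigned long minValue = 0; // smallest code point for this length
        unsigned long cp = 0;       // decoded code point
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; minValue = 0x80; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; minValue = 0x800; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; minValue = 0x10000; cp = lead & 0x07;
        } else {
            return false;           // stray continuation or invalid lead byte
        }
        if (i + extra >= n)
            return false;           // sequence truncated at end of data
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                return false;       // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < minValue)
            return false;           // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;           // out of range or UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

With this sketch, `looksLikeUtf8("caf\xC3\xA9")` is true, while the Latin-1 bytes `"caf\xE9"` are rejected because 0xE9 announces a three-byte sequence that never arrives. As the commit message says, such a check cannot be foolproof: text in a legacy encoding can occasionally happen to be valid UTF-8, and a pure ASCII file gives no evidence either way.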
Git commit 5ee349ee95d7e1473f6fdc9edf02d0cdc3213836 by Michael Reeves.
Committed on 24/01/2022 at 23:56.
Pushed by mreeves into branch '1.9'.

fix: Attempt to autodetect non-BOM UTF-8

This is not foolproof and can't be, but it's better than not checking at all.
Basically, anything that can be a UTF-8 file will be interpreted as such by default when auto detection is used.

(cherry picked from commit b96f5d7d36bccddea5a1bfa500a0d7436c2dbf1e)

M  +15   -1    src/SourceData.cpp
M  +1    -0    src/SourceData.h

https://invent.kde.org/sdk/kdiff3/commit/5ee349ee95d7e1473f6fdc9edf02d0cdc3213836