(*** This bug was imported into bugs.kde.org ***)
Package: konsole
Version: 1.2 Preview (using KDE 3.0.5 (CVS HEAD >= 20020412))
Severity: normal
Installed from: compiled sources
Compiler: gcc version 2.95.3 20010315 (release)
OS: Linux (i686) release 2.4.18-0vl3
OS/Compiler notes:

Some Japanese FULLWIDTH symbol characters are shown as HALFWIDTH. I put an example EUC-JP-encoded file at http://www.kde.gr.jp/~asaki/konsole-symbols.txt
These characters are shown as FULLWIDTH in kterm (Kanji terminal): http://www.kde.gr.jp/~asaki/kterm.png
Konsole shows them as HALFWIDTH: http://www.kde.gr.jp/~asaki/konsole.png
However, Japanese terminal applications treat them as FULLWIDTH, so the screen layout breaks.

(Submitted via bugs.kde.org)
(Called from KBugReport dialog)
I understand the problem, but I have no idea how to fix this.
Cheers,
Waldo
-- 
bastian@kde.org | SuSE Labs KDE Developer | bastian@suse.com
I have noticed that konsole_wcwidth_cjk() is implemented according to Unicode Standard Annex #11 (East Asian Width). I think this problem would be solved if a CJK mode were added to Konsole and konsole_wcwidth_cjk() were called instead of konsole_wcwidth() while the mode is on. Moreover, I think it is desirable for the locale to determine whether the mode is on by default.
P.S. http://www.cl.cam.ac.uk/~mgk25/ucs/scw-proposal.html is one useful reference for solving this problem.
Ideally, we would decide whether to use CJK mode by checking whether the foreground process in the terminal is running in one of the CJK locales (I'm not sure this is possible). Simply using the locale that Konsole itself runs in might not be the best solution. What if you want to run something temporarily in a non-CJK locale, for example:

$ LANG="C" some-command

If that command expects characters to be displayed a different way, the problem would still occur, only in reverse.

Since we have no mechanism for determining the foreground process's locale, a simple menu switch that turns CJK mode on and off seems the most flexible option. I would guess that 80% of affected users would only need to set it once, and those who occasionally run things in different locales could toggle it as needed (perhaps even with a keymap).

My question, however, is this: just because we treat the right characters as FULL WIDTH, will they be rendered properly? Doesn't it depend on the conversion from the 8-bit locale encoding to Unicode, which determines which glyph in the font is used for each character? In other words, if we just switch the konsole_wcwidth() call to the CJK version, we will still use the same glyph and merely give it twice as much space, which doesn't seem like the right solution to me.
Some further analysis: it turns out that in Unicode, the characters in question are "ambiguous" as to whether they are full width or half width. When they are rendered, what actually determines the on-screen width of the character is the font. If one opens kedit and enters one of the characters in question,
On further experimentation, it appears that we actually need to look at the locale to figure out which wcwidth() function to call. Here is why: in EUC-JP, these ambiguous characters are always full width. So if a program like bash is running in EUC-JP, when it sees the EUC-JP code for such a character, it always treats it as full width. If Konsole mistakenly treats it as a half-width character, the application (such as bash) and Konsole get out of sync about things like the cursor position. Here is a simple example that shows this: start bash using EUC-JP. Enter a
Whoops, in my last comment, the 3rd paragraph should read: Konsole treats this character as HALF width, not FULL width.
There are a couple of ways we can solve this problem:

Solution A) Offer a configuration option that forces konsole_wcwidth_cjk() to be used in place of konsole_wcwidth(). Actually, the option should perhaps offer three choices:
1. Choose the right wcwidth according to the locale (which means we need mappings from locales to the wcwidth functions)
2. Always use wcwidth()
3. Always use wcwidth_cjk()
Most of the time (perhaps 90%), a CJK user will just want it to use wcwidth_cjk(), but CJK users shouldn't each have to change the setting manually, so by default it should be detected from the language code in the locale.

Solution B) Allow Konsole to switch encodings on the fly. Map each encoding to the appropriate wcwidth() function, and use the right one for the current locale. This is a much more useful (and much more m17n-friendly) approach, though it might take more work. It lets people try out different locales without restarting Konsole. We could also make it possible to specify the encoding in profiles, so that, for example, you could make a profile that runs mutt in UTF-8 while all your other sessions still use a legacy encoding.
Here is a stopgap patch. It lets you choose which wcwidth function is called via an environment variable: if "KONSOLE_WCWIDTH_CJK" is defined, the CJK version is used. This is far from ideal, but it at least provides a temporary workaround for the users this problem affects.
Created attachment 5646
described patch
Replaced asataku@osk3.3web.ne.jp with kaminmat@cc.rim.or.jp due to bounces by reporter
I know nothing about CJK issues, but the PATCH tag will make this easier to revisit after 4.0 is out.
We are now close to 4.2... what is the status of this issue?
Also, could bug 173588 be related to this?
I noticed that some Unicode characters are cut off because they are too wide. I noticed this behaviour with ẞ (U+1E9E), a fairly new character that represents the German capital ß. When this character is used, part of it is not shown. If other characters follow, ẞ is shown at full width, but the last character(s) of the line are cut off. I also tried various other Unicode characters that show the same behaviour; I checked many, but this list is probably far from complete: ⊥∴※ℜℑℵ⊶≪≫⋂⋃
I will also attach a picture showing both the cut-off character itself and the cut-off end of the line.
Created attachment 31725
cut-off characters
*** Bug 202619 has been marked as a duplicate of this bug. ***
If this is still an issue with a recent version, please open a new report.