41744 – konsole lacks the option of showing Unicode characters with ambiguous width as FULLWIDTH or HALFWIDTH

Status:	RESOLVED UNMAINTAINED

Alias:	None

Product:	konsole
Classification:	Applications
Component:	font
Version:	1.2
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Konsole Developer

URL:
Keywords:

Duplicates (1):	202619
Depends on:
Blocks:

Reported:	2002-04-27 06:03 UTC by Daisuke Kameda
Modified:	2017-02-13 02:58 UTC
CC List:	11 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
described patch (2.34 KB, patch) 2004-04-14 21:50 UTC, Ken Deeter
cut off characters (30.33 KB, image/jpeg) 2009-03-01 20:53 UTC, Jonathan

Description Takumi Asaki 2002-04-27 05:52:29 UTC

(*** This bug was imported into bugs.kde.org ***)

Package:           konsole
Version:           1.2 Preview (using KDE 3.0.5 (CVS HEAD >= 20020412))
Severity:          normal
Installed from:    compiled sources
Compiler:          gcc version 2.95.3 20010315 (release)
OS:                Linux (i686) release 2.4.18-0vl3
OS/Compiler notes: 

Some Japanese FULLWIDTH symbol characters are shown as HALFWIDTH.

I put an example EUC-JP encoded file at
http://www.kde.gr.jp/~asaki/konsole-symbols.txt

These characters are shown as FULLWIDTH in kterm (Kanji terminal).
http://www.kde.gr.jp/~asaki/kterm.png


Konsole shows them as HALFWIDTH.
http://www.kde.gr.jp/~asaki/konsole.png
But Japanese terminal applications treat them as FULLWIDTH,
so the screen display is broken.


(Submitted via bugs.kde.org)
(Called from KBugReport dialog)

Comment 1 Waldo Bastian 2002-08-06 00:41:08 UTC

I understand the problem but I have no idea how to fix this.

Cheers
Waldo
-- 
bastian@kde.org  |   SuSE Labs KDE Developer  |  bastian@suse.com

Comment 2 Daisuke Kameda 2003-07-30 18:28:22 UTC

 I have noticed that konsole_wcwidth_cjk() is implemented according to "Unicode
Standard Annex #11". I think this problem would be solved if a CJK mode were
added to konsole, so that konsole_wcwidth_cjk() is called instead of
konsole_wcwidth() when the mode is on. Moreover, I think it is desirable for
the locale to determine whether the mode is on by default.

P.S.
 http://www.cl.cam.ac.uk/~mgk25/ucs/scw-proposal.html is one reference that may
help with a solution to this problem.
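
As a rough illustration of the CJK mode proposed above (not Konsole's actual code), the sketch below picks a width function per character depending on a mode flag. The names narrowWidth, cjkWidth, cellWidth and the tiny kAmbiguousSample table are hypothetical stand-ins for konsole_wcwidth(), konsole_wcwidth_cjk() and the full UAX #11 tables, which cover far more code points.

```cpp
#include <algorithm>
#include <array>

// Hypothetical, heavily truncated sample of "ambiguous width" code points
// (East_Asian_Width=A in UAX #11). Must stay sorted for binary_search.
constexpr std::array<char32_t, 6> kAmbiguousSample = {
    0x00A7, // SECTION SIGN
    0x00B1, // PLUS-MINUS SIGN
    0x00D7, // MULTIPLICATION SIGN
    0x203B, // REFERENCE MARK
    0x25CB, // WHITE CIRCLE
    0x2605, // BLACK STAR
};

// Stand-in for konsole_wcwidth(): only two wide ranges are covered here.
int narrowWidth(char32_t c) {
    const bool wide = (c >= 0x3040 && c <= 0x30FF)   // Hiragana, Katakana
                   || (c >= 0x4E00 && c <= 0x9FFF);  // CJK Unified Ideographs
    return wide ? 2 : 1;
}

// Stand-in for konsole_wcwidth_cjk(): ambiguous characters also take two cells.
int cjkWidth(char32_t c) {
    const bool ambiguous = std::binary_search(
        kAmbiguousSample.begin(), kAmbiguousSample.end(), c);
    return ambiguous ? 2 : narrowWidth(c);
}

// The proposed "CJK mode": one flag decides which width function is used.
int cellWidth(char32_t c, bool cjkMode) {
    return cjkMode ? cjkWidth(c) : narrowWidth(c);
}
```

With this sketch, U+203B (REFERENCE MARK) occupies one cell when the mode is off and two cells when it is on, while characters that are unambiguously wide or narrow are unaffected by the flag.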

Comment 3 Ken Deeter 2003-07-31 00:00:02 UTC

Ideally, you would want to determine whether to use cjk mode by somehow 
seeing if the foreground process within the terminal is running in one of the cjk 
locales. (not sure if this is possible) 
 
Doing it simply by the locale in which konsole is running might not be the 
best solution. What if you want to run something in a non-cjk locale temporarily, 
like by doing 
 
$ LANG="C" some-command 
 
If this command happens to expect that characters get displayed a different 
way, then the problem would still occur, only in reverse. 
 
It seems like a simple switch in one of the menus that just turns cjk mode on 
and off would be the most flexible way, given that we don't have a mechanism 
for determining what the foreground process' locale is. This way, I would guess 
80% of the affected users would only need to set the setting once, and those 
who run things in different locales from time to time could flip it on and off as 
needed (perhaps even with a keymap). 
 
My question, however, is this: just because we treat the right characters as 
FULL WIDTH, does this mean they will get rendered properly? Isn't it the 
conversion from the 8-bit locale to unicode that determines which characters 
get rendered by which glyph in the font? 
 
In other words, if we just switch the konsole_wcwidth() call to the cjk version, 
we will still be using the same glyph, but just be giving twice as much space to 
it, which to me doesn't seem like the right solution.

Comment 4 Ken Deeter 2003-11-08 05:07:39 UTC

Some further analysis:

It turns out that in unicode, the characters in question are 'ambiguous' as to whether they are full width or half width. When things are rendered, the thing that determines the on-screen width of the character is actually the font.

If one opens kedit, and enters one of the characters in question,

Comment 5 Ken Deeter 2003-11-08 05:49:39 UTC

On further experimentation, it appears that we actually need to look at the locale to figure out which wcwidth() function is the right one to call. Here is why:

In EUC-JP, these ambiguous chars are always full width. As such, if a program like bash is running in EUC-JP, when it sees the EUC-JP code for such a char, it will always treat it as full width. If konsole mistakenly treats this as a half width char, then the application (such as bash) and konsole will get out of sync with regard to things like cursor position.

Here is the simple example to show this:

Start up bash using EUC-JP. Enter a

Comment 6 Ken Deeter 2003-12-16 05:46:04 UTC

Whoops, in the last comment, the 3rd paragraph should read: Konsole treats this character as HALF width, not FULL width.

Comment 7 Ken Deeter 2004-04-10 00:26:17 UTC

There are a couple of ways we can solve this problem:

Solution A) offer a configuration option that forces konsole_wcwidth_cjk() to be used in place of konsole_wcwidth(). Actually the option should perhaps offer three choices.
1. Choose the right wcwidth according to the locale (which means we need mappings from locales to the wcwidth functions)
2. Always use wcwidth()
3. Always use wcwidth_cjk()

90% of the time (or 'most' of the time), a cjk user will just want it to use wcwidth_cjk(), but cjk users shouldn't all have to go in and change the setting manually, so it should just be detected from the language code in the locale.

Solution B) Allow Konsole to switch encodings on the fly. Map each encoding to the appropriate wcwidth() function, and use the right wcwidth() function according to the current locale.

This is actually a much more useful approach (and much more m17n friendly approach), though it might take more work. This allows people to test out different locales and such without restarting konsole. We could also make it so that you can specify the encoding in profiles, so for example you can make a profile to run mutt in UTF-8 if you wanted to, but for all your other stuff you still use some legacy encoding.
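
The three choices listed under Solution A could be modelled roughly as follows. This is only a sketch: WidthPolicy, isCjkLocale and useCjkWidth are hypothetical names, and treating the language codes ja, ko and zh as "the cjk locales" is an assumption, not something taken from Konsole.

```cpp
#include <string>

// The three choices from Solution A (hypothetical names).
enum class WidthPolicy {
    FromLocale,    // 1. choose according to the locale
    AlwaysNarrow,  // 2. always use wcwidth()
    AlwaysCjk,     // 3. always use wcwidth_cjk()
};

// True if the language part of a locale name such as "ja_JP.eucJP"
// is one of the CJK language codes assumed here.
bool isCjkLocale(const std::string& localeName) {
    // The language code ends at the first '_', '.' or '@', or at the end
    // of the string if none of them is present.
    const std::string lang =
        localeName.substr(0, localeName.find_first_of("_.@"));
    return lang == "ja" || lang == "ko" || lang == "zh";
}

// Decide whether the cjk width function should be used.
bool useCjkWidth(WidthPolicy policy, const std::string& localeName) {
    switch (policy) {
    case WidthPolicy::AlwaysNarrow:
        return false;
    case WidthPolicy::AlwaysCjk:
        return true;
    case WidthPolicy::FromLocale:
        return isCjkLocale(localeName);
    }
    return false; // not reached for valid enum values
}
```

Under these assumptions, a user in ja_JP.eucJP gets the cjk widths without touching the setting, while the two "always" choices remain available as overrides for cases like LANG="C" some-command.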

Comment 8 Ken Deeter 2004-04-14 21:39:00 UTC

Here is a stopgap patch. It allows you to change which of the wcwidth functions is called based on an environment variable. If "KONSOLE_WCWIDTH_CJK" is defined, then it will use the cjk version.

This is far from ideal, but at least it will provide a temporary workaround for the users this problem affects.
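
For readers without access to the attachment, the environment check described above might look something like the sketch below. It is not the attached patch; the function name cjkWidthRequested is hypothetical, and only the variable name KONSOLE_WCWIDTH_CJK comes from the comment.

```cpp
#include <cstdlib>

// True if KONSOLE_WCWIDTH_CJK is defined in the environment.
// Only the presence of the variable matters here, not its value.
bool cjkWidthRequested() {
    return std::getenv("KONSOLE_WCWIDTH_CJK") != nullptr;
}
```

A caller would consult this once at startup and then route every width lookup to the cjk function or the default one accordingly.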

Comment 9 Ken Deeter 2004-04-14 21:50:54 UTC

Created attachment 5646
described patch

Comment 10 Stephan Kulow 2004-05-25 09:18:55 UTC

Replaced asataku@osk3.3web.ne.jp with kaminmat@cc.rim.or.jp due to bounces by reporter

Comment 11 Will Stephenson 2007-11-22 21:16:24 UTC

I know nothing about CJK issues but the PATCH tag will make it easier to revisit after 4.0 is out

Comment 12 FiNeX 2008-12-26 23:41:48 UTC

Now we are near 4.2... what about this issue?

Comment 13 Dario Andres 2008-12-27 21:41:14 UTC

Also, could bug 173588 be related to this?

Comment 14 Jonathan 2009-03-01 20:50:57 UTC

I noticed that some Unicode characters are cut off because they are too wide. I noticed this behavior with ẞ, a fairly new character that represents the new capital German ß. When this character is used, part of it is not shown. If other characters follow, the ẞ is shown at full width but the last character(s) of that line is (are) cut off.

I also tried out various other unicode characters which show the same behaviour, I checked many characters but probably this list is far from complete:
⊥∴※ℜℑℵ⊶≪≫⋂⋃

I will also attach a picture where you can see the cutting off of the character itself and of the end of the line.

Comment 15 Jonathan 2009-03-01 20:53:20 UTC

Created attachment 31725
cut off characters

Comment 16 Jekyll Wu 2011-10-11 12:15:40 UTC

*** Bug 202619 has been marked as a duplicate of this bug. ***

Comment 17 Kurt Hindenburg 2017-02-13 02:58:35 UTC

If this is still an issue with a recent version, please open a new report.