Bug 378124

Summary: Character width for HIGH VOLTAGE SIGN possibly wrong
Product: [Applications] konsole Reporter: Thomas Luzat <thomas+kdebugs>
Component: font Assignee: Konsole Developer <konsole-devel>
Status: RESOLVED DUPLICATE    
Severity: normal CC: egmont, mglb, nate, randall.leeds, thomas+kdebugs, vanush.kamaryan
Priority: NOR    
Version: 18.12.2   
Target Milestone: ---   
Platform: Neon   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=392172
Latest Commit: Version Fixed In: 18.12
Sentry Crash Report:
Attachments: Third-party terminal screenshot to show the symbol/prompt

Description Thomas Luzat 2017-03-26 21:25:20 UTC
Created attachment 104749 [details]
Third-party terminal screenshot to show the symbol/prompt

I am using konsole with the Source Code Pro font, (oh-my-)zsh and the agnoster theme. This looks similar to the attached screenshot, taken from https://gist.github.com/agnoster/3712874 . When becoming root, the command prompt contains the Unicode character \u26a1 (HIGH VOLTAGE SIGN). The character is rendered with a width of 1 character. Problems with that character are reproducible with other shells by just copying & pasting the Unicode character into any konsole. Two symptoms of the problem are:

1. In the configuration above: When using tab completion (e.g.: enter ab, press TAB), the prompt doesn't show "ab" but becomes "aab". That is, the completion is inserted with an offset of one character. When backspacing 3 times or deleting the line with C-u, only the "ab" gets deleted.

2. Easier to reproduce: Copy & paste the character \u26a1 into some shell running in konsole. Backspace or C-u makes konsole move 2 characters backwards instead of one, deleting parts of the prompt. Cursor movement across the character moves too far.

xterm and urxvt on the same system with the same font show different behavior: They render the symbol as a character which is two cells wide (horizontally centered within that box). All operations (tab completion, character deletion, cursor movement, linewrapping, ...) work as expected.

Debian is using glibc 2.24-9 with Unicode 9.0 EastAsianWidth.txt; this means the glibc wcwidth returns 2 for \u26a1. I do not know if older versions of glibc (<2.24-6) have shown the same behavior. EastAsianWidth.txt of Unicode 8.0 didn't contain \u26a1; it may be that it started when glibc switched to Unicode 9.0 (which it will on all distributions with 2.26).

I patched konsole_wcwidth.cpp to have its wcwidth implementation return 2 for \u26a1. This fixes the behavior, but the symbol is now rendered left-aligned within the two cells it's getting (it looks like lightning plus space character). I do not know if it should be centered or left-aligned, but this may be another issue (would prefer centered).

There seem to be other problems with konsole's wcwidth, cf. https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/ Wouldn't using the system's wcwidth (if available?) be preferable? This might also give more consistent behavior across the system.
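
For illustration only, the one-off patch described above (forcing width 2 for \u26a1) could be sketched as an override table placed in front of a generic width lookup. This is Python rather than Konsole's C++; the names WIDTH_OVERRIDES and char_width are hypothetical, and the fallback uses the Unicode data bundled with Python, not glibc or konsole_wcwidth.cpp. It also ignores control characters and other special cases a real wcwidth handles.

```python
import unicodedata

# Hypothetical override table: code point -> width in terminal cells.
WIDTH_OVERRIDES = {0x26A1: 2}  # HIGH VOLTAGE SIGN


def char_width(cp):
    """Return an approximate cell width for a single code point."""
    if cp in WIDTH_OVERRIDES:
        return WIDTH_OVERRIDES[cp]
    ch = chr(cp)
    # Non-spacing and enclosing marks occupy no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East_Asian_Width Wide (W) and Fullwidth (F) take two cells.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


print(char_width(0x26A1), char_width(ord("a")), char_width(0x0301))
```

With an override like this the answer for \u26a1 no longer depends on which Unicode version the fallback table was built from, which is the effect the patch was after.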
Comment 1 Egmont Koblinger 2017-03-28 23:20:56 UTC
gnome-terminal suffers from the same set of problems, see e.g.
  https://bugzilla.gnome.org/show_bug.cgi?id=772812
  https://bugzilla.gnome.org/show_bug.cgi?id=772890

Indeed plenty of codepoints changed from single-wide to double-wide as of Unicode 9.0, and this causes tons of troubles (until all components of the system update to 9.0).

glibc will receive Unicode 9.0 support in version 2.26 (it's already in git, but missed 2.25). Based on what you say, Debian seems to forward-patch their 2.24.

> EastAsianWidth.txt of Unicode 8.0 didn't contain \u26a1

It may not contain this string in particular, it's inside an interval:

ftp://ftp.unicode.org/Public/8.0.0/ucd/EastAsianWidth.txt
26A0..26BD;N     # So    [30] WARNING SIGN..SOCCER BALL

ftp://ftp.unicode.org/Public/9.0.0/ucd/EastAsianWidth.txt
26A1;W           # So         HIGH VOLTAGE SIGN
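
A minimal sketch of how the two entries quoted above resolve for \u26a1, assuming a simplified parser for EastAsianWidth.txt-style lines (parse_eaw and lookup are hypothetical helper names, and only the two quoted lines are used as input, not the full files):

```python
def parse_eaw(lines):
    """Parse EastAsianWidth.txt-style lines into (first, last, prop) tuples."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        span, prop = (field.strip() for field in data.split(";"))
        first, _, last = span.partition("..")  # single code point or interval
        ranges.append((int(first, 16), int(last or first, 16), prop))
    return ranges


def lookup(ranges, cp, default="N"):
    """Return the width property of cp, or default if no entry covers it."""
    for first, last, prop in ranges:
        if first <= cp <= last:
            return prop
    return default


UNICODE_8 = ["26A0..26BD;N     # So    [30] WARNING SIGN..SOCCER BALL"]
UNICODE_9 = ["26A1;W           # So         HIGH VOLTAGE SIGN"]

print(lookup(parse_eaw(UNICODE_8), 0x26A1))  # N: covered by the interval
print(lookup(parse_eaw(UNICODE_9), 0x26A1))  # W: has its own entry
```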

> Wouldn't using the system's wcwidth (if available?) be preferable?

I guess so (see the second gnome-terminal link above).
Comment 2 Kurt Hindenburg 2018-02-24 18:30:59 UTC
Konsole uses xterm's wcwidth code - I really wish Qt would incorporate it so everyone could use that.  I'm open to suggestions to avoid all these width issues.
Comment 3 Mariusz Glebocki 2018-03-16 00:10:13 UTC
(In reply to Kurt Hindenburg from comment #2)
> I'm open to suggestions to avoid all these width issues.

The first thing Konsole needs is to change its internal character representation from UTF16 to UTF32. This will allow it to properly handle code points above 0xffff (right now, they are all assumed to be wide and non-combining). QChar::is*() and wcwidth(), even the one implemented in Konsole, already support UTF32 characters. The nice thing is that the Character class won't change size - it uses 13 bytes aligned to 16, so after the change it will be 15 bytes aligned to 16.
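
To illustrate why a 16-bit representation is a problem for code points above 0xffff, here is a small sketch (in Python, not Konsole code; utf16_units and decode_surrogate_pair are hypothetical helper names). A character such as \u26a1 fits in one UTF16 code unit, while an emoji such as U+1F44D is split into a surrogate pair, so a per-unit width lookup never sees the real code point.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def decode_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


bmp = utf16_units("\u26a1")          # one unit: the code point itself
astral = utf16_units("\U0001F44D")   # two units: a surrogate pair

print([hex(u) for u in bmp])
print([hex(u) for u in astral])
print(hex(decode_surrogate_pair(*astral)))
```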

I think glibc's wcwidth() would be nice as a source of character widths:
- Unicode 10 since 2.26 (released in August 2017, available in e.g. Kubuntu 17.10).
- Most terminal applications probably use it, so widths would match.
- Less code to maintain.

Possible disadvantages:
- Qt's QChar::is*() can use another Unicode version, potentially slightly incompatible with glibc's one. Solution: use iswctype() instead.
- Unicode 8 (or older) on systems with older glibc.
- Lack of customization, like selecting Unicode version (e.g. when connecting to remote systems with older glibc), or changing width of ambiguous characters, but there is no such feature right now.

I've already modified Konsole to use UTF32 and glibc's wcwidth(), I just have to clean it up a bit before creating review request.
Comment 4 Kurt Hindenburg 2018-10-03 15:11:28 UTC
Git commit e74cf6c36642247f3f79194da373d01a00645d36 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 03/10/2018 at 15:11.
Pushed by hindenburg into branch 'master'.

Use new character width code based on Unicode 11

Summary:
Adds code for getting character width together with LUTs generated
using uni2characterwidth from Unicode 11 lists.

Skin tone, flags, gender, and other emoji with a modifier are not
joined (you will see e.g. a skin tone square + generic yellow emoji).
I think joining them would cause problems in most editors, command line
prompts, and other programs which use character width data, as the
characters would behave as combining or emoji depending on context (like
ligatures).

Examples:
* light thumb up: 👍🏻
* dark thumb up:  👍🏿
* Polish flag:    🇵🇱

This behavior is allowed:
* https://unicode.org/reports/tr51/#Emoji_Modifiers_Display
* https://unicode.org/reports/tr51/#Emoji_ZWJ_Sequences

It is possible to add support for sequences, but those would work
only for string width functions.
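
As a rough sketch of what "not joined" means for width calculation (Python with its bundled Unicode data, not the LUTs added by this commit; cell_width, per_codepoint_widths and naive_string_width are hypothetical names): each code point of a modifier sequence is measured on its own, so the base emoji and the skin tone modifier each contribute their own cells instead of being treated as one glyph.

```python
import unicodedata


def cell_width(ch):
    """Approximate cell width of one code point (simplified, no control chars)."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def per_codepoint_widths(text):
    """List each code point with the width it is given in isolation."""
    return [(f"U+{ord(ch):04X}", cell_width(ch)) for ch in text]


def naive_string_width(text):
    """Sum of per-code-point widths, with no sequence joining."""
    return sum(cell_width(ch) for ch in text)


light_thumb_up = "\U0001F44D\U0001F3FB"  # THUMBS UP SIGN + skin tone modifier
print(per_codepoint_widths(light_thumb_up))
print(naive_string_width(light_thumb_up))
```

A sequence-aware string width function would have to recognise the pair and report a single two-cell glyph, which a per-character width function cannot do.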

Some characters which can be presented as emoji are narrow (e.g. ✖️, ©️).
Those characters are listed without "presentation" mode, which means
they should be rendered as text by default (real presentation depends on
renderer and/or font). Noto Sans Color Emoji renders them as wide,
DejaVu Sans as narrow. Vim, bash and zsh treat them as narrow, so I made
them narrow.

https://unicode.org/reports/tr51/#Presentation_Style
Related: bug 396435, bug 392171, bug 339439

FIXED-IN: 18.12

Depends on D15757

Test Plan:
* Look at emoji_test.txt - emojis should look "normal" (two characters
wide).
* Look at GLASS.txt - character widths should look correct.
* CharacterWidthTest should pass.
* perl -XCSDL -e 'print map{chr($_), " "} 1..0xffff'

Reviewers: #konsole, #vdg, hindenburg

Reviewed By: #konsole, hindenburg

Subscribers: hindenburg, broulik, ngraham, konsole-devel

Tags: #konsole

Differential Revision: https://phabricator.kde.org/D15758

D  +0    -64   COPYING.Unicode
M  +1    -1    src/CMakeLists.txt
M  +2    -2    src/Character.h
A  +159  -0    src/CharacterWidth.cpp     [License: GENERATED FILE]  *
A  +8    -0    src/CharacterWidth.h     [License: UNKNOWN]  *
A  +102  -0    src/CharacterWidth.src.cpp     [License: GPL (v2+)]
M  +1    -1    src/Filter.cpp
M  +1    -1    src/TerminalCharacterDecoder.cpp
M  +1    -1    src/TerminalDisplay.cpp
M  +6    -2    src/autotests/CharacterWidthTest.cpp
D  +0    -238  src/konsole_wcwidth.cpp
D  +0    -16   src/konsole_wcwidth.h
A  +3    -0    tools/uni2characterwidth/overrides.txt

The files marked with a * at the end have a non valid license. Please read: https://community.kde.org/Policies/Licensing_Policy and use the headers which are listed at that page.


https://commits.kde.org/konsole/e74cf6c36642247f3f79194da373d01a00645d36
Comment 5 Vanush 2019-02-17 22:46:11 UTC
I still have this issue on KDE Neon 5.15.0 with Konsole 18.12.2
Comment 6 Mariusz Glebocki 2019-02-19 15:11:46 UTC
@Vanush: can you provide some example how to reproduce your problem?
Comment 7 Vanush 2019-02-19 15:29:52 UTC
install zsh and awesome-fontconfig
here is link to issue I created, I got a response that it will be solved in 19.04
https://bugs.kde.org/show_bug.cgi?id=404525
Comment 8 Nate Graham 2019-02-19 17:25:01 UTC

*** This bug has been marked as a duplicate of bug 401298 ***