Created attachment 104749
Third-party terminal screenshot to show the symbol/prompt

I am using konsole with the Source Code Pro font, (oh-my-)zsh and the agnoster theme. This looks similar to the attached screenshot, taken from https://gist.github.com/agnoster/3712874

When becoming root, the command prompt contains the Unicode character \u26a1 (HIGH VOLTAGE SIGN). The character is rendered one cell wide. Problems with that character are reproducible with other shells by just copying & pasting the Unicode character into some konsole.

Two symptoms of the problem are:

1. In the configuration above: when using tab completion (e.g. enter ab, press TAB), the prompt doesn't show "ab" but becomes "aab". That is, the completion is inserted with an offset of one character. When trying to backspace 3 times or delete the line with C-u, only the "ab" gets deleted.

2. Easier to reproduce: copy & paste the character \u26a1 into some shell running in konsole. Backspace or C-u make konsole move 2 characters backwards instead of one, deleting parts of the prompt. Cursor movement across the character moves too far.

xterm and urxvt on the same system with the same font show different behavior: they render the symbol as a character which is two cells wide (horizontally centered within that box). All operations (tab completion, character deletion, cursor movement, line wrapping, ...) work as expected.

Debian is using glibc 2.24-9 with Unicode 9.0 EastAsianWidth.txt; this means the glibc wcwidth returns 2 for \u26a1. I do not know if older versions of glibc (<2.24-6) have shown the same behavior. EastAsianWidth.txt of Unicode 8.0 didn't contain \u26a1; it may be that it started when glibc switched to Unicode 9.0 (which it will on all distributions with 2.26).

I patched konsole_wcwidth.cpp to have its wcwidth implementation return 2 for \u26a1. This fixes the behavior, but the symbol is now rendered left-aligned within the two cells it's getting (it looks like lightning plus a space character). I do not know if it should be centered or left-aligned, but this may be another issue (I would prefer centered).

There seem to be other problems with konsole's wcwidth, cf. https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/

Wouldn't using the system's wcwidth (if available?) be preferable? This might also give more consistent behavior across the system.
gnome-terminal suffers from the same set of problems, see e.g.
https://bugzilla.gnome.org/show_bug.cgi?id=772812
https://bugzilla.gnome.org/show_bug.cgi?id=772890

Indeed, plenty of codepoints changed from single-wide to double-wide as of Unicode 9.0, and this causes tons of trouble (until all components of the system update to 9.0).

glibc will receive Unicode 9.0 support in version 2.26 (it's already in git, but missed 2.25). Based on what you say, Debian seems to forward-patch their 2.24.

> EastAsianWidth.txt of Unicode 8.0 didn't contain \u26a1

It may not contain this string in particular; it's inside an interval:

ftp://ftp.unicode.org/Public/8.0.0/ucd/EastAsianWidth.txt
26A0..26BD;N # So [30] WARNING SIGN..SOCCER BALL

ftp://ftp.unicode.org/Public/9.0.0/ucd/EastAsianWidth.txt
26A1;W # So HIGH VOLTAGE SIGN

> Wouldn't using the system's wcwidth (if available?) be preferable?

I guess so (see the second gnome-terminal link above).
Konsole uses xterm's wcwidth code - I really wish Qt would incorporate it so everyone could use that. I'm open to suggestions to avoid all these width issues.
(In reply to Kurt Hindenburg from comment #2)
> I'm open to suggestions to avoid all these width issues.

The first thing Konsole needs is to change its internal character representation from UTF16 to UTF32. This will allow proper handling of code points above 0xffff (right now, they are all assumed to be wide and non-combining). QChar::is*() and wcwidth(), even the one implemented in Konsole, already support UTF32 characters. The nice thing is that the Character class won't change size: it uses 13 bytes aligned to 16, so after the change it will be 15 bytes aligned to 16.

I think glibc's wcwidth() would be nice as a source of character widths:
- Unicode 10 since 2.26 (released in August 2017, available in e.g. Kubuntu 17.10).
- Most terminal applications probably use it, so widths would match.
- Less code to maintain.

Possible disadvantages:
- Qt's QChar::is*() can use another Unicode version, potentially slightly incompatible with glibc's one. Solution: use iswctype() instead.
- Unicode 8 (or older) on systems with older glibc.
- Lack of customization, like selecting the Unicode version (e.g. when connecting to remote systems with older glibc) or changing the width of ambiguous characters, but there is no such feature right now.

I've already modified Konsole to use UTF32 and glibc's wcwidth(); I just have to clean it up a bit before creating a review request.
Git commit e74cf6c36642247f3f79194da373d01a00645d36 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 03/10/2018 at 15:11.
Pushed by hindenburg into branch 'master'.

Use new character width code based on Unicode 11

Summary:
Adds code for getting character width, together with LUTs generated using uni2characterwidth from Unicode 11 lists.

Skin tone, flags, gender, and other emoji with modifiers are not joined (you will see e.g. a skin tone square + generic yellow emoji). I think joining them would cause problems in most editors, command line prompts, and other programs which use character width data, as the characters would behave as combining or emoji depending on context (like ligatures).

Examples:
* light thumb up: 👍🏻
* dark thumb up: 👍🏿
* Polish flag: 🇵🇱

This behavior is allowed:
* https://unicode.org/reports/tr51/#Emoji_Modifiers_Display
* https://unicode.org/reports/tr51/#Emoji_ZWJ_Sequences

It is possible to add support for sequences, but those would work only for string width functions.

Some characters which can be presented as emoji are narrow (e.g. ©️). Those characters are listed without "presentation" mode, which means they should be rendered as text by default (real presentation depends on the renderer and/or font). Noto Sans Color Emoji renders them as wide, DejaVu Sans as narrow. Vim, bash and zsh treat them as narrow, so I made them narrow.
https://unicode.org/reports/tr51/#Presentation_Style

Related: bug 396435, bug 392171, bug 339439
FIXED-IN: 18.12

Depends on D15757

Test Plan:
* Look at emoji_test.txt - emojis should look "normal" (two characters wide).
* Look at GLASS.txt - character widths should look correct.
* CharacterWidthTest should pass.
* perl -XCSDL -e 'print map{chr($_), " "} 1..0xffff'

Reviewers: #konsole, #vdg, hindenburg
Reviewed By: #konsole, hindenburg
Subscribers: hindenburg, broulik, ngraham, konsole-devel
Tags: #konsole
Differential Revision: https://phabricator.kde.org/D15758

D +0 -64 COPYING.Unicode
M +1 -1 src/CMakeLists.txt
M +2 -2 src/Character.h
A +159 -0 src/CharacterWidth.cpp [License: GENERATED FILE] *
A +8 -0 src/CharacterWidth.h [License: UNKNOWN] *
A +102 -0 src/CharacterWidth.src.cpp [License: GPL (v2+)]
M +1 -1 src/Filter.cpp
M +1 -1 src/TerminalCharacterDecoder.cpp
M +1 -1 src/TerminalDisplay.cpp
M +6 -2 src/autotests/CharacterWidthTest.cpp
D +0 -238 src/konsole_wcwidth.cpp
D +0 -16 src/konsole_wcwidth.h
A +3 -0 tools/uni2characterwidth/overrides.txt

The files marked with a * at the end have a non-valid license. Please read: https://community.kde.org/Policies/Licensing_Policy and use the headers which are listed at that page.

https://commits.kde.org/konsole/e74cf6c36642247f3f79194da373d01a00645d36
I still have this issue on KDE Neon 5.15.0 with Konsole 18.12.2
@Vanush: can you provide an example of how to reproduce your problem?
Install zsh and awesome-fontconfig. Here is a link to the issue I created; I got a response that it will be solved in 19.04: https://bugs.kde.org/show_bug.cgi?id=404525
*** This bug has been marked as a duplicate of bug 401298 ***