Bug 339439 - Konsole treats non-BMP (Basic Multilingual Plane) unicode characters inconsistently
Status: RESOLVED FIXED
Alias: None
Product: konsole
Classification: Applications
Component: font
Version: unspecified
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords:
Duplicates: 341242 399558
Depends on:
Blocks:
 
Reported: 2014-09-26 20:16 UTC by Tavian Barnes
Modified: 2018-10-20 21:57 UTC

See Also:
Latest Commit:
Version Fixed In: 18.12


Description Tavian Barnes 2014-09-26 20:16:38 UTC
On pasting a character like 
Comment 1 Tavian Barnes 2014-09-26 20:18:18 UTC
Wow, I typed up a nice description but I think the Unicode characters confused Bugzilla and it got cut off.  Let's try again:

On pasting a character like 
Comment 2 Tavian Barnes 2014-09-26 20:20:31 UTC
Okay that's not going to work.  In the following text, anything that says <clef> should be read as being the non-BMP character U+1D11E (http://www.fileformat.info/info/unicode/char/1d11e/index.htm)

On pasting a character like <clef> which is outside of the BMP, Konsole renders it as one character but treats it like two characters.  The cursor is displayed one character right of where it should be, and pressing the left arrow key appears to move into the "middle" of the character, splitting it into two boxes.  A single backspace deletes the character but Konsole seems to think only half of it got deleted.
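
The symptom is consistent with cursor or width logic that counts UTF-16 code units instead of code points. That is an assumption about the cause, not something confirmed in this report. A minimal Python sketch of the mismatch for U+1D11E:

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP (above U+FFFF).
clef = "\U0001D11E"

code_points = len(clef)                           # Python strings index by code point
utf16_units = len(clef.encode("utf-16-le")) // 2  # number of 16-bit code units
utf8_bytes = len(clef.encode("utf-8"))

# Split the code point into its UTF-16 surrogate pair by hand.
offset = ord(clef) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)

print(code_points, utf16_units, utf8_bytes)  # 1 2 4
print(hex(high), hex(low))                   # 0xd834 0xdd1e
```

Code that advances the cursor once per 16-bit unit would see two "characters" here, which would match the extra box and the misplaced cursor.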

Steps to reproduce:

1. Open Konsole
2. Type/paste echo clef <clef>
3. Press backspace three times

Actual results:

The line says "echo clef<cursor><box>".  On pressing enter, "cle" is echoed.

Expected results:

The line should say "echo cle".

Additional information:

I believe I have my locale and encodings set up correctly.  LANG=en_CA.UTF-8, and UTF-8 is selected as the default character encoding for this profile.
Comment 3 Christoph Feck 2014-11-25 11:38:27 UTC
*** Bug 341242 has been marked as a duplicate of this bug. ***
Comment 4 Kurt Hindenburg 2018-10-03 15:11:28 UTC
Git commit e74cf6c36642247f3f79194da373d01a00645d36 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 03/10/2018 at 15:11.
Pushed by hindenburg into branch 'master'.

Use new character width code based on Unicode 11

Summary:
Adds code for getting character width, together with LUTs generated
using uni2characterwidth from Unicode 11 lists.
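
One common way to implement such a lookup is a sorted table of code point ranges searched with bisection. The sketch below only illustrates that idea: the ranges are a hypothetical, hand-picked subset, `character_width` is an illustrative name, and the layout of the generated tables in CharacterWidth.cpp may be different.

```python
from bisect import bisect_right

# Hypothetical, hand-picked ranges for illustration only; real tables
# generated from the Unicode data files are far larger.
# Each entry is (first code point, last code point, width in cells).
WIDTH_RANGES = [
    (0x0300, 0x036F, 0),    # Combining Diacritical Marks
    (0x1100, 0x115F, 2),    # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF, 2),    # CJK Unified Ideographs
    (0x1F600, 0x1F64F, 2),  # Emoticons
]
RANGE_STARTS = [first for first, _, _ in WIDTH_RANGES]

def character_width(code_point: int) -> int:
    """Return the cell width of a code point, defaulting to 1."""
    i = bisect_right(RANGE_STARTS, code_point) - 1
    if i >= 0:
        first, last, width = WIDTH_RANGES[i]
        if first <= code_point <= last:
            return width
    return 1

print(character_width(ord("a")))  # 1
print(character_width(0x0301))    # 0
print(character_width(0x4E2D))    # 2
print(character_width(0x1F600))   # 2
print(character_width(0x1D11E))   # 1 (not covered by this toy table)
```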

Skin tone, flags, gender, and other emoji with ZWJ and modifier are not
joined (you will see e.g. a skin tone square + generic yellow emoji).
I think joining them would cause problems in most editors, command line
prompts, and other programs which use character width data, as the
characters would behave as combining or emoji depending on context (like
ligatures).

Examples:
* light thumb up: 👍🏻
* dark thumb up:  👍🏿
* Polish flag:    🇵🇱

This behavior is allowed:
* https://unicode.org/reports/tr51/#Emoji_Modifiers_Display
* https://unicode.org/reports/tr51/#Emoji_ZWJ_Sequences

It is possible to add support for sequences, but those would work
only for string width functions.
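
To illustrate the difference, here is a rough Python sketch. It uses the East Asian Width property from Python's `unicodedata` as a stand-in for the generated tables (an assumption), and `sequence_aware_width` is a hypothetical helper that handles only skin tone modifiers.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Approximate cell width from East Asian Width (stand-in, an assumption)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def per_character_width(text: str) -> int:
    """Sum of independent per-character widths; sequences are not joined."""
    return sum(char_width(ch) for ch in text)

def is_emoji_modifier(ch: str) -> bool:
    """True for the skin tone modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF

def sequence_aware_width(text: str) -> int:
    """Hypothetical string-level width: a modifier joins the preceding emoji."""
    total = 0
    previous = None
    for ch in text:
        joins = (previous is not None and char_width(previous) == 2
                 and is_emoji_modifier(ch))
        if not joins:
            total += char_width(ch)
        previous = ch
    return total

thumb = "\U0001F44D\U0001F3FB"  # thumbs up + light skin tone modifier
print(per_character_width(thumb))   # 4
print(sequence_aware_width(thumb))  # 2
```

The second function needs to see the neighbouring character, so it only makes sense for whole strings. A function that is handed one character at a time cannot know whether a modifier will be joined.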

Some characters which can be presented as emoji are narrow (e.g. ✖️, ©️).
Those characters are listed without "presentation" mode, which means
they should be rendered as text by default (real presentation depends on
renderer and/or font). Noto Sans Color Emoji renders them as wide,
DejaVu Sans as narrow. Vim, bash and zsh treat them as narrow, so I made
them narrow.
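
As a rough cross-check, the East Asian Width property shows the same split. This Python sketch uses `unicodedata` as a stand-in (an assumption; Konsole's generated tables and overrides may differ), and `default_is_wide` is an illustrative name.

```python
import unicodedata

def default_is_wide(ch: str) -> bool:
    """True if the East Asian Width property marks the character Wide/Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

samples = [
    ("heavy multiplication x", "\u2716"),  # text presentation by default
    ("copyright sign", "\u00A9"),          # text presentation by default
    ("thumbs up", "\U0001F44D"),           # emoji presentation by default
]
for label, ch in samples:
    print(f"U+{ord(ch):04X} {label}: wide={default_is_wide(ch)}")
```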

https://unicode.org/reports/tr51/#Presentation_Style
Related: bug 396435, bug 378124, bug 392171

FIXED-IN: 18.12

Depends on D15757

Test Plan:
* Look at emoji_test.txt - emojis should look "normal" (two characters
wide).
* Look at GLASS.txt - character widths should look correct.
* CharacterWidthTest should pass.
* perl -XCSDL -e 'print map{chr($_), " "} 1..0xffff'

Reviewers: #konsole, #vdg, hindenburg

Reviewed By: #konsole, hindenburg

Subscribers: hindenburg, broulik, ngraham, konsole-devel

Tags: #konsole

Differential Revision: https://phabricator.kde.org/D15758

D  +0    -64   COPYING.Unicode
M  +1    -1    src/CMakeLists.txt
M  +2    -2    src/Character.h
A  +159  -0    src/CharacterWidth.cpp     [License: GENERATED FILE]  *
A  +8    -0    src/CharacterWidth.h     [License: UNKNOWN]  *
A  +102  -0    src/CharacterWidth.src.cpp     [License: GPL (v2+)]
M  +1    -1    src/Filter.cpp
M  +1    -1    src/TerminalCharacterDecoder.cpp
M  +1    -1    src/TerminalDisplay.cpp
M  +6    -2    src/autotests/CharacterWidthTest.cpp
D  +0    -238  src/konsole_wcwidth.cpp
D  +0    -16   src/konsole_wcwidth.h
A  +3    -0    tools/uni2characterwidth/overrides.txt

The files marked with a * at the end have a non valid license. Please read: https://community.kde.org/Policies/Licensing_Policy and use the headers which are listed at that page.


https://commits.kde.org/konsole/e74cf6c36642247f3f79194da373d01a00645d36
Comment 5 Nate Graham 2018-10-20 21:57:30 UTC
*** Bug 399558 has been marked as a duplicate of this bug. ***