Bug 464823

Summary: Soft hyphens are incorrectly displayed as normal spaces
Product: [Applications] konsole
Reporter: Karl Ove Hufthammer <karl>
Component: font
Assignee: Konsole Developer <konsole-devel>
Status: REPORTED ---
Severity: normal
CC: matan
Priority: NOR
Version: master
Target Milestone: ---
Platform: Other
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Karl Ove Hufthammer 2023-01-25 20:00:37 UTC
SUMMARY
When text contains soft hyphens, the soft hyphen characters (U+00AD) are now displayed as normal spaces. They should instead be invisible, as they used to be, so this is a regression. I did a ‘git bisect’, and it looks like the bug was introduced in this commit:

commit 76f879cd70fb494ab2334d2660b34679546f3d9d
Author: Matan Ziv-Av <matan@svgalib.org>
Date:   Sat Aug 6 19:15:42 2022 +0300

    Draw characters in exact positions
    […]


STEPS TO REPRODUCE
1. Run the following command in Konsole:
   echo -e 'super\u00adcali\u00adfragi\u00adlistic\u00adexpi\u00adali\u00addocious'


OBSERVED RESULT
The text is displayed as:
super ­cali ­fragi ­listic ­expi ­ali ­docious


EXPECTED RESULT
The text should be displayed as:
supercalifragilisticexpialidocious
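
To double-check that the test string really contains U+00AD and what it should look like once the soft hyphens are rendered invisibly, a small Python sketch can help. It only inspects the string itself and says nothing about how Konsole draws it; the variable names are made up for illustration.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

# Same string as in the echo command from the steps to reproduce
text = "super\u00adcali\u00adfragi\u00adlistic\u00adexpi\u00adali\u00addocious"

# Number of soft hyphens embedded in the string
shy_count = text.count(SOFT_HYPHEN)

# What a renderer that treats U+00AD as invisible would show mid-line
expected = text.replace(SOFT_HYPHEN, "")

# Unicode general category of U+00AD ("Cf" = format character)
category = unicodedata.category(SOFT_HYPHEN)

print(shy_count, category, expected)
```

The string has 40 code points, 6 of which are soft hyphens, so the visible text is 34 characters long.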


SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20230123
KDE Plasma Version: 5.26.5
KDE Frameworks Version: 5.102.0
Qt Version: 5.15.8
Kernel Version: 6.1.7-1-default (64-bit)
Graphics Platform: X11
Processors: 4 × Intel® Core™ i5-2500 CPU @ 3.30GHz
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 1060 3GB/PCIe/SSE2
Manufacturer: MSI
Product Name: MS-7673
System Version: 1.0

ADDITIONAL INFORMATION
If a word comes at the end of a line and is split over two lines *at the soft hyphen*, the soft hyphen should really be displayed as a *normal* hyphen instead of being invisible. This is the way soft hyphens normally work, e.g., in Kate. But I guess this is a less important feature for a terminal application, which usually doesn’t do any real word wrapping. Just thought I would mention it.

And regarding the use of soft hyphens: I’m a translator of KDE applications, and we use soft hyphens a lot in our translations, also for command-line applications. This bug causes the output (e.g., of ‘--help’) to look incorrect (long words are being split).
Comment 1 Matan Ziv-Av 2023-01-27 11:56:03 UTC
This is an unintentional change of behaviour, but I am not sure it is a bug.

Do you edit text with a soft hyphen in konsole, or only view files? If you copy this text to the command line in konsole 22.08, you will see that this confuses bash and the cursor is displayed in the wrong place.

See some discussion here: https://www.mail-archive.com/tech@openbsd.org/msg63271.html (and many other places).
Comment 2 Karl Ove Hufthammer 2023-01-31 19:38:37 UTC
(In reply to Matan Ziv-Av from comment #1)
> This is an unintentional change of behaviour, but I am not sure it is a bug.

The character is shown as a space. The Unicode standard specifies that it should be invisible. I think that makes it a bug.

Here’s the relevant section in the latest version of the Unicode standard (section 5.21) (earlier versions have similar text):
https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf

    U+00AD: soft hyphen has a null default appearance in the middle of a
    line: the appearance of “ther[shy]apist” is simply “therapist”—no visible glyph. In
    line break processing, it indicates a possible intraword break. At any intraword
    break that is used for a line break—whether resulting from this character or by
    some automatic process—a hyphen glyph (perhaps with spelling changes) or
    some other indication can be shown, depending on language and context.

> Do you edit text with a soft hyphen in konsole, or only view files?

Well, I *sometimes* edit text with soft hyphens in Konsole (or Yakuake, to be precise). But *mostly* I use Kate or other text editors (or other software) when editing text. I frequently *view* files (or application output) with soft hyphens in Konsole/Yakuake.

> If you copy this text to the command line in konsole 22.08, you will see that this
> confuses bash and the cursor is displayed in the wrong place.

And that should be considered a bug. To clarify, the behaviour I would *expect* is the same as Kate’s (Kate also uses a monospace font by default, and thus should be comparable). That is:

* The character should be invisible.
* When moving the cursor using the arrow keys, the soft hyphen should function as if it were a normal (but zero-width) character. For example, if the cursor is positioned right after the ‘e’ in ther[shy]apist, you need to press the right arrow key three times (not two) to move it to right after the ‘a’. (The first moves it to the right of the r, the second to the right of the soft hyphen, and the third to the right of the a.)
* Regarding the display of the cursor: In Kate when the cursor is an ‘I beam’, for the example above, the cursor moves to the right for the first key press, doesn’t move horizontally for the next key press (which really moves it from ‘to the left of the soft hyphen’ to ‘to the right of the soft hyphen’), and moves to the right for the third key press. If the cursor is a block (insert mode), the current/selected character is shown with a blinking block behind. (So if the soft hyphen is selected, the cursor is invisible. Perhaps a tiny bit confusing, but it is consistent, as we’re ‘highlighting’ an invisible character.)
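
The cursor behaviour in the list above can be sketched in a few lines of Python. This is only a model of the expected behaviour, not how Kate or Konsole implement it: the function name `cursor_columns` is hypothetical, and it assumes every character is one cell wide except the soft hyphen, which is zero-width.

```python
SOFT_HYPHEN = "\u00ad"

def cursor_columns(text):
    """Return the display column of the cursor for each logical position.

    Logical position i means "the cursor sits after the first i characters".
    Assumption: each character occupies one cell, except U+00AD (zero cells).
    """
    columns = [0]
    for ch in text:
        width = 0 if ch == SOFT_HYPHEN else 1
        columns.append(columns[-1] + width)
    return columns

word = "ther\u00adapist"
cols = cursor_columns(word)

# Start right after the 'e' (logical position 3) and press right three times:
# after 'r', after the soft hyphen (no horizontal movement), after 'a'.
print(cols[3:7])  # prints [3, 4, 4, 5]
```

The repeated column 4 is the key press where the I-beam does not move horizontally, because the cursor steps over the zero-width soft hyphen.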

(BTW, I am familiar with the arguments in Jukka Korpela’s 1997 article on the soft hyphen, and even read it around the time it was originally written. I don’t think it’s at all relevant now that the entire world has moved to Unicode and uses Unicode semantics for all characters.)
Comment 3 Bug Janitor Service 2023-02-11 11:34:28 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/812
Comment 4 Kurt Hindenburg 2023-02-24 01:03:30 UTC
Git commit feb44c226fc6ca8e57f37a189442474e63383688 by Kurt Hindenburg, on behalf of Matan Ziv-Av.
Committed on 24/02/2023 at 00:46.
Pushed by hindenburg into branch 'master'.

Make behaviour of characters with problematic width configurable

Add a profile option to follow Unicode standard for the display width of
characters, where this width differs from glibc's wcwidth.

Currently the only character affected by this is soft hyphen (unicode 0x00ad).

Konsole generally follows wcwidth() function when determining the display
width of characters, since this is behaviour expected by libreadline, and
doing otherwise corrupts lines containing problematic characters. When such
characters are used more for display than on the command line, following
the Unicode standard may be preferable.

The default for this option is disabled, that is, follow wcwidth().

M  +6    -0    doc/manual/index.docbook
M  +8    -2    src/Screen.cpp
M  +4    -0    src/Screen.h
M  +1    -0    src/profile/Profile.cpp
M  +4    -0    src/profile/Profile.h
M  +3    -1    src/session/SessionManager.cpp
M  +1    -0    src/terminalDisplay/TerminalDisplay.cpp
M  +16   -0    src/widgets/EditProfileAppearancePage.ui
M  +8    -0    src/widgets/EditProfileDialog.cpp
M  +1    -0    src/widgets/EditProfileDialog.h

https://invent.kde.org/utilities/konsole/commit/feb44c226fc6ca8e57f37a189442474e63383688
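
The idea behind the option described in the commit message can be illustrated with a rough Python sketch. This is not the Konsole code (which is C++ in src/Screen.cpp); every name here is hypothetical, and `wcwidth_stand_in` is a deliberately simplified placeholder that returns 1 for every character, which is only reasonable for the narrow characters used in this example.

```python
SOFT_HYPHEN = "\u00ad"

# Characters whose Unicode display width differs from what wcwidth() reports.
# Per the commit message, the soft hyphen is currently the only one.
UNICODE_WIDTH_OVERRIDES = {SOFT_HYPHEN: 0}

def wcwidth_stand_in(ch):
    """Hypothetical placeholder for glibc's wcwidth(); valid for this demo only."""
    return 1

def display_width(text, follow_unicode=False):
    """Sum the cell widths of text; the option defaults to wcwidth() behaviour."""
    total = 0
    for ch in text:
        if follow_unicode and ch in UNICODE_WIDTH_OVERRIDES:
            total += UNICODE_WIDTH_OVERRIDES[ch]
        else:
            total += wcwidth_stand_in(ch)
    return total

sample = "super\u00adcali"
print(display_width(sample))                       # option disabled (default)
print(display_width(sample, follow_unicode=True))  # option enabled
```

With the option disabled the soft hyphen takes up one cell, which matches what libreadline expects; with it enabled the soft hyphen takes no space, which matches the Unicode description quoted in comment 2.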