Bug 379535 - Combining 4-byte utf8 characters with color-changing ansi escapes gives extra spaces on output
Status: RESOLVED FIXED
Alias: None
Product: konsole
Classification: Applications
Component: font
Version: 16.08.2
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords: triaged
Depends on:
Blocks:
 
Reported: 2017-05-05 02:02 UTC by Tom Littauer
Modified: 2021-05-25 11:52 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In: 21.08


Attachments
Open Konsole, cat this 87-byte text file, look for extraneous spaces (87 bytes, text/plain)
2017-05-05 02:02 UTC, Tom Littauer
Same as previous but with normal ASCII characters (doubled) vs Card (181 bytes, text/plain)
2017-05-07 17:06 UTC, Tom Littauer
Replaced 2nd line with double-width equivalent (184 bytes, text/plain)
2017-05-07 21:37 UTC, Tom Littauer

Description Tom Littauer 2017-05-05 02:02:43 UTC
Created attachment 105356
Open Konsole, cat this 87-byte text file, look for extraneous spaces

The attached text file, when sent by cat to the konsole, should produce:

1) W in whatever the normal color is
2) X in bold red
3) 16 copies of the utf8 4-byte glyph for the Ace of Clubs in bold red
4) Y in bold blue and
5) Z in the normal color

Instead, it produces all the above plus a number of unexpected spaces between 3) and 4)
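
A minimal sketch of how a test file matching points 1) to 5) could be generated. The SGR sequences used here (`ESC[1;31m`, `ESC[1;34m`, `ESC[0m`) and the helper name `build_test_bytes` are assumptions, not taken from the attachment. With these choices the output happens to be 87 bytes, the same size as the attachment, but the attachment's actual contents may differ.

```python
ESC = "\x1b"
ACE_OF_CLUBS = "\U0001F0D1"  # PLAYING CARD ACE OF CLUBS, 4 bytes in UTF-8


def build_test_bytes(glyph=ACE_OF_CLUBS, count=16):
    """Return W, bold red X plus `count` glyphs, bold blue Y, normal Z."""
    text = (
        "W"
        + ESC + "[1;31m" + "X"   # assumed sequence for bold red
        + glyph * count
        + ESC + "[1;34m" + "Y"   # assumed sequence for bold blue
        + ESC + "[0m" + "Z\n"    # reset attributes
    )
    return text.encode("utf-8")
```

Writing the result of `build_test_bytes()` to a file and sending it to the terminal with cat should reproduce the layout described above on an affected version.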

This behavior happens with every 4-byte unicode glyph my system supports. Every 2- or 3-byte glyph I've tried works just fine.

The amount of extra space is proportional to the number of 4-byte glyphs.

I'm running stock OpenSuSE Leap 42.2 and have done no mucking about with fonts.

I'll be happy to provide additional information but you may have to tell me how to acquire it.

PS: The same behavior happens on earlier versions (Konsole 2.14.2) on KDE 4.14.9
Comment 1 Christoph Feck 2017-05-05 22:47:29 UTC
Konsole does not have support for non-BMP characters, because every cell is only stored as a QChar (which is 16 bits).
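
To illustrate why a 16-bit unit cannot hold such a character, here is a small sketch of the standard UTF-16 encoding rule. The function name `utf16_code_units` is made up for this example and is not part of Qt or Konsole.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as integers) for one character."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP character: fits in a single 16-bit unit.
        return [cp]
    # Non-BMP character: split the 20-bit offset into two surrogates.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]
```

For the Ace of Clubs (U+1F0D1) this yields the pair 0xD83C, 0xDCD1, so the character needs two 16-bit units rather than one.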
Comment 2 Egmont Koblinger 2017-05-05 23:32:01 UTC
Just wondering, how come then that the Ace of Clubs symbol gets shown at all? :)

Combining accents are supported too, so maybe for non-BMP you allocate two QChars per cell using UTF-16 encoding?

Notice that highlighting with the mouse is a pretty interesting experience. As you drag the mouse slowly from left to right, at every second step you get to see two replacement symbols (one highlighted and one not). This might be because only the first QChar of a UTF-16 surrogate pair is selected. And in each step the remaining (not yet highlighted) playing cards are slowly pushed to the right to fill up the gap.
Comment 3 Tom Littauer 2017-05-06 01:27:18 UTC
Thanks, Egmont.

I have no insight into internals at all, but was also struck by the fact that the playing card showed up. It seemed to me that if it shows up it must be OK to use.

If it were not supported I would have expected the same empty box you get when you use a glyph you don't have installed.

By the way, if using an uninstalled 4-byte glyph you get an empty box that exhibits the same odd space behavior.

As for no 4-byte glyphs, that will make the users of Emoji unhappy, as many of them are in 4-byte territory.

Thank you both for taking the time to look into the situation.
Comment 4 Christoph Feck 2017-05-06 06:19:15 UTC
Egmont, the cells are not rendered individually (causing bug 361547), and when Konsole converts the local 8 bit encoding to QChar, it actually places two surrogate halves into two cells...
Comment 5 Christoph Feck 2017-05-06 06:26:17 UTC
We somewhat mitigate the problem by assuming that all non-BMP characters are wide now, see commit https://cgit.kde.org/konsole.git/commit/?id=1c13e4841d76e96b7a3da12bc629202e9fd71c5d

Quite ugly :/
Comment 6 Egmont Koblinger 2017-05-06 09:08:11 UTC
I have no insight at all either :)

Christoph, thanks for your explanation, it makes perfect sense now.

Also, it seems to me that continuous runs of identical colors are rendered in a single step, not caring about a possible drift in position. Then at the next color change (no matter if it's by an ANSI sequence or by mouse highlight) rendering continues from the desired position.
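
A toy model of this behaviour, not Konsole's rendering code. It assumes that every non-BMP character reserves two grid columns (as described in comment 5), that the font draws each glyph one column wide, and that a same-colour run is painted in one step from its own grid column. All function names are hypothetical.

```python
def cell_width(ch):
    # Grid columns reserved for the character. Following the mitigation
    # described in the thread, every non-BMP character is treated as wide.
    return 2 if ord(ch) > 0xFFFF else 1


def painted_width(ch):
    # Columns the glyph really advances when drawn. ASSUMPTION: the font
    # draws every glyph, including the playing card, one column wide.
    return 1


def paint_row(runs):
    # Each element of `runs` is the text of one same-colour run.
    row = ""
    for text in runs:
        reserved = sum(cell_width(ch) for ch in text)
        drawn = sum(painted_width(ch) for ch in text)
        # The run is drawn in one step, so its glyphs sit side by side;
        # the next run starts at its own grid column, which leaves the
        # unused reserved columns as a gap at the end of this run.
        row += text + " " * (reserved - drawn)
    return row


def gap_after_cards(count):
    # Number of blank columns in the row from the original report.
    card = "\U0001F0D1"
    return paint_row(["W", "X" + card * count, "Y", "Z"]).count(" ")
```

Under these assumptions the gap before Y grows by one column per card, which would match the report that the extra space is proportional to the number of 4-byte glyphs.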
Comment 7 Tom Littauer 2017-05-07 17:06:52 UTC
Created attachment 105380
Same as previous but with normal ASCII characters (doubled) vs Card

Same as previous test case but with the failing line duplicated with the Unicode code point for the Ace of Clubs replaced with plain ascii (1st line single characters, 2nd line doubled).
Comment 8 Tom Littauer 2017-05-07 17:18:50 UTC
It seems the double-width assumption isn't being properly rendered. The newly attached test case tries to show this by replacing the Ace of Clubs code point by (doubled) ascii characters.

The line width ends up correct but each Ace of Clubs isn't padded to double width. Instead the padding is saved up and rendered later, as Egmont points out.

I note with interest that Character.h/cpp and ExtendedCharTable.h/cpp in the source tree for 16.08.2 suggest using a hash table to allow the (infrequent) SMP characters to be captured correctly. It may be, however, that this is not yet used in rendering.

Just in case it matters, the font in use on my system is Hack, 9.0 point.
Comment 9 Tom Littauer 2017-05-07 21:37:59 UTC
Created attachment 105382
Replaced 2nd line with double-width equivalent

More data:

It seems that the "double width" mode is line-at-a-time only and the escape sequence to change to it (\033 # 6) has no effect unless it's at BOL (at least in this version of Konsole). This may interfere with the patch cited previously.

Also, even if we make no attempt to include odd Unicode values, changing colors in the middle of a pure double-width line has odd effects.

The attached test data file (testfile2.txt) makes the second line double width and attempts to change colors in the middle of the line.

Cat the file in Konsole and the 2nd line has different colored characters in wrong (overlapping) positions.
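
A rough sketch of a comparable test, assuming the same colour sequences as in a typical SGR test and placing the double-width line sequence (ESC # 6, DECDWL) at the start of the second line. This is not the content of testfile2.txt; the text, colours, and the name `build_double_width_test` are illustrative only.

```python
ESC = "\x1b"
DECDWL = ESC + "#6"  # DEC double-width line; applies to the whole line


def build_double_width_test():
    """Return two lines: a normal one and a double-width one, each with
    a colour change in the middle."""
    body = "W" + ESC + "[1;31m" + "XXXX" + ESC + "[1;34m" + "Y" + ESC + "[0m" + "Z"
    normal_line = body + "\n"
    wide_line = DECDWL + body + "\n"  # sequence placed at beginning of line
    return normal_line + wide_line
```

On a terminal that handles this correctly, the second line should show the same characters as the first at twice the width, with each colour change at the matching position.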
Comment 10 Kurt Hindenburg 2017-06-07 13:43:33 UTC
This does appear to work correctly according to points 1-5 in the bug report. I'm running master; 17.04.x versions work as well.

Do you have the ability to upgrade and test?
Comment 11 Tom Littauer 2017-06-07 20:44:05 UTC
Kurt,

I'll be happy to try; no point in reporting bugs if you're not willing to help test.

Can you please point me to a how-to? I have compilers etc installed and spare resources and a long history of *x use but don't often do this.

FYI, I'm running OpenSuSE 42.2 (bug was reported on native install) but also have Virtualbox so can load up whatever is needed.

Thanks,

Tom
Comment 12 Kurt Hindenburg 2017-06-08 00:33:42 UTC
I haven't used openSUSE in years - it looks like you'd need to upgrade (or use a VM) to one of these to test Konsole 17.04.x

https://software.opensuse.org/package/konsole5

openSUSE Tumbleweed
home:wolfi323:branch... 17.04.1 (32 Bit, 64 Bit, Source)
openSUSE Leap 42.2
home:wolfi323:branch... 17.04.1 (64 Bit, Source)
Comment 13 Tom Littauer 2017-06-09 17:23:57 UTC
Kurt,

Thank you, that was enough of a hint.

I tried with the newly announced OpenSuSE Leap 42.3 release candidate whose konsole reports itself at 17.04.1

testfile.txt works as expected, as does testfile1.txt.

testfile2.txt continues to show anomalies, however, in the area of wide text.

The second line in that file continues to have overlapping characters as described in comment 9.

Also:

Scrolling the text can sometimes result in text that was previously rendered as wide now being rendered as normal, and also makes text that was previously normal show up as wide. Note that the direction of the scroll is significant; scrolling up is not the same as scrolling down. Variations on this behavior show up in konsole 16.08.2 as well.

Do you need a different bug report for the wide-character behavior?

Do you need me to load Tumbleweed and then load your branch? Is there a branch for Leap 42.3?

Thanks,

Tom
Comment 14 Kurt Hindenburg 2017-06-30 02:26:39 UTC
A new report might be a good idea to simplify the issue.

The output in xterm shows the YZ at the end so something is definitely wrong in Konsole.
Comment 15 Jaime Torres 2018-10-19 16:53:48 UTC
In konsole 18.11.70, with KDE Frameworks 5.51.0 and Qt 5.11.2 (compiled with 5.11.2), the file with the double-width equivalent doesn't produce the expected output, and some normal characters in the konsole history randomly become double-width when scrolling.
Comment 16 ninjalj 2021-05-01 16:01:51 UTC
The remaining problems with double width lines are fixed by Merge Request https://invent.kde.org/utilities/konsole/-/merge_requests/362, in particular the following commits:

 - 398a6657 Fix positioning of double width/double height
 - 57ce654d Support scrolling double-height/width lines