BACKGROUND:
The RLM character is a non-printing character that has RTL (Right-to-Left) directionality:
https://www.fileformat.info/info/unicode/char/200f/index.htm

Here is the Hebrew word for peace, with a period AFTER the word. The period SHOULD be on the left side of the word, but because bugs.kde.org is an LTR (Left-to-Right) website it will erroneously appear on the right of the word:
שלום.

To resolve that, one places an RLM character AFTER the period:
שלום.‏

You can't see that RLM character after the period, but because it's there the period is properly shown to the left of the word.

STEPS TO REPRODUCE
1. Print text to Konsole with a trailing RLM character

OBSERVED RESULT
RLM character is NOT displayed. It is a non-printing character, so it cannot be seen, but its absence is evident from the period being on the right of the word.

EXPECTED RESULT
RLM character should be displayed. Its presence would be evident from the period being on the left of the word.

SOFTWARE/OS VERSIONS
KDE Frameworks 5.68.0
Qt 5.12.8 (built against 5.12.8)

ADDITIONAL INFORMATION
Here we can see that the RLM character at the end is not affecting the display of the text. The period should be on the left. Echo is echoing two Hebrew characters, then a period, then the letter e. Then sed is replacing the e with the RLM:

$ echo "אב.e" | sed "s;e;$(echo -ne '\u200f');"
אב.

We can verify that the RLM is there with hd:

$ echo "אב.e" | sed "s;e;$(echo -ne '\u200f');" | hd
00000000  d7 90 d7 91 2e e2 80 8f  0a                       |.........|
00000009

The "e2 80 8f" bytes are the RLM; see the page linked above, which contains this text:
UTF-8 (hex): 0xE2 0x80 0x8F (e2808f)
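For anyone who wants to cross-check the byte dump without a shell, here is a minimal Python sketch that builds the same string (two Hebrew letters, a period, the RLM, and the trailing newline that echo adds) and prints its UTF-8 bytes along with the RLM's Unicode properties. The variable names are only illustrative; nothing here comes from Konsole itself.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Alef, bet, a period, then the RLM: the same text the echo | sed pipeline produces.
text = "\u05d0\u05d1." + RLM

# echo appends a newline, so add one before encoding to match the hd output.
encoded = (text + "\n").encode("utf-8")
print(encoded.hex(" "))  # d7 90 d7 91 2e e2 80 8f 0a

# Properties of the RLM according to Python's bundled Unicode database.
print(unicodedata.name(RLM))           # RIGHT-TO-LEFT MARK
print(unicodedata.category(RLM))       # Cf (Format)
print(unicodedata.bidirectional(RLM))  # R (strong right-to-left)
```

The last three bytes before the newline, e2 80 8f, are the RLM, which matches the hd output above.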
Konsole currently strips RLM, along with other General_Category=Other_Format (Cf) characters. There is a pending merge request with a commit that changes this: https://invent.kde.org/utilities/konsole/-/merge_requests/567/diffs?commit_id=24216793f573192934f0d9e9d99ac312c5693cb6
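To illustrate what stripping Cf characters means in practice, here is a rough Python sketch. It is not Konsole's code: strip_format_chars is a hypothetical helper that simply drops every character whose General_Category is Cf, which is enough to show that RLM is in the same category as LRM, ZWJ and ZWNJ and that the mark is lost from the text.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Hypothetical helper: drop every character with General_Category Cf."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# A few well-known format characters; all of them report category Cf.
samples = {
    "RLM": "\u200f",
    "LRM": "\u200e",
    "ZWJ": "\u200d",
    "ZWNJ": "\u200c",
}
for name, ch in samples.items():
    print(name, unicodedata.category(ch))

# Alef, bet, period, RLM: after stripping, only the first three remain.
with_rlm = "\u05d0\u05d1.\u200f"
stripped = strip_format_chars(with_rlm)
print(len(with_rlm), len(stripped))  # 4 3
```

Once the RLM is gone, nothing tells a bidi-aware renderer that the period belongs to the right-to-left run, which is consistent with the period ending up on the right of the word in the report.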
Great, thanks. I wonder what problem that displayCharacter() method is intended to solve.
Since you asked: displayCharacter() assigns characters to character cells, and originally it did not support zero-width characters (https://bugs.kde.org/show_bug.cgi?id=96536). Support was then added for diacritic (Mark_NonSpacing) characters by allowing a character cell to point to a sequence of characters instead of containing a single character (https://invent.kde.org/utilities/konsole/-/commit/c335324f31e946d4e3a0c63d1fbed8c114aea987). Later, support was added for Hangul medial and terminal Jamo, which have the Unicode General_Category Letter_Other (https://invent.kde.org/utilities/konsole/-/commit/437440978bca1bd84e70ee61ba7974f63fe0630a). The commit referenced in the pending merge request further adds support for zero-width Other_Format controls.
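The idea of a cell holding a sequence of characters can be sketched in a few lines of Python. This is a simplified model and not the actual displayCharacter() logic: the function names and the attachment rule are assumptions made for illustration, and double-width characters are ignored entirely.

```python
import unicodedata
from typing import List


def attaches_to_previous(ch: str) -> bool:
    """Hypothetical rule: True if the character takes no cell of its own."""
    # Non-spacing marks (diacritics) and format controls such as RLM.
    if unicodedata.category(ch) in ("Mn", "Cf"):
        return True
    # Hangul conjoining medial vowels and final consonants (category Lo).
    return 0x1160 <= ord(ch) <= 0x11FF


def to_cells(text: str) -> List[str]:
    """Split text into cells; zero-width characters join the preceding cell."""
    cells: List[str] = []
    for ch in text:
        if cells and attaches_to_previous(ch):
            cells[-1] += ch  # extend the sequence held by the previous cell
        else:
            cells.append(ch)  # start a new cell (also when no cell exists yet)
    return cells


print(len(to_cells("e\u0301")))              # 1: base letter plus combining acute
print(len(to_cells("\u1112\u1161\u11ab")))   # 1: Hangul lead, medial and final Jamo
print(len(to_cells("\u05d0\u05d1.\u200f")))  # 3: the RLM stays with the period
```

In this model the RLM survives as part of the period's cell instead of being dropped, so a renderer that applies the bidirectional algorithm to the cell contents would still see it.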
Thanks. I honestly think that all characters should be displayed. Every Unicode character and code point exists because somebody, somewhere, needs it.