Bug 401094

Summary: Malayalam rendering regression in Konsole 18.08
Product: [Applications] konsole Reporter: Rajeesh K V <rajeeshknambiar>
Component: font Assignee: Konsole Developer <konsole-devel>
Status: RESOLVED FIXED    
Severity: major CC: mail, martin.sandsmark, mglb, neosrix, rajeeshknambiar
Priority: NOR    
Version: 18.08.1   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: Konsole 17.12 good shaping and rendering
Konsole 18.08 bad shaping and rendering
POC patch
Konsole 19.12 with Malayalam PoC patch
render preview (2019-12-22)
Konsole 19.12 with Malayalam PoC patch v2
Konsole 19.12 with non-latin scripts patch v3

Description Rajeesh K V 2018-11-16 09:14:19 UTC
Created attachment 116335 [details]
Konsole 17.12 good shaping and rendering

SUMMARY
Konsole used to have very good support for rendering complex script languages like Malayalam (which has conjunct glyphs and doesn't usually have a monospace font). In Konsole 18.08, rendering is regressed. See attached screenshots from Konsole 17.12 (good rendering) and 18.08 (bad rendering) respectively.


STEPS TO REPRODUCE
1. Open konsole
2. $ echo "എളമക്കര സ്കൂളിൽ"
3. 

OBSERVED RESULT
Conjuncts (which form a single glyph from a combination of more than 2 glyphs) are broken. Also, glyphs are chopped off (e.g. the first character എ).

EXPECTED RESULT
Glyphs are not chopped off/truncated and conjunct glyphs are formed; similar to screenshot of konsole-17.12

SOFTWARE/OS VERSIONS
Windows: 
MacOS: 
Linux/KDE Plasma: Fedora 29
(available in About System)
KDE Plasma Version: Plasma 5.14.3
KDE Frameworks Version: Frameworks 5.52
Qt Version: 5.11.1

ADDITIONAL INFORMATION
Comment 1 Rajeesh K V 2018-11-16 09:15:06 UTC
Created attachment 116336 [details]
Konsole 18.08 bad shaping and rendering
Comment 2 Martin Sandsmark 2018-12-02 04:53:35 UTC
It's been a long time since I looked into this stuff, but is it similar to the issue I fixed here? https://cgit.kde.org/konsole.git/commit/?id=437440978bca1bd84e70ee61ba7974f63fe0630a
Comment 4 Rajeesh K V 2018-12-02 06:01:12 UTC
(In reply to Martin Sandsmark from comment #3)
> I guess it might be because of
> https://cgit.kde.org/konsole.git/commit/?id=e74cf6c36642247f3f79194da373d01a00645d36 and/or
> https://cgit.kde.org/konsole.git/commit/?id=5128781a824c26dc2746650fea0ae9f95861b9d8

I checked the log and these 2 commits look related. I'll try to revert them (do any other commits or functionality depend on these 2?), test the build to be sure, and report back.
Comment 5 Rajeesh K V 2018-12-02 07:51:37 UTC
I couldn't cleanly revert (or apply) either patch to the konsole-18.08.3 source. Some hunks succeed and some fail (both when applying the patch normally and when reversing it). Any ideas?
Comment 6 Rajeesh K V 2019-05-20 07:43:46 UTC
Bug is still present in 19.04.1

This is a serious regression for many users :-(
Comment 7 Kurt Hindenburg 2019-06-17 01:31:03 UTC
It appears that https://cgit.kde.org/konsole.git/commit/?id=a565bc97337a3bfc3a027f46aa2dec3e9a6f8618 first caused this.

Mariusz, do you have time to look at this?
Comment 8 Mariusz Glebocki 2019-06-17 14:46:36 UTC
Yes. All characters from scripts with conjuncts should be passed as a single string to text renderer, as it was done with RTL text. I'll implement it.
Comment 9 Rajeesh K V 2019-08-27 10:31:02 UTC
(In reply to Mariusz Glebocki from comment #8)
> Yes. All characters from scripts with conjuncts should be passed as a single
> string to text renderer, as it was done with RTL text. I'll implement it.

Thank you, Mariusz.
Do you have a patch that I can help to test with?
Comment 10 Subin 2019-12-17 18:41:26 UTC
This bug is annoying. Would be great if it can be fixed.
Comment 11 Mariusz Glebocki 2019-12-17 23:50:44 UTC
Created attachment 124559 [details]
POC patch

Sorry for the late reply.

I've attached a patch which renders the provided example almost as intended. Each word is rendered separately and spacing is wrong, but that can be fixed later. Apply with `patch -p1 < malayalam-rendering-poc.patch` in the top-level Konsole directory.

However, I'm not sure how to integrate it into Konsole. This is technically a hack. Terminal programs are unable to get the width of a conjunct, and they don't know which characters are part of a conjunct. This leads to text being shifted/glitched during selection, conjuncts being "broken" when a program (e.g. vim) places cursor on part of a conjunct, invalid line wrapping, etc. This behavior can't be enabled by default. Possible solutions:

* Add an option for enabling the feature. I'm against this. It will be broken eventually, because many developers won't test yet another rendering mode. Also: hack.

* Rename "Bi-directional text rendering" option to something like "Support for complex scripts" and enable special rendering for complex scripts like Malayalam when it is enabled. RTL scripts already have the same problems, so users wouldn't be surprised. This solution has the same issues as the previous one, but it's already there.
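
To illustrate the width problem described above, here is a minimal sketch (not Konsole code). It assumes a simplified wcwidth-style rule in which nonspacing marks and format characters occupy 0 cells and everything else occupies 1; `naive_cell_width` is a hypothetical helper, and the sample sequence SA + VIRAMA + KA is chosen only as an example of a conjunct.

```python
import unicodedata


def naive_cell_width(text: str) -> int:
    """Count terminal cells the way a simple wcwidth-style routine might.

    Assumption: nonspacing marks (Mn), enclosing marks (Me) and format
    characters (Cf) take 0 cells; every other code point takes 1 cell.
    """
    zero_width = {"Mn", "Me", "Cf"}
    return sum(0 if unicodedata.category(ch) in zero_width else 1 for ch in text)


# MALAYALAM LETTER SA + MALAYALAM SIGN VIRAMA + MALAYALAM LETTER KA
conjunct = "\u0d38\u0d4d\u0d15"
width = naive_cell_width(conjunct)
print(len(conjunct), "code points,", width, "cells assumed by the program")
```

A program counting this way assumes a fixed number of cells, while a font may draw the whole sequence as one glyph of a different width, which is where the shifted selections and cursor glitches come from.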


It also would be nice to improve it further, but help from people who actually use complex scripts is needed. Useful input would include:
* Are there some fixed-width fonts for the script?
* Is there another terminal which handles it well?
* Is there some simplified/alternative way to display the script?
* Are there (console) programs which actually use the script? What are real-world use cases?
* Code! :)
Comment 12 Rajeesh K V 2019-12-18 04:32:20 UTC
(In reply to Mariusz Glebocki from comment #11)

> 
> I've attached a patch which renders provided example almost as intended.
> Each word is rendered separately and spacing is wrong, but it can be fixed
> later. Apply with `patch -p1 < malayalam-rendering-poc.patch` in top level
> Konsole directory.

Applied the patch over konsole-19.12.0 source and tested the rendering. Comparing with rendering of 18.08, there is improvement (see attached screenshot), but not fully fixed.

> 
> However, I'm not sure how to integrate it into Konsole. This is technically
> a hack. Terminal programs are unable to get the width of a conjunct, and
> they don't know which characters are part of a conjunct. This leads to text
> being shifted/glitched during selection, conjuncts being "broken" when a
> program (e.g. vim) places cursor on part of a conjunct, invalid line
> wrapping, etc. This behavior can't be enabled by default. Possible solutions:

Indeed, console programs such as vim (even gvim) have had difficulty with cursor positioning since the beginning of the epoch. A notable exception is Emacs, which handles (shapes and edits) complex scripts (such as the Indic scripts, to which Malayalam belongs) very well. And Konsole ≤ 17.12 had very good support for shaping & rendering Indic scripts (see the Konsole 17.12 screenshot attached to the description).

> 
> * Add an option for enabling the feature. I'm against this. It will be
> broken eventually, because many developers won't test yet another rendering
> mode. Also: hack.

Not only Malayalam (the patch checks for Unicode range 0D00–0D7F) but all complex scripts, including dozens of Indic scripts, Sinhala, South/East Asian scripts etc., would need similar support.
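
As a rough illustration of that range check (a sketch, not the PoC patch itself), the snippet below splits a line into runs and flags the ones that fall inside U+0D00 to U+0D7F. The names `needs_shaping` and `split_runs` are hypothetical; other complex scripts would need their own ranges added.

```python
from itertools import groupby

# Assumption: only the Malayalam block is treated as needing shaping here.
MALAYALAM_FIRST, MALAYALAM_LAST = 0x0D00, 0x0D7F


def needs_shaping(ch: str) -> bool:
    """Return True if ch lies in the Malayalam block U+0D00 to U+0D7F."""
    return MALAYALAM_FIRST <= ord(ch) <= MALAYALAM_LAST


def split_runs(line: str) -> list[tuple[bool, str]]:
    """Split a line into maximal runs, each flagged by needs_shaping()."""
    return [(flag, "".join(chars)) for flag, chars in groupby(line, key=needs_shaping)]


runs = split_runs("ls \u0d38\u0d4d\u0d15 ok")
for shaped, text in runs:
    # Flagged runs would go to the text renderer as one string;
    # the remaining runs can still be drawn cell by cell.
    print("whole string" if shaped else "per cell", repr(text))
```

Because a space is outside the range, each Malayalam word ends up in its own run, which matches the "each word is rendered separately" behavior reported for the PoC.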

> 
> * Rename "Bi-directional text rendering" option to something like "Support
> for complex scripts" and enable special rendering for complex scripts like
> Malayalam when it is enabled. RTL scripts already have the same problems, so
> users wouldn't be surprised. This solution has the same issues as previous
> one, but it's already there.

Agree, this would be a better option.


> 
> It also would be nice to improve it further, but help from people who
> actually use complex scripts is needed. Useful input would include:

Happy to help.

> * Are there some fixed-width fonts for the script?

There are no fixed-width fonts for Indic scripts, to my knowledge.

> * Is there another terminal which handles it well?

Historically, only Konsole ≤ 17.12 handled Indic scripts well. GNOME Terminal and other terminals never properly rendered those. Also, Emacs shapes and renders complex scripts very well, but to my knowledge it uses a different library.

> * Is there some simplified/alternative way to display the script?

Not sure I understand the question, could you elaborate?
If you only want to check the shaping/rendering, the 'hb-view' tool from harfbuzz would be the best option. You can test the shaping of a text using following command:

$ hb-view --font-file=/path/to/font.ttf -o /tmp/test.png "text-to-shape" && xdg-open /tmp/test.png

You may download the font https://smc.org.in/downloads/fonts/meera/Meera-Regular.ttf to test. This font is also available in Fedora/Debian/Ubuntu.


> * Are there (console) programs which actually use the script? What are
> real-world use cases?

Just like Latin scripts, complex scripts are used by millions of people on a daily basis.

> * Code! :)

I did try to go through the code and get my head around it, but the terminal code is very complex. I will keep at it and seek your help to make progress.

But if experienced Konsole developers could compare the Konsole 17.12 code with the current code, I hope it could save a lot of time.

Thanks for your efforts!
Comment 13 Rajeesh K V 2019-12-18 04:34:06 UTC
Created attachment 124564 [details]
Konsole 19.12 with Malayalam PoC patch

Rendering of Malayalam text using Konsole 19.12 with the PoC patch from comment#11.
Comment 14 Subin 2019-12-19 15:10:54 UTC
(In reply to Mariusz Glebocki from comment #11)

> * Rename "Bi-directional text rendering" option to something like "Support
> for complex scripts" and enable special rendering for complex scripts like
> Malayalam when it is enabled. RTL scripts already have the same problems, so
> users wouldn't be surprised. This solution has the same issues as previous
> one, but it's already there.

Where is this option located? Would enabling it now improve the current display of Malayalam characters?

> * Are there (console) programs which actually use the script? What are
> real-world use cases?

My use case is editing Malayalam text files with nano. Also, Konsole is localized in our language, but localized error messages in the console are not displayed correctly.

I think if Malayalam, other Indian languages, and complex scripts in general were supported well in Konsole, it would make Konsole unique among the many terminal emulators across different platforms. Hoping Konsole gets this feature :)
Comment 15 Mariusz Glebocki 2019-12-22 20:16:21 UTC
Created attachment 124656 [details]
render preview (2019-12-22)

(In reply to Rajeesh K V from comment #12)
> (In reply to Mariusz Glebocki from comment #11)
> 
> > 
> > I've attached a patch which renders provided example almost as intended.
> > Each word is rendered separately and spacing is wrong, but it can be fixed
> > later. Apply with `patch -p1 < malayalam-rendering-poc.patch` in top level
> > Konsole directory.
> 
> Applied the patch over konsole-19.12.0 source and tested the rendering.
> Comparing with rendering of 18.08, there is improvement (see attached
> screenshot), but not fully fixed.

Oops! Right. Working version here: https://invent.kde.org/mglebocki/konsole/tree/wip/complex-scripts-support
There's also a commit which renders debug rectangles/tooltips - just skip/remove it if you want to build Konsole for regular use.


> Historically, only Konsole ≤ 17.12 handled Indic scripts well. GNOME
> Terminal and other terminals never properly rendered those. Also, Emacs
> shapes and renders complex scripts very well, but to my knowledge it uses a
> different library.

Terminal version of emacs, or the one with its own window? The latter does not have terminal limitations.


> Not only Malayalam (the patch checks for Unicode range 0D00–0D7F) but all
> complex scripts including dozens of Indic scripts, Sinhala, South/East Asian
> scripts etc. all would need similar support.

That's something to do later. Ideally it would just need adding additional ranges.


> > * Is there some simplified/alternative way to display the script?
> 
> Not sure I understand the question, could you elaborate?

Kind of like 'ae' can be used instead of 'æ' when there are technical limitations.
I didn't find anything like this for Malayalam, so I guess that's not the case here.


> > * Are there (console) programs which actually use the script? What are
> > real-world use cases?
> 
> Just like Latin scripts, complex scripts are used by millions of people on a
> daily basis.

I mean console programs which use complex scripts in their interface, preferably something more advanced than just printing a line of text. They would be useful as test cases. So far I'm testing mc with file names written in Malayalam.

https://phabricator.kde.org/F7831279


(In reply to Subin from comment #14)
> (In reply to Mariusz Glebocki from comment #11)
> 
> > * Rename "Bi-directional text rendering" option to something like "Support
> > for complex scripts" and enable special rendering for complex scripts like
> > Malayalam when it is enabled. RTL scripts already have the same problems, so
> > users wouldn't be surprised. This solution has the same issues as previous
> > one, but it's already there.
> 
> Where is this option located ? Would enabling that now improve the current
> display of malayalam characters ?

It is not there yet, but I'll add it to Edit Profile → Advanced.


> > * Are there (console) programs which actually use the script? What are
> > real-world use cases?
> 
> My usecase is editing malayalam text files with nano.

This might be hard to do - nano must support it, otherwise it will move the cursor incorrectly.
Comment 16 Rajeesh K V 2019-12-24 08:05:11 UTC
Created attachment 124682 [details]
Konsole 19.12 with Malayalam PoC patch v2

(In reply to Mariusz Glebocki from comment #15)
> Created attachment 124656 [details]
> render preview (2019-12-22)

The rendering on this one is excellent! There is a major catch, though - the font used is Noto Sans Malayalam which supports only 'reformed' orthography, meaning many conjuncts are not grouped/present in that font. A font supporting 'traditional' orthography is necessary to see the full spectrum of beauty and complexity of the script, such as Rachana/Meera which are available in most distributions. In Fedora 30+, package name is 'smc-{rachana,meera}-fonts'.


> Oops! Right. Working version here:
> https://invent.kde.org/mglebocki/konsole/tree/wip/complex-scripts-support
> There's also a commit which render debug rectangles/tooltips - just
> skip/remove it if you want to build Konsole for regular use.

Applied all 3 top-most patches from that branch on top of 19.12.0 source, compiled and tested, and the rendering of Malayalam is excellent! 👏

In the attached screenshot, you can see that the bottom portion of a glyph with a vertical conjunct is cut off, which is the only minor issue.


> Terminal version of emacs, or the one with its own window? The latter does
> not have terminal limitations.

Emacs, with 'ansi-term' mode (found in one of the mailing list discussions, I don't use Emacs to independently verify).


> 
> > > * Is there some simplified/alternative way to display the script?
> > 
> > Not sure I understand the question, could you elaborate?
> 
> Kind of like 'ae' can be used instead of 'æ' when there are technical
> limitations.
> I didn't find anything like this for Malayalam, so I guess that's not the
> case here.

Indeed not. Indic languages can form a variety of conjuncts from multiple characters (up to 4) and those are present in the font as a single glyph (ligature). Such glyphs do not have a single Unicode code point as they represent a combination of code points.
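
A small sketch of what this means at the code point level, using Python's standard `unicodedata` module. The sequence SA + VIRAMA + KA is used as the sample conjunct (an illustrative choice); the check that NFC normalization leaves it unchanged shows there is no single precomposed code point for it.

```python
import unicodedata

# Sample conjunct written as explicit escapes: SA + VIRAMA + KA.
conjunct = "\u0d38\u0d4d\u0d15"

# List each code point with its Unicode character name.
names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in conjunct]
for entry in names:
    print(entry)

# NFC composes characters where a precomposed form exists; here none does,
# so the string stays a sequence of three code points.
unchanged = unicodedata.normalize("NFC", conjunct) == conjunct
print("unchanged by NFC:", unchanged)
```

The single glyph therefore exists only in the font, as a ligature selected by the shaping engine, and never in the text stream the terminal receives.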


> 
> I mean, some console programs which use complex scripts in its interface,
> preferably something more advanced than just printing line of text. They
> would be useful as test cases. So far I'm testing mc with file names written
> in Malayalam.
> 
> https://phabricator.kde.org/F7831279

OK. I'm not aware of a console program using Malayalam in its user interface, will do a search and report. And the test with 'mc' is certainly a good one. I can provide test cases for this (the text and corresponding image of expected rendering). Is there a test case repository where I can contribute these?
Comment 17 Mariusz Glebocki 2020-01-04 14:14:06 UTC
(In reply to Rajeesh K V from comment #16)
> In the attached screenshot, you can see that the bottom portion of a glyph
> with  vertical conjunct is cut-off, which is the only minor issue.

I think "Line spacing" feature should be modified to center text vertically and then it could be used to solve such issues.

> I can provide test cases for this (the text and corresponding
> image of expected rendering). Is there a test case repository where I can
> contribute these?

`tests` directory here: https://invent.kde.org/kde/konsole

I was testing with this: https://github.com/santhoshtr/malayalam-conjuncts/blob/master/conjuncts.txt

Since Konsole does not handle conjunct rendering itself (it only groups characters into strings and clips the result), I think the most basic sample would be enough, e.g. an entry in tests/GLASS.utf8. Because the rendering result depends heavily on fontconfig configuration and installed fonts, rendering verification is best done by displaying the same text in Konsole and in something known to render properly (GUI text editor, web browser).


BTW, I've pushed a new commit which groups characters by the scripts they belong to. All Unicode scripts are covered, not only Malayalam. Known bug: non-latin + latin strings are not grouped (e.g. സ്കൂളിൽ.txt).
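
For readers following along, grouping by script can be sketched as below. This is not the code from the commit: `SCRIPT_RANGES` is a hypothetical, deliberately tiny table of block ranges standing in for the full Unicode Script property, and `script_of` / `group_by_script` are illustrative names.

```python
from itertools import groupby

# Hypothetical stand-in for the Unicode Script property: two block ranges.
SCRIPT_RANGES = [
    ("Devanagari", 0x0900, 0x097F),
    ("Malayalam", 0x0D00, 0x0D7F),
]


def script_of(ch: str) -> str:
    """Return the script name for ch, or "Other" if it is in no listed range."""
    cp = ord(ch)
    for name, first, last in SCRIPT_RANGES:
        if first <= cp <= last:
            return name
    return "Other"  # Latin letters, digits, punctuation, and so on


def group_by_script(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same script."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


# A Malayalam cluster followed by a Latin file extension.
groups = group_by_script("\u0d38\u0d4d\u0d15.txt")
print(groups)
```

With this kind of grouping, a mixed name is split at the script boundary into two strings, which is one plausible reading of the known bug mentioned above.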
Comment 18 Rajeesh K V 2020-01-05 15:13:12 UTC
Created attachment 124909 [details]
Konsole 19.12 with non-latin scripts patch v3

(In reply to Mariusz Glebocki from comment #17)

> Since Konsole does not handle conjuncts rendering itself (it only groups
> characters into strings and clips the result) I think most basic sample
> would be enough, e.g. entry in tests/GLASS.utf8.

I’ll add some Indic language texts in the said file and open a phabricator review.

> 
> BTW. I've pushed new commit which groups characters by scripts they belong
> to. All unicode scripts are covered, not only Malayalam. Known bug:
> non-latin + latin strings are not grouped (e.g. സ്കൂളിൽ.txt).

Tested with Malayalam and Devanagari; all good, barring the grouping of latin+non-latin and vertical conjunct split.
Comment 19 Rajeesh K V 2020-01-11 11:45:53 UTC
(In reply to Mariusz Glebocki from comment #17)
> 
> `tests` directory here: https://invent.kde.org/kde/konsole
> ...
> characters into strings and clips the result) I think most basic sample
> would be enough, e.g. entry in tests/GLASS.utf8. Due to the fact that the

So I checked the GLASS.utf8 file to add test cases and noticed the warning at the top of the file to not edit it directly, but contribute to the Kermit project and resync. URL from the file mentions that it is obsolete and the up-to-date file is at http://kermitproject.org/utf8.html#glass, which already has translations for many Indic languages including Malayalam. How do we 'resync' the texts?
Comment 20 Mariusz Glebocki 2020-01-11 18:59:25 UTC
Edit the file in some editor with full unicode support (like kate), copy-paste new list, update the note about source & syncing, update license note (if changed).
Comment 21 Rajeesh K V 2020-01-12 06:51:11 UTC
Done. Review created: https://phabricator.kde.org/D26599

In the meantime, could you land your changes to master and 19.12.2? It is a big improvement over the status quo. Fixing the vertical spacing issue would be great, but I feel it could go in a follow-up commit.
Comment 22 Rajeesh K V 2020-02-16 04:51:16 UTC
A polite ping.

Can we get Mariusz’s bug fix patch merged, possibly in 19.12.y release? I think many distros are going to ship 19.12 release (including Fedora 32).
Comment 23 Kurt Hindenburg 2020-02-18 01:15:43 UTC
To clarify, this ping is for the update of GLASS file?  Is there a patch for the actual code changes?
Comment 24 Rajeesh K V 2020-02-18 04:32:38 UTC
(In reply to Kurt Hindenburg from comment #23)
> To clarify, this ping is for the update of GLASS file?  Is there a patch for
> the actual code changes?

The ping is for fixing complex script shaping, see topmost commits by Mariusz here: https://invent.kde.org/mglebocki/konsole/tree/wip/complex-scripts-support

The update to GLASS file is nice to have, not urgent though.
Comment 25 Kurt Hindenburg 2020-02-26 03:04:05 UTC
The next KDE release is 20.04, and its beta will be released in less than a month.


I can try to sort out Mariusz's repo;  it changes a lot of other things not just Malayalam fixes.  I can't promise it will make it in.  For now, your best bet would be to build from source w/ the fixes.

https://community.kde.org/Schedules/release_service/20.04_Release_Schedule
Comment 26 Rajeesh K V 2020-02-26 11:44:13 UTC
(In reply to Kurt Hindenburg from comment #25)

> I can try to sort out Mariusz's repo;  it changes a lot of other things not
> just Malayalam fixes.

Thanks, that is much appreciated. I think the core changes are in commits c143d270 and 35dfc4b4. These fix not just Malayalam, but most complex scripts’ shaping. The debug commits can be omitted, I suppose.

>  I can't promise it will make it in.  For now, your
> best bet would be to build from source w/ the fixes.

Sure, I'm already running the 19.12.y release with these patches applied on top, but general users would be left out. A number of people have asked me on Mastodon and elsewhere whether this fix has been released.
Comment 27 Kurt Hindenburg 2020-03-25 01:13:07 UTC
I'm actively working on this - ping me again in a few weeks if I haven't committed it.
Comment 28 Rajeesh K V 2020-03-25 08:58:21 UTC
(In reply to Kurt Hindenburg from comment #27)
> I'm actively working on this - ping me again in a few weeks if I haven't
> committed it.

Thanks, Kurt. I hope you are doing well with the COVID quarantine. I'll make sure to ping in a few weeks' time.
Comment 29 Kurt Hindenburg 2020-03-28 15:14:02 UTC
Git commit 94ff722fc0e68987bd743663bc63ef99ff4e0706 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 28/03/2020 at 14:41.
Pushed by hindenburg into branch 'master'.

Malayalam support: primitive PoC

https://invent.kde.org/mglebocki/konsole complex-scripts-support

M  +6    -1    src/TerminalDisplay.cpp

https://invent.kde.org/kde/konsole/commit/94ff722fc0e68987bd743663bc63ef99ff4e0706
Comment 30 Kurt Hindenburg 2020-03-28 15:14:02 UTC
Git commit 8ad28a12574cadc7a41e152ec683380d7743c2a8 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 28/03/2020 at 14:43.
Pushed by hindenburg into branch 'master'.

Group rendered characters by script

https://invent.kde.org/mglebocki/konsole complex-scripts-support

M  +24   -30   src/TerminalDisplay.cpp

https://invent.kde.org/kde/konsole/commit/8ad28a12574cadc7a41e152ec683380d7743c2a8
Comment 31 Kurt Hindenburg 2020-03-28 15:14:03 UTC
Git commit 86085d4acdc40d4e331c1b987f41f3d0d1afd6d0 by Kurt Hindenburg, on behalf of Mariusz Glebocki.
Committed on 28/03/2020 at 14:42.
Pushed by hindenburg into branch 'master'.

Allow for grouping extended characters

https://invent.kde.org/mglebocki/konsole complex-scripts-support

M  +4    -1    src/TerminalDisplay.cpp

https://invent.kde.org/kde/konsole/commit/86085d4acdc40d4e331c1b987f41f3d0d1afd6d0
Comment 32 Kurt Hindenburg 2020-04-28 13:24:54 UTC
Reopen this if there are still issues
Comment 33 Bug Janitor Service 2021-12-04 22:19:55 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/544
Comment 34 ninjalj 2022-01-04 00:14:58 UTC
*** Bug 424089 has been marked as a duplicate of this bug. ***