Bug 425419 - Right-To-Left Text Breaks Flag Emojis
Summary: Right-To-Left Text Breaks Flag Emojis
Status: RESOLVED UPSTREAM
Alias: None
Product: kate
Classification: Applications
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Kubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords: rtl
Depends on:
Blocks:
 
Reported: 2020-08-16 12:05 UTC by I3rav3
Modified: 2022-10-02 20:51 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Bug demonstration (694.76 KB, image/png)
2020-08-16 12:05 UTC, I3rav3
Flag emoji working in QLineEdit Widget (35.22 KB, image/png)
2020-09-03 15:50 UTC, I3rav3
Flag emoji not working in QLineEdit widget (34.45 KB, image/png)
2020-09-03 15:51 UTC, I3rav3

Description I3rav3 2020-08-16 12:05:43 UTC
Created attachment 130909 [details]
Bug demonstration

SUMMARY
Whenever a flag emoji appears in text that is inferred to be right-to-left, it breaks, showing either a different flag or a flag with a question mark. The problem occurs in both editable text (in a text editor, for example) and non-editable text (in the title of a Firefox window, for example).

STEPS TO REPRODUCE
1. Choose any flag emoji (this one for example 🇯🇵).
2. Open any application where you can enter text (Kate for example) and paste the flag emoji.
3. Enter text using a right-to-left language (Arabic or Hebrew for example) either before or after the flag emoji.

OBSERVED RESULT
The flag emoji breaks, showing a flag with a question mark in this case.

EXPECTED RESULT
The flag emoji shouldn't change just because the text around it was inferred to be right to left.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kubuntu 20.04
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.68.0
Qt Version: 5.12.8

ADDITIONAL INFORMATION
1- Note that this problem only occurs if the text with the flag emoji is inferred to be right to left. This inference depends on the first character typed other than the emoji. If this character is right to left, the text is inferred to be right to left, and the emoji breaks. If the first character typed other than the emoji is left to right, the text is inferred to be left to right, and the emoji won't break even if subsequently right-to-left characters were used.

2- This problem is peculiar to flag emojis as far as I can tell. All other emojis that I've encountered exhibit no problem with right-to-left text.

3- This problem is universal to KDE as far as I can tell, but is not universal to all of the system (this problem isn't present in Firefox for example).
Comment 1 Christoph Feck 2020-09-03 12:01:17 UTC
> This problem is universal to KDE

Even in simple QLineEdit widgets? Then it's a Qt bug.

Reassigning to the Kate developers because the Kate text view may have its own layout algorithms.
Comment 2 I3rav3 2020-09-03 15:48:43 UTC
(In reply to Christoph Feck from comment #1)
> > This problem is universal to KDE
> 
> Even in simple QLineEdit widgets? Then it's a Qt bug.
> 
> Reassigning to the Kate developers because the Kate text view may have its
> own layout algorithms.

Yes. For example, the same behavior is observed in the search box of the application launcher (see the new screenshots attached).

I figured out the problem at the conceptual level, but I honestly lack the requisite knowledge to point at the piece of code that causes it. Flag emojis are not a single Unicode character like other emojis. For example, Egypt's flag emoji is made up of two Unicode characters: 🇪 and 🇬. When these two characters are typed in sequence without a space, the font that handles emojis interprets them as one unit and shows the corresponding flag. So 🇪🇬 is in fact 🇪 followed by 🇬 without a space.
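
As a minimal sketch of the composition described above: each letter of a two-letter country code maps to one "regional indicator symbol" in the range U+1F1E6 to U+1F1FF, and two of them in a row make a flag. The helper name `flag` is made up for illustration and is not part of Qt or Kate.

```python
# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; B..Z follow consecutively.
REGIONAL_INDICATOR_A = 0x1F1E6


def flag(country_code: str) -> str:
    """Build a flag sequence from a two-letter code (hypothetical helper)."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected two ASCII letters, e.g. 'EG'")
    # Shift each letter from the A..Z range into the regional indicator range.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


egypt = flag("EG")
print(len(egypt))                             # two code points, not one
print([f"U+{ord(ch):04X}" for ch in egypt])   # the two regional indicators
```

Whether the pair is drawn as a single flag glyph is up to the font and the text shaper; the string itself always holds two separate code points.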

The problem is with the way KDE (or Qt, as you pointed out) interprets LTR text when it is mixed with RTL text. When LTR text is used with RTL text, the LTR text should still be read from left to right. This is the way it's interpreted in every piece of software I've ever used (Firefox for example), and the way it's actually read in real life (for Arabic at least). For example, if I encounter the sentence "cat مرحباً", I will read it as "hello cat", not as "hello tac". Qt, however, will interpret it as "hello tac". So, for example, if Qt tries to read "🇪🇬 مرحباً", it will interpret the emoji flag characters RTL, and will read 🇬 then 🇪, prompting the font to show Georgia's flag (🇬🇪) instead of Egypt's.

Egypt is lucky to have another flag substituted for its flag. Japan, on the other hand, will have its flag substituted by a question mark flag (depending on the font), because there's no country with code PJ (JP: h, PJ: 🇵🇯).

This is also way single character emojis aren't affected by this problem. A single character is read the same way LTR or RTL.
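
The effect described above can be reproduced on the string level. This is only a sketch of the symptom (the two code points of a flag being swapped), not of Qt's actual layout code, which this report does not identify. The helper names `flag`, `letters` and `swapped` are invented for the example.

```python
BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(code: str) -> str:
    """Two-letter country code -> pair of regional indicator symbols."""
    return "".join(chr(BASE + ord(ch) - ord("A")) for ch in code.upper())


def letters(flag_text: str) -> str:
    """Pair of regional indicator symbols -> two-letter country code."""
    return "".join(chr(ord(ch) - BASE + ord("A")) for ch in flag_text)


def swapped(code: str) -> str:
    """The flag sequence with its two code points in reverse order."""
    return flag(code)[::-1]


for code in ("EG", "JP"):
    # EG becomes GE (Georgia); JP becomes PJ, which is not an assigned code.
    print(code, "->", letters(swapped(code)))

# A single-code-point emoji is unchanged by reversal.
print("\U0001F600"[::-1] == "\U0001F600")
```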
Comment 3 I3rav3 2020-09-03 15:50:42 UTC
Created attachment 131395 [details]
Flag emoji working in QLineEdit Widget
Comment 4 I3rav3 2020-09-03 15:51:23 UTC
Created attachment 131396 [details]
Flag emoji not working in QLineEdit widget
Comment 5 I3rav3 2020-09-03 16:08:08 UTC
(In reply to I3rav3 from comment #2)
> (In reply to Christoph Feck from comment #1)
> (JP: 🇯🇵, PJ:🇵🇯).
> This is also WHY single character emojis aren't affected by this problem.

Sorry. Correcting typos.
Comment 6 Justin Zobel 2020-11-13 06:29:26 UTC
I have tested this by copying the RTL text in Comment 2 and trying to paste the Australian flag in the document. An entirely different flag appeared.
Comment 7 I3rav3 2020-11-13 09:06:37 UTC
(In reply to Justin Zobel from comment #6)
> I have tested this by copying the RTL text in Comment 2 and trying to paste
> the Australian flag in the document. An entirely different flag appeared.

That must have been Ukraine's flag (AU when read RTL is UA, Ukraine's flag code.)

I have since moved to GNOME (not because of this problem, but because of performance issues and better support), and have faced the exact same problem. The problem is definitely due to something upstream, and any helpful advice regarding where to report it would be really appreciated.
Comment 8 Yaron Shahrabani 2020-11-18 08:41:46 UTC
I've presented this issue to the Hebrew Linux community:
https://www.facebook.com/groups/linux.il/permalink/3459130597506838/

We came to the conclusion that gedit, Kate and mousepad in different versions are all affected by this bug.

The test case is:
ישראל 🇮🇱
ליכטנשטיין 🇱🇮
The first one reads "Israel", followed by the Israeli flag.
The second one reads "Liechtenstein", followed by the corresponding flag.
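
To confirm that the stored text is correct and only the rendering is off, the test case above can be checked at the code point level. This is a rough sketch; `flag_codes` is a made-up helper, and the flags are written as escapes so the example does not depend on how the editor displays them.

```python
import re

# Two consecutive regional indicator symbols (U+1F1E6..U+1F1FF) form one flag.
FLAG_PAIR = re.compile("[\U0001F1E6-\U0001F1FF]{2}")


def flag_codes(text: str) -> list[str]:
    """Return the country codes of all flags in text, in logical (stored) order."""
    return [
        "".join(chr(ord(ch) - 0x1F1E6 + ord("A")) for ch in match.group())
        for match in FLAG_PAIR.finditer(text)
    ]


israel = "ישראל \U0001F1EE\U0001F1F1"              # regional indicators I, L
liechtenstein = "ליכטנשטיין \U0001F1F1\U0001F1EE"  # regional indicators L, I

print(flag_codes(israel))         # code stored in the first line
print(flag_codes(liechtenstein))  # code stored in the second line
```

IL and LI are mirror images of each other and both are assigned codes, so a renderer that swaps the pair shows the other country's flag rather than a placeholder, which makes the swap easy to spot.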

There have been some suspicions regarding Pango, but I couldn't verify that.

Pretty funny case.
Comment 9 I3rav3 2020-11-18 12:16:13 UTC
(In reply to Yaron Shahrabani from comment #8)
> I've presented this issue to the Hebrew Linux community:
> https://www.facebook.com/groups/linux.il/permalink/3459130597506838/
> 
> We came to the conclusion that gedit, Kate and mousepad in different
> versions are all affected by this bug.
> 
> The test case is:
> ישראל 🇮🇱
> ליכטנשטיין 🇱🇮
> The first one reads Israel and the Israeli flag afterwards
> The second one reads Liechtenstein with the corresponding flag
> 
> There have been some suspicions regarding Pango, but I couldn't verify that.
> 
> Pretty funny case.

What's even funnier is that when I visited your Facebook group, the first post I saw was someone apologizing for writing in English because of Facebook's atrocious RTL support. So yeah, I wouldn't hold this problem against whatever open-source library is causing it, when much larger companies with much deeper pockets still can't figure RTL out. It's been ages since I've used Twitter, but I remember its RTL support to be exceptionally good. It even manages to not mangle RTL text when the interface itself is LTR, something that Google doesn't even seem to be aware it's doing.

I would still like this problem to be fixed, and would appreciate any pointers regarding reporting somewhere where someone can actually figure it.

It also seems worth repeating that this problem isn't universal to Linux. It doesn't occur for example in Firefox, Thunderbird, PyCharm or Sublime Text from what I've been able to test.
Comment 10 I3rav3 2020-11-18 12:19:05 UTC
(In reply to I3rav3 from comment #9)
> pointers regarding reporting [it] somewhere where someone can actually figure it [out].
Comment 11 Waqar Ahmed 2022-10-02 20:51:27 UTC
This needs to be reported to Qt; nothing can be done about it in Kate (unless we start doing text rendering from scratch).