Bug 503279 - Copying ﻷ doesn't follow its Unicode (Arabic)
Status: RESOLVED NOT A BUG
Alias: None
Product: kate
Classification: Applications
Component: kwrite (other bugs)
Version First Reported In: 24.12.3
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
 
Reported: 2025-04-24 11:04 UTC by easy_lad
Modified: 2025-04-24 13:12 UTC

Description easy_lad 2025-04-24 11:04:39 UTC
SUMMARY

The "ﻷ" character has its own Unicode code point (U+FEF7). If you copy it and paste it somewhere outside KWrite, it is converted to "ل" (U+0644) and "أ" (U+0623), which together render as the same character but with each letter having its own code point. I think it might be an issue with Kate too, but I didn't test that. This is a problem when you search for that string in other software and get 0 results because of this behavior.
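To illustrate the search mismatch described above, here is a minimal sketch in Python. It assumes the searching side is free to apply Unicode NFKC normalization, which folds the presentation-form ligature U+FEF7 into U+0644 U+0623. The helper names `fold` and `contains` are hypothetical and not part of KWrite or any other tool mentioned here.

```python
import unicodedata

LIGATURE = "\uFEF7"        # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
SEQUENCE = "\u0644\u0623"  # ARABIC LETTER LAM + ARABIC LETTER ALEF WITH HAMZA ABOVE


def fold(text: str) -> str:
    """Apply NFKC so presentation-form ligatures become their base letters."""
    return unicodedata.normalize("NFKC", text)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores the ligature vs. letter-sequence difference."""
    return fold(needle) in fold(haystack)


# A plain substring search treats the two spellings as different strings.
print(LIGATURE in SEQUENCE)
# Searching after normalization treats them as the same text.
print(contains(SEQUENCE, LIGATURE))
```

This only works around the mismatch on the searching side; it does not change what ends up on the clipboard.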

STEPS TO REPRODUCE
1. Use the shortcut Shift + G in the Arabic keyboard layout to type "ﻷ"
2. Copy it and paste it somewhere outside KWrite

OBSERVED RESULT

It is converted to "ل" (U+0644) and "أ" (U+0623), which together render as the same character but with each letter having its own code point.

EXPECTED RESULT

Preserve its own Unicode code point (U+FEF7).

SOFTWARE/OS VERSIONS

Linux/KDE Plasma: Fedora Linux 41
KDE Plasma Version: 6.3.3
KDE Frameworks Version: 6.12.0
Qt Version: 6.8.2
Comment 1 easy_lad 2025-04-24 11:07:21 UTC
I used https://www.babelstone.co.uk/Unicode/whatisit.html which was helpful in debugging this issue.
Comment 2 Waqar Ahmed 2025-04-24 11:42:04 UTC
Not an issue with Kate. I routinely use Kate for RTL text far more complex than لا and haven't ever encountered such an issue. And just to test, I can paste the following from Kate to this comment:

ﻷ

looks fine to me.
Comment 3 easy_lad 2025-04-24 12:02:27 UTC
(In reply to Waqar Ahmed from comment #2)
> Not an issue with Kate. I routinely use Kate for RTL text far more complex
> than لا and haven't ever encountered such an issue. And just to test, I can
> paste the following from Kate to this comment:
> 
> ﻷ
> 
> looks fine to me.

Did you try the reproduction steps I provided and check its Unicode? Because what you replied with is irrelevant to the issue. In case it wasn't clear:

Open a new file in KWrite, use the shortcut Shift + G in the Arabic keyboard layout to type "ﻷ", copy it, then paste it into some other app outside KWrite. You can test it here: https://www.babelstone.co.uk/Unicode/whatisit.html. The output will be:

U+0644 : ARABIC LETTER LAM
U+0623 : ARABIC LETTER ALEF WITH HAMZA ABOVE

Instead of:

U+FEF7 : ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
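
The same per-code-point listing can probably be reproduced locally instead of on the website. A minimal sketch in Python using the standard `unicodedata` module; the `describe` helper is a hypothetical name, and the strings passed to it stand in for whatever text was pasted.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX : NAME' line per code point in text."""
    return [
        f"U+{ord(ch):04X} : {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# What the clipboard apparently contains after copying out of KWrite.
for line in describe("\u0644\u0623"):
    print(line)

# What was expected instead.
for line in describe("\uFEF7"):
    print(line)
```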
Comment 4 Waqar Ahmed 2025-04-24 13:12:33 UTC
That is not controlled by Kate. We just show the data that we get. I tried many other apps and they have the same issue, except for the browser, which somehow handles it differently. Probably your system has a configuration somewhere that tells it to decompose U+FEF7 into lam and alef, which you can change. Grepping my /usr directory, I found one entry that is probably related to this.

/usr/share/X11/locale/en_US.UTF-8/Compose
4467:<UFEF7>    : "لأ"  # ARABIC LETTER LAM plus ARABIC LETTER ALEF WITH HAMZA ABOVE
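
To check whether another system has a similar entry, a rough sketch in Python that scans a Compose file for lines whose key sequence mentions a given keysym. The path is the one quoted above and may differ or be missing elsewhere. `find_entries` is a hypothetical helper that assumes the usual `sequence : "result"` line layout, and whether such an entry actually causes the behavior is not confirmed here.

```python
from pathlib import Path

# Path quoted in the comment above; it may differ or be absent on other systems.
COMPOSE_PATH = Path("/usr/share/X11/locale/en_US.UTF-8/Compose")


def find_entries(lines, keysym):
    """Return (line_number, line) pairs whose key sequence mentions <keysym>."""
    token = f"<{keysym}>"
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment-only lines
        sequence = line.split(":", 1)[0]  # key sequence sits left of the first colon
        if token in sequence:
            hits.append((number, line.rstrip("\n")))
    return hits


if COMPOSE_PATH.exists():
    with COMPOSE_PATH.open(encoding="utf-8", errors="replace") as handle:
        for number, line in find_entries(handle, "UFEF7"):
            print(f"{number}:{line}")
```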