Bug 205447

Summary: [BiDi/Unicode] Non-BMP characters are incorrectly handled
Product: [Frameworks and Libraries] frameworks-ktexteditor Reporter: Sean Hunt <rideau3>
Component: general Assignee: KWrite Developers <kwrite-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: christoph, daniel, waqar.17a, zoeacacia
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Patch backspace and delete for surrogate pairs
Test for bug 205477 surrogate pair insert/delete behavior
Testcase for bug 205447 surrogate pair insert delete behavior

Description Sean Hunt 2009-08-28 07:38:40 UTC
Version:           3.3.0 (using KDE 4.3.0)
OS:                Linux
Installed from:    Ubuntu Packages

When a document includes a Unicode character outside the Basic Multilingual Plane (that is, one with a code point above U+FFFF), the character causes weird issues. For instance, the following (if you get square(s), it's U+1D43D):

Comment 1 Sean Hunt 2009-08-28 07:41:00 UTC
It appears that bugs.kde.org does not handle them well either :/

Basically, things are messed up; just copy-paste one into your document to see. Move the cursor around, and try deleting the character with your keyboard (without selecting it).
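
For context on why such a character misbehaves: Qt strings hold UTF-16 code units, so a code point above U+FFFF occupies two units (a surrogate pair) rather than one. The following is a minimal sketch in standard C++, not Qt code; the function name `toSurrogatePair` is a hypothetical helper made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one non-BMP code point (U+10000..U+10FFFF)
// as the two UTF-16 code units of its surrogate pair.
std::u16string toSurrogatePair(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000;  // 20 significant bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return std::u16string{high, low};
}
```

For U+1D43D this yields the units 0xD835 and 0xDC3D, so an editor that moves or deletes one unit at a time can end up with the cursor between the two halves.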
Comment 2 Christoph Cullmann 2012-11-01 15:25:30 UTC
*** Bug 308048 has been marked as a duplicate of this bug. ***
Comment 3 Christoph Cullmann 2012-11-01 15:26:13 UTC
Patches to fix this would be really welcome. I guess the backspace/delete keys need to check how many QChars must be removed from the line, instead of using the current stupid "one char at a time" method.
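
The idea of checking how many code units to remove could be sketched roughly as below. This is an illustration in standard C++ over `std::u16string`, not the KTextEditor implementation; `backspaceLength` and `deleteLength` are hypothetical names. In Qt code, `QChar::isHighSurrogate()` and `QChar::isLowSurrogate()` would presumably take the place of the two range checks.

```cpp
#include <cstddef>
#include <string>

bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: code units Backspace should remove when the cursor
// sits at `column` (an index into the UTF-16 units of the line).
std::size_t backspaceLength(const std::u16string &line, std::size_t column)
{
    if (column == 0 || column > line.size())
        return 0;  // nothing before the cursor, or cursor out of range
    if (column >= 2 && isLowSurrogate(line[column - 1]) && isHighSurrogate(line[column - 2]))
        return 2;  // remove both halves of the pair
    return 1;
}

// Hypothetical helper: code units Delete should remove at `column`.
std::size_t deleteLength(const std::u16string &line, std::size_t column)
{
    if (column >= line.size())
        return 0;  // nothing after the cursor
    if (column + 1 < line.size() && isHighSurrogate(line[column]) && isLowSurrogate(line[column + 1]))
        return 2;  // remove both halves of the pair
    return 1;
}
```

With the line `u"a\U0001D43Db"` (four code units), Backspace at column 3 and Delete at column 1 would each remove two units, while ordinary characters still cost one.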
Comment 4 Zoe Clifford 2015-06-08 06:15:59 UTC
Created attachment 93067 [details]
Patch backspace and delete for surrogate pairs
Comment 5 Zoe Clifford 2015-06-08 06:18:06 UTC
I attached a patch that handles the case of surrogate pairs with backspace and delete.

This patch doesn't address the underlying issue; that is, it does not address combining characters, and it DEFINITELY doesn't address how block selection fails spectacularly in the face of weird Unicode.

But the complicated stuff could be beyond me, and I'm a noob, so I wanted to start with a noob patch.
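
To illustrate why combining characters are a separate problem: a sequence such as "e" followed by U+0301 (combining acute accent) consists of two BMP code units, so a surrogate-pair check never fires for it. The sketch below is standard C++ under a loud simplification: it recognizes only the Combining Diacritical Marks block (U+0300..U+036F), whereas real grapheme clusters follow the rules of UAX #29. The names are hypothetical, and whether Backspace should remove the whole cluster or only the last mark is a design choice this sketch does not settle.

```cpp
#include <cstddef>
#include <string>

// Assumption for illustration: only U+0300..U+036F counts as "combining".
bool isCombiningDiacritic(char16_t c) { return c >= 0x0300 && c <= 0x036F; }

// Hypothetical helper: code units covering the base character before
// `column` plus any marks from the block above that follow it.
std::size_t clusterLengthBefore(const std::u16string &line, std::size_t column)
{
    if (column == 0 || column > line.size())
        return 0;
    std::size_t n = 1;
    // Walk left over trailing marks until a non-mark (the base) is reached.
    while (n < column && isCombiningDiacritic(line[column - n]))
        ++n;
    return n;
}
```

For `u"e\u0301"` with the cursor at column 2, this reports a cluster of two code units even though neither unit is a surrogate.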
Comment 6 Zoe Clifford 2015-06-09 18:50:29 UTC
Created attachment 93094 [details]
Test for bug 205477 surrogate pair insert/delete behavior
Comment 7 Zoe Clifford 2015-06-09 18:53:06 UTC
Comment on attachment 93094 [details]
Test for bug 205477 surrogate pair insert/delete behavior

Oops! Put the wrong bug number on this.

Sorry people I'm new at this...
Comment 8 Zoe Clifford 2015-06-09 19:00:04 UTC
Created attachment 93095 [details]
Testcase for bug 205447 surrogate pair insert delete behavior
Comment 9 Zoe Clifford 2015-06-14 16:07:52 UTC
My patch to fix backspace and delete for surrogate pairs has been updated and committed via this code review:

https://git.reviewboard.kde.org/r/124073/

A good start!

Once I have time I might look at combining characters too. But that will require a better understanding of Unicode and the codebase than I have now, so that I don't accidentally make things worse. So I'd need to spend some time figuring out how it all works.
Comment 10 Christoph Cullmann 2015-06-14 21:23:02 UTC
Thanks for your patch!
(And thanks to Milian for reviewing!)
Comment 11 Justin Zobel 2021-03-09 05:53:59 UTC
Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.
Comment 12 Waqar Ahmed 2023-03-29 23:52:16 UTC
No longer reproducible, seems like no one closed this