SUMMARY
Product/Component unknown; sorry. Observed in kate and konsole, so it probably affects something they both depend on.

STEPS TO REPRODUCE (kate)
1. In kate, enter an emoji, e.g. 😊
2. Move the cursor back and forth, from before the emoji to after it.
3. Look at the line:column indicator at the bottom.

OBSERVED RESULT (kate)
Column jumps by two for a single character.

EXPECTED RESULT (kate)
Column should increase by only one per character.

STEPS TO REPRODUCE (konsole)
1. In kate, copy and paste the emoji until you have OVER 4000 (e.g. 4001). (Remember that the column number will say 8003 at the end of a line with 4001 emojis.)
2. Select them all and copy to the clipboard.
3. In konsole, run 'python3'. Then type: len("""
4. Press Ctrl+Shift+V (or go to Edit, Paste; or right-click and select Paste).

OBSERVED RESULT (konsole)
It will ask whether you want to paste X characters (e.g. 8002) instead of the correct number (e.g. 4001). Answer 'yes'. Then complete the python expression with: """) and hit enter. The correct number of characters (e.g. 4001) is displayed.

EXPECTED RESULT (konsole)
It should count the characters correctly, not double-count them.

SOFTWARE/OS VERSIONS
Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kate 22.08.2
Konsole 22.08.2

ADDITIONAL INFORMATION
For casual users, the number of characters may not really matter, but people like me who do programming or work on data projects need correct character counts, without having to wonder where X characters went or where X characters magically came from. If it's a single Unicode code point (e.g. U+1F60A), then it needs to be treated as just one character, regardless of how many bytes it might require in a particular encoding. The whole point of working with text instead of bytes is that you can work with characters without worrying about how things are encoded under the hood.
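For reference, the numbers in the konsole steps can be reproduced outside either application. This is only a sketch: it assumes the doubled figure comes from counting 16-bit UTF-16 code units, which matches the 8002 shown in the paste prompt. The helper name utf16_units is made up for this illustration.

```python
# Compare three ways of "counting" the same pasted text.
EMOJI = "\U0001F60A"  # 😊, a single code point above U+FFFF


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16.

    Hypothetical helper for illustration only.
    """
    # "utf-16-le" emits no byte order mark, so bytes // 2 is the unit count.
    return len(text.encode("utf-16-le")) // 2


text = EMOJI * 4001
code_points = len(text)                  # what Python's len() reports
code_units = utf16_units(text)           # what an index into UTF-16 data reports
utf8_bytes = len(text.encode("utf-8"))   # size when encoded as UTF-8

print(code_points, code_units, utf8_bytes)  # 4001 8002 16004
```

Only the first number matches the expected result described above; the second matches what the paste prompt shows.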
These are going to end up being individual bugs in each app, not something more general. Arbitrarily using this one for Kate; please file another for Konsole. Thanks!
Can reproduce in Kate.
Fixing it consistently throughout KTextEditor/Kate is considered out of scope/hard to do in all the places. See https://invent.kde.org/frameworks/ktexteditor/-/merge_requests/533 for reasoning.
Just to have some reasoning here: we use indices into UCS2 strings as columns everywhere. We use them when computing search matches, in the internal API, and for --column, for example. Altering that would be a major effort, and I don't see that it makes sense to spend our time on it. Cursor movement while editing is correct; if it were not, that would be a real issue. That the column offset is not as expected for rare characters is IMHO no big issue.
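To illustrate what a column counted in code points would involve, here is a minimal sketch of the mapping, assuming the existing columns are 0-based offsets in 16-bit code units. The function name is hypothetical and is not part of KTextEditor; it is not a proposal for how the editor should implement this.

```python
def utf16_column_to_code_point_column(line: str, utf16_column: int) -> int:
    """Map a 0-based column in UTF-16 code units to one in code points.

    Hypothetical helper for illustration; not a KTextEditor API.
    An offset that falls inside a surrogate pair maps to the position
    after that character.
    """
    units_seen = 0
    for code_point_column, ch in enumerate(line):
        if units_seen >= utf16_column:
            return code_point_column
        # Characters above U+FFFF need a surrogate pair (two code units).
        units_seen += 2 if ord(ch) > 0xFFFF else 1
    return len(line)


line = "a\U0001F60Ab"  # "a😊b": 3 code points, 4 UTF-16 code units
print(utf16_column_to_code_point_column(line, 3))  # 2, the column of "b"
```

The scan is linear in the line length for every lookup, which hints at why doing this in every place that handles columns is not a small change.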