Bug 439845

Summary:	Ctrl+Arrow stops at random places in the middle of words
Product:	[Applications] kate	Reporter:	php4fan
Component:	general	Assignee:	KWrite Developers <kwrite-bugs-null>
Status:	RESOLVED NOT A BUG
Severity:	normal	CC:	christoph, jpmbatrina01, waqar.17a
Priority:	NOR
Version First Reported In:	21.04.2
Target Milestone:	---
Platform:	openSUSE
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description php4fan 2021-07-14 17:06:47 UTC

SUMMARY


STEPS TO REPRODUCE
1. Create a new file
2. Type the text: "EXTRA2" (without the quotes)
3. Place the cursor at the beginning of the text, i.e. before the E
4. Press Ctrl+Right (Ctrl key and the right-arrow key)

OBSERVED RESULT

the cursor goes to before the A

EXPECTED RESULT

Should go to the end of the word. Depending on how you define that, it could be reasonable to jump either to before the 2 (considering that a word ends at a boundary between letters and numbers) or, more traditionally, to after the 2 (considering that a sequence of letters and numbers with no other characters is a word). You can argue in favor of either option, but jumping to a random place within a sequence of all letters makes no sense.


SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20210612
KDE Plasma Version: 5.22.0
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2
Kernel Version: 5.12.9-1-default (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-1065G7 CPU @ 1.30GHz
Memory: 7.3 GiB of RAM
Graphics Processor: Mesa DRI Intel® Iris® Plus Graphics

ADDITIONAL INFORMATION

Comment 1 Waqar Ahmed 2021-07-14 19:11:26 UTC

This is a feature known as camel case jumps and is useful when you are programming. Of course it is not for everyone and can't satisfy all needs.

There is a setting for it under "Editing", where you can disable it.

Comment 2 php4fan 2021-07-15 14:35:12 UTC

What exactly is the feature??

How is jumping to the middle of an all-upper-case word related to camel case?

Can you give an example of the usecase it is supposed to address?

Comment 3 php4fan 2021-07-15 14:43:05 UTC

Ok so if I have:

   FooBarLoremIpsum

and I use ctrl+arrows, it will jump from before the F, to before the B, to before the L, then the I.

If I have:

   FooBarLoremIpsum123

the same, because "Ipsum123" is seen as a word.

If I have:

   FooBarLoremIpsumX123

"X123" will be considered as one word.

So far so good.


But if I have:

    IPSUM123

how on earth is IPSU one word and M123 the next word? That makes no sense, and I *am* programming.

Comment 4 Jan Paul Batrina 2021-07-15 16:04:10 UTC

(for the examples below, | denotes the cursor position)
> What exactly is the feature??

As a crude example, maybe I have a call to FunctionNameSomeLong|, but the correct name should be SomeLongFunctionName. Instead of using the mouse (or going character-by-character) to select "SomeLong", I can just press Ctrl+Shift+Left twice, Ctrl+X, Ctrl+Left twice, then Ctrl+V. Basically it makes editing and navigation in CamelCase codebases easier.

> How is jumping to the middle of an all-upper-case word related to camel case?
> Can you give an example of the usecase it is supposed to address?

If I have a class named |LOREMDialog that I want to rename to IPSUMDialog, I can just press Ctrl+Shift+Right (resulting in |LOREM|Dialog) and then type IPSUM instead of manually selecting "LOREM".

Thus, in the case of consecutive uppercase letters (e.g. an acronym), the feature does not include the last uppercase letter in the sequence, since it belongs to the next word. After all, ACRONYMWord is CamelCase, not ACRONYMword.

Since A is the last uppercase letter in EXTRA2, and M is the last uppercase letter in IPSUM123, the feature treats them as the start of the next word.
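
To make the rule above concrete, here is a minimal Python sketch. It is not the KTextEditor implementation; it only models the behavior as described in this comment, for a single ASCII identifier. The names `camel_stops` and `split_at` are made up for illustration.

```python
def camel_stops(word: str) -> list[int]:
    """Return the cursor offsets where Ctrl+Right would stop in `word`.

    Hypothetical helper: models only the rule described above, for one
    identifier with no spaces or punctuation.
    """
    stops = []
    for i in range(1, len(word) - 1):
        nxt = word[i + 1]
        # An uppercase letter followed by a lowercase letter or a digit
        # is treated as the first character of the next chunk.
        if word[i].isupper() and nxt.isalnum() and not nxt.isupper():
            stops.append(i)
    stops.append(len(word))  # the end of the word is always a stop
    return stops


def split_at(word: str, stops: list[int]) -> list[str]:
    """Cut `word` at the given stop offsets."""
    parts, start = [], 0
    for stop in stops:
        parts.append(word[start:stop])
        start = stop
    return parts
```

With this sketch, `split_at("LOREMDialog", camel_stops("LOREMDialog"))` gives `["LOREM", "Dialog"]`, while `EXTRA2` comes out as `["EXTR", "A2"]`, which matches the behavior reported in this bug.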

This makes the feature not random (as you put it), but I agree that the behavior for uppercase letters and numbers is a bit unexpected. I partially prefer the first option you proposed (e.g. IPSUM123 should be separated as IPSUM and 123), but discussions with the maintainers for the exact behavior would be needed.

If you have the time, patches are welcome! Otherwise, you'll have to wait for a bit for someone else to do so, especially since the Kate/KTextEditor team is small.

Thank you for the bug report and have a nice day!

Comment 5 php4fan 2021-07-15 17:09:04 UTC

> I agree that the behavior for uppercase letters and numbers is a bit unexpected

Not "a bit unexpected", just wrong.

Apparently the criterion for an uppercase letter to be considered the start of a new camel-cased word is that it is "followed by an alphanumeric character that is not another uppercase letter" (note that IPSUM is not split into IPSU+M), while instead it should be that it is "followed by a lowercase letter". (I'm not trying to reverse-engineer the actual algorithm, I'm sure there's more to it; I'm just loosely generalizing the behavior based on the examples.)
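
The difference between the two criteria can be sketched in a few lines of Python. This is only an illustration of the loose generalization above, not the real algorithm; `chunks`, `observed` and `suggested` are hypothetical names.

```python
from typing import Callable


def chunks(word: str, starts_next: Callable[[str], bool]) -> list[str]:
    """Split `word` before every uppercase letter whose following
    character satisfies `starts_next` (hypothetical helper)."""
    parts, start = [], 0
    for i in range(1, len(word) - 1):
        if word[i].isupper() and starts_next(word[i + 1]):
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


def observed(c: str) -> bool:
    # Criterion inferred from the current behavior:
    # followed by an alphanumeric character that is not uppercase.
    return c.isalnum() and not c.isupper()


def suggested(c: str) -> bool:
    # Criterion suggested in this comment: followed by a lowercase letter.
    return c.islower()
```

Under these assumptions, `chunks("IPSUM123", observed)` yields `["IPSU", "M123"]`, whereas `chunks("IPSUM123", suggested)` keeps `["IPSUM123"]` whole. Both criteria split `LOREMDialog` into `["LOREM", "Dialog"]`. Note that the suggested criterion would also stop treating `X123` in `FooBarLoremIpsumX123` as a separate word.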

> I partially prefer the first option you proposed (e.g. IPSUM123 should be separated as IPSUM and 123)

That would be inconsistent with the fact that, currently, "ipsum123" (lowercase) is NOT split. Either you split both (ipsum123 => "ipsum, 123" ; IPSUM123 => "IPSUM, 123") or you split neither. I strongly advise against separating sequences of letters from sequences of numbers, as it results in "a bit unexpected" behaviors with hexadecimal numbers and similar hashes.

Comment 6 Christoph Cullmann 2021-07-15 17:42:45 UTC

Hi,

first: yes, the behavior can be improved for the case you have shown.

Besides this: as said, one can deactivate this variant of the cursor movement and you are back to the "good old times".

Actually my proposal would be to just not split full lower or upper case words with numbers at all.

e.g.

xxxxxx121212
XXXXXX123123
123123xxxxxx123123
123123XXXXXX123123

would all be just "one chunk" for the movement.
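
As a rough sketch of what such a check could look like (Python, purely illustrative; `is_single_chunk` is a made-up name and this is not code from ktexteditor.git): a word counts as one chunk when it consists only of letters and digits and its letters are either all lowercase or all uppercase.

```python
def is_single_chunk(word: str) -> bool:
    """Hypothetical pre-check for the proposal above: True when `word`
    should not be split by the camel case movement at all."""
    letters = [c for c in word if c.isalpha()]
    same_case = all(c.islower() for c in letters) or all(
        c.isupper() for c in letters
    )
    # Only plain letter/digit runs qualify; mixed-case words such as
    # LOREMDialog still go through the normal camel case splitting.
    return word.isalnum() and same_case
```

The four examples above all return True with this check, and so would EXTRA2 and IPSUM123 from this report, while FooBar and LOREMDialog return False.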

Btw., while such improvement wishes are appreciated, I think it would be nicer if you could avoid just repeating that you think the behavior is "wrong". Any such feature is a heuristic; there is no 100% precise definition of how it should behave. We can agree that one can improve this, though.

We work on this in our free-time and strong language/opinions won't help you to motivate us to work on this.

Btw., if you are a programmer as mentioned, patches are really welcome.

Just search for CamelCursor in ktexteditor.git, we even have unit tests for this feature.

Comment 7 Waqar Ahmed 2021-07-15 18:47:24 UTC

Hi all,

As the person who implemented this, I followed what QtCreator did as best as I could, and all the test cases and the implementation were checked against the behavior in QtCreator.

For the all caps + num (ABC123) case, I will need to see what QTC does. Maybe it can be fitted in alongside, provided it doesn't break any of the other cases.

The reason behind the behavior is best explained by Jan Paul Batrina already. That is exactly what happens.

Also, while ABC123 and other gibberish are valid identifiers, I would prefer if the behavior is tweaked for things that you would more commonly see. All cap identifiers with numerics are a rare thing, at least in my experience.

Comment 8 Waqar Ahmed 2022-01-22 13:44:47 UTC

Closing this, as the only bugs I will take against camel case behaviour will be the ones where we diverge from QtCreator behaviour or have totally unreasonable cursor jumps. The reason for this is that it gives us a defined way of how things are supposed to work. Why it works the way it does is also explained above, but it will never satisfy 100% of all cases, because that is simply not possible.