Bug 479936 - Ability to track additional state via custom state variables
Status: RESOLVED INTENTIONAL
Alias: None
Product: frameworks-syntax-highlighting
Classification: Frameworks and Libraries
Component: framework
Version: unspecified
Platform: openSUSE All
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
Reported: 2024-01-17 10:54 UTC by Ellie
Modified: 2024-04-03 03:06 UTC

Description Ellie 2024-01-17 10:54:13 UTC
SUMMARY

For context-sensitive grammars, I find it can be fairly difficult to track all necessary context (in the theoretical sense) through the actual context constructs (the KDE syntax highlighting "context" item), because only one can ever be active at a time, from what I can tell. This is severely limiting whenever a more complex kind of theoretical context/state has to be tracked on multiple layers, which is sometimes necessary.

I therefore propose that rules can set named state variables tracked by the syntax engine, as well as have preconditions depending on them:

    <keyword attribute="DeclKeyword" context="NamedItem"
         String="namedcodeblockitemkeywords" state-set="expect_block = 1" />
    <DetectChar attribute="OtherSymbol" context="#stay"
         char="{" state-check="expect_block == 1" />

(I chose this syntax so at some point it could be extended for example to += or -= for the set part, or >= for the check part, or whatever. I think as a start, just = for setting and ==/!= for checking should be enough.)

My apologies if this feature already exists and I just missed it.

STEPS TO REPRODUCE

1. Encounter a situation like this: "i'm looking to write a kate syntax highlighting ( https://invent.kde.org/frameworks/syntax-highlighting ) but i ran into a problem: in the language i'm writing it for, `{` opens a code block and should therefore be a folding region only if it's the first `{` to follow a specific list of "code block keywords" that fulfills all of the following conditions: 1. it's not nested inside any other `{` or `(` or `[` bracket pairs opened after that code block keyword, and 2. the last non-whitespace character before the `{` bracket isn't part of a specific negative symbol list, and 3. it isn't preceded, possibly with whitespace in between, by a keyword of a specific negative keyword list. i might have missed some corner cases but i'm fairly sure this would basically be always correct"

2. Realize it would be way easier to track if "code block start keywords" could enter some code block expecting state, and different "code block stop keywords" could exit it again. Same for "code block allowing symbols" and "code block preventing symbols", and this should be fairly quick to put together as a result, at least outside of the nesting part.

OBSERVED RESULT

Right now it is too complicated to handle several kinds of state (i.e. context, in grammars that require it) when more than one layer of it has to be tracked.

EXPECTED RESULT

With the proposed addition, it should hopefully be fairly easy to track such additional state.

SOFTWARE/OS VERSIONS

probably all are affected

ADDITIONAL INFORMATION
Comment 1 Christoph Cullmann 2024-01-17 15:57:47 UTC
You can do the same with context switches, even if that naturally is not that nice. I don't see us adding additional variables: that would require tracking them, which is costly, and would in addition introduce yet another level of complexity.

Given that even complex languages like C++ work OK with the current level of expressiveness, I don't see us going there.

If you have time to provide a merge request that shows this can be added without a large impact, one might reconsider.
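
For illustration, the context-switch approach mentioned above could look roughly like the fragment below. This is only a sketch under assumptions, not taken from any shipped definition: the context names Normal, ExpectBlock, CodeBlock and PlainBrace, the "Normal Text" attribute and the "Block" region name are hypothetical, while DeclKeyword, OtherSymbol and namedcodeblockitemkeywords are borrowed from the description. It is a fragment of the contexts section only, so the keyword list and itemData definitions are omitted.

```xml
<!-- Sketch only: names not found in the report are hypothetical. -->
<contexts>
  <context name="Normal" attribute="Normal Text" lineEndContext="#stay">
    <!-- A code block keyword switches to a context that remembers "a block is expected". -->
    <keyword attribute="DeclKeyword" context="ExpectBlock" String="namedcodeblockitemkeywords" />
    <!-- Any other "{" is an ordinary brace without a folding region. -->
    <DetectChar attribute="OtherSymbol" context="PlainBrace" char="{" />
  </context>

  <!-- Being inside this context plays the role of expect_block == 1. -->
  <context name="ExpectBlock" attribute="Normal Text" lineEndContext="#stay">
    <!-- The first "{" opens the folding region and replaces ExpectBlock with CodeBlock. -->
    <DetectChar attribute="OtherSymbol" context="#pop!CodeBlock" char="{" beginRegion="Block" />
  </context>

  <context name="CodeBlock" attribute="Normal Text" lineEndContext="#stay">
    <!-- The matching "}" closes the region and returns to the enclosing context. -->
    <DetectChar attribute="OtherSymbol" context="#pop" char="}" endRegion="Block" />
    <IncludeRules context="Normal" />
  </context>

  <context name="PlainBrace" attribute="Normal Text" lineEndContext="#stay">
    <!-- Keeps ordinary brace pairs balanced so they cannot close a CodeBlock early. -->
    <DetectChar attribute="OtherSymbol" context="#pop" char="}" />
    <IncludeRules context="Normal" />
  </context>
</contexts>
```

The sketch covers a single piece of state only. It does not handle the negative symbol or negative keyword conditions from the description, and each further independent condition would need contexts of its own.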
Comment 2 Ellie 2024-01-17 16:10:34 UTC
I don't see how to do that without running into scalability issues. Doesn't that basically require an exponential explosion of context constructs? E.g. for 3 independent states, 27 context constructs already? Unless you expect me to write a compiler that outputs the syntax definition, so that I can no longer write it by hand, I don't see how that's feasible. Therefore, the reasoning that it should be expressible seems somewhat theoretical and not really true in practice as soon as the grammar is more context-sensitive than usual. I would assume C++ to be less context-sensitive because it has line terminators, while what I'm working on doesn't.
Comment 3 Ellie 2024-01-17 21:17:58 UTC
Is there some good place to ask how best to do the actual implementation of the XML? I'm slightly at a loss as to how to do it without it turning into chaos, whereas with this feature it would be fairly easy and clean.
Comment 4 Ellie 2024-01-17 21:48:19 UTC
I currently have, or plan to add, the following contexts, and this is where it would turn into a problem (I use them to mark declared variable names with dsVariable, which seems very helpful for making the names easy to spot, and I might add more later):

NamedItem, WithStatementBeforeIn, WithStatementAtLabel.

If I want this feature to work I think I would need to manage it with these new contexts, something like this:

CodeBlockAtNextBrace, CodeBlockDeferred.

Both would need to be able to coincide with the first list, so this would give me:

NamedItem, NamedItemCodeBlockAtNextBrace, NamedItemCodeBlockDeferred, WithStatementBeforeIn, WithStatementBeforeInCodeBlockAtNextBrace, WithStatementBeforeInCodeBlockDeferred, WithStatementAtLabel, WithStatementAtLabelCodeBlockAtNextBrace, WithStatementAtLabelCodeBlockDeferred.

I hope this shows why I anticipate this turning into a bigger problem. It doesn't seem like a sustainable approach.
Comment 5 Ellie 2024-04-03 03:06:52 UTC
I'm still stuck on this, for what it's worth. I would be curious to hear if there's any good place to ask such questions to figure this out!