Bug 468394 - r.xml syntax highlighting definitions miss specified integers (e.g., 3L)
Summary: r.xml syntax highlighting definitions miss specified integers (e.g., 3L)
Status: RESOLVED FIXED
Alias: None
Product: frameworks-syntax-highlighting
Classification: Frameworks and Libraries
Component: syntax (show other bugs)
Version: unspecified
Platform: unspecified macOS
: NOR minor
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-11 16:32 UTC by npearlmu
Modified: 2023-07-18 20:10 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
current rendering and potential new rendering (7.68 KB, image/png)
2023-07-15 01:15 UTC, Jonathan Poelen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description npearlmu 2023-04-11 16:32:08 UTC
SUMMARY

The R language has a base integer type (in addition to the base numeric for floating point); this is specified for a constant by adding an "L" after the integer (e.g., "3L").  Such constants are not highlighted by the current r.xml syntax highlighter, but should be.

DETAILS

The current r.xml syntax highlighter tags floating point and scientific notation numbers, and it flags integers as numbers if they are not marked with an "L".  But integers marked with an "L" are untagged and so cannot be highlighted except as "normal" text.

In the current r.xml definitions (language v13, Kate v5.0, name = "R Script"), lines 110 & 111 tag Float and Int values; what's needed is another rule, probably just before the Float one (line 110), to add an attribute (e.g., "MarkedInt") to L-marked integers.  There's probably something more efficient, but a RegExpr rule to match String="\b[0-9]+L" would do the trick.  And then, assuming the attribute applied by the rule isn't the same as that applied by the Int rule, an additional entry in the <itemDatas> section would be needed as well, to set the style for the new attribute (in my workaround version of the r.xml file, I assigned MarkedInt cases the "dsDataType" defStyleNum, as that style does not otherwise appear to be used for R syntax highlighting).

STEPS TO REPRODUCE

My only use of the KDE syntax highlighting definitions is in RStudio, when R code blocks are syntax-highlighted w/in pandoc-generated HTML, so my steps to reproduce this involve processing an R script containing code with a marked integer (e.g., "foo <- 6L") vs an unmarked one ("bar <- 6").  When such a script is run through rmarkdown and then pandoc, the latter applies HTML syntax highlighting styles based on the tagging from the r.xml file, and integers ended with an "L" have no HTML marking applied.  

I can provide a file and exact steps to reproduce if needed, but this bug should appear in any situation where integers marked with an "L" in R should be highlighted as if they were numeric constants.

Note that if I make the changes suggested above to r.xml and pass that revised file to pandoc (at runtime) as a customized version of the highlighting rules for R, I see correct highlighting.
Comment 1 Jonathan Poelen 2023-07-15 01:13:36 UTC
I tried with https://pandoc.org/try/ and the following code

```r
111
111L
111i
```

The last 2 lines are not in color. Except that this is not the behavior of KSyntaxHighlighting, for which only L and i are in Normal.

As the syntax version is also 13, I have the impression that this is on the side of skylighting (https://github.com/jgm/skylighting/), the library used by pandoc. This library contains a tool for converting to html or displaying in color on the console (ansi), so I think there will be the same problem as with pandoc. But for the moment I can't confirm this, as I don't have haskell or skylighting and can't install anything on my machine :/.

On the other hand, I'm thinking that only the suffix could be in a different color, something like in the attached image (on the left the current behavior, on the right what could be the new one).
Comment 2 Jonathan Poelen 2023-07-15 01:15:14 UTC
Created attachment 160295 [details]
current rendering and potential new rendering
Comment 3 Bug Janitor Service 2023-07-15 15:08:50 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/syntax-highlighting/-/merge_requests/506
Comment 4 Christoph Cullmann 2023-07-15 19:26:27 UTC
Git commit ef856a14e35194f0ee532326dfa5042adbeea251 by Christoph Cullmann, on behalf of Jonathan Poelen.
Committed on 15/07/2023 at 19:26.
Pushed by cullmann into branch 'master'.

R: add numeric suffix (i and L)

It doesn't seem to be a bug on our side, but I did make one change: highlighting suffixes.

I wonder if we shouldn't generalize and standardize this? There are several behaviors depending on the syntax:

There are several behaviors depending on the syntax:

- no treatment for suffixes, they often (always?) have the Normal color
- number and suffix have the same color, but the color of the same suffix can vary between Float, Int, Hex, etc.
- suffixes have a dedicated color

As shown in this commit, I'm in favor of the dedicated color and propose to generalize it to all syntaxes, what do you think?

M  +3    -0    autotests/folding/highlight.R.fold
M  +3    -0    autotests/html/highlight.R.dark.html
M  +3    -0    autotests/html/highlight.R.html
M  +3    -0    autotests/input/highlight.R
M  +3    -0    autotests/reference/highlight.R.ref
M  +7    -2    data/syntax/r.xml

https://invent.kde.org/frameworks/syntax-highlighting/-/commit/ef856a14e35194f0ee532326dfa5042adbeea251
Comment 5 npearlmu 2023-07-18 20:10:44 UTC
Thanks for taking a look at this, and sorry it's taken a few days to get back to you.  The suggested changes related to handling the suffix separately look OK to me, and are compatible w/my prefs for formatting of r code (all numerics -- Floats, Ints, complex, etc. -- get the same formatting, including any suffix); but I could certainly imagine cases/people/languages where the preferred behavior would be Christoph's 2nd one (e.g., color Floats and Ints differently and have the suffix match the number).

To follow up on Jonathan's original reply, though:  I don't have a way of tracking what's happening under the hood, nor of checking things other than in pandoc's html output (by way of RStudio's rmarkdown rendering process). You're saying that you don't see the problem in KDE using that same (version 13) version of r.xml, yes?  And that the latter doesn't involve use of skylighting?  So the problem is (probably) somewhere in the chain around pandoc's use of the skylighting library?

Because it's looking like the actual underlying bug is that the <Int attribute="Int" context="NumericSuffix"/> rule itself (line 115 in Christoph's r.xml commit) fails to apply when there's a suffix.  Using Christoph's updated (v14) r.xml still fails to highlight either the digits or the suffix when run through pandoc; but simply replacing the <Int ...> rule with a corresponding <RegExpr ... String="\b[0-9]+"/> rule works as intended.  As I understand things (based on the KDE docs at https://docs.kde.org/stable5/en/kate/katepart/highlight.html), the <Int .../> rule should just behave like a shorthand (but more efficient) version of a <RegExpr .../> rule with String="\b[0-9]+", so it seems like the problem is actually in the implementation of <Int ...> rules (in the skylighting library?).

I'm happy to try to report this as a bug appropriately, but I'm not sure where/how to do that (I originally got here from the skylighting readme).  I'm way out of my depth here, as I normally don't do anything directly with this stuff aside from calling the top-level rendering tool(s) in RStudio.