Bug 318888 - regexen in awk: update for syntax highlighter awk.xml
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: syntax
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-25 23:53 UTC by steveL
Modified: 2014-09-08 14:49 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In: 4.11


Attachments
awk.xml (8.28 KB, text/plain)
2013-04-26 00:02 UTC, steveL
apps/katepart/syntax/awk.xml -- coloured (10.40 KB, text/plain)
2013-04-27 23:23 UTC, steveL
apps/katepart/syntax/awk.xml -- conformant (10.14 KB, text/plain)
2013-04-27 23:26 UTC, steveL

Description steveL 2013-04-25 23:53:55 UTC
I got a bit fed up with the lack of regex highlighting in awk, so added some functionality to the existing file.

It triggers on the awk match operator ~, eg: if ($1 !~ /re/)
and also at the start of a pattern, for: /foo/ { action } or: /foo/, /bar/ { range }

The latter is a bit trickier, since normally / is a division operator; that's why this is not complete as-is: we'd have to also deal with: $3 == "foo", /ere/ and regexen for defined functions like sub. The special casing is annoying, but you can see the same in the source for awk at http://www.cs.princeton.edu/~bwk/btl.mirror/ so
a) it's something we have to deal with.
b) it's not likely to change.
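The division-versus-regex decision described above can be sketched roughly as follows. This is only an illustration in Python, not what awk.xml or the awk source actually does: the token list in `REGEX_PRECEDERS` and the helper name `slash_starts_regex` are assumptions made for the example, and cases such as the `/=` operator are ignored.

```python
import re

# Assumed, simplified set of tokens after which a "/" can only open a regex.
REGEX_PRECEDERS = {"~", "!~", "(", ",", "{", ";", "&&", "||", "!", "==", "!=", "="}

# Very small tokenizer: two-character operators, string literals,
# names/numbers/field references, then any other single non-space character.
TOKEN = re.compile(r'\s*(!~|&&|\|\||==|!=|"(?:\\.|[^"\\])*"|\$?\w+|\S)')

def slash_starts_regex(prefix: str) -> bool:
    """Guess whether a '/' that follows `prefix` opens an ERE rather than divides."""
    tokens = TOKEN.findall(prefix)
    if not tokens:
        return True  # start of a pattern, as in: /foo/ { action }
    return tokens[-1] in REGEX_PRECEDERS

if __name__ == "__main__":
    for text in ["", "if ($1 !~ ", '$3 == "foo", ', "sub(", "x = a ", "(a + b) "]:
        print(repr(text), slash_starts_regex(text))
```

The point of the sketch is that the decision depends only on the previous token, which is why the special cases have to be listed explicitly.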

I thought I'd share the file as-is in the spirit of incremental improvement*, as it does make a major difference to scripting. It has support for POSIX character classes, eg: [[:alnum:]_]

I've just seen bug #318592 and that undocumented feature would make a big difference to a lot of highlighters imo. I've not used it yet, but in this case I think it might make the shenanigans around RegexPattern a little easier, especially if we want to add a distinction between / division and /ere/ more generally.

* ie: before I make a mess of the rest of it :)


Reproducible: Always
Comment 1 steveL 2013-04-26 00:02:16 UTC
Created attachment 79452
awk.xml

apps/katepart/syntax/awk.xml
Comment 2 Dominik Haumann 2013-04-27 10:13:03 UTC
Git commit c9e3cc9ef9f99d922f1ebd987c74ba572992cdbd by Dominik Haumann.
Committed on 27/04/2013 at 12:12.
Pushed by dhaumann into branch 'master'.

fix regexp hl in awk

Comparison: http://wstaw.org/m/2013/04/27/plasma-desktoppw3590.png
left: old, right: new

Thanks to slong for the patch.

FIXED-IN: 4.11

M  +4    -0    examples/syntax/highlight.awk
M  +124  -27   part/syntax/data/awk.xml

http://commits.kde.org/kate/c9e3cc9ef9f99d922f1ebd987c74ba572992cdbd
Comment 3 Dominik Haumann 2013-04-27 10:16:28 UTC
I've committed your changes, although it's generally a bad idea to hard-code colors. That's after all why we have the default styles.

So all in all, we'd appreciate it if you sent another patch that removes as many of the hardcoded colors as possible.
Comment 4 steveL 2013-04-27 23:23:10 UTC
Created attachment 79502
apps/katepart/syntax/awk.xml -- coloured

Great, thanks. Attaching two versions of a new variant that allows for computed field references, eg: var = $(f + 1); this one with colouring, and one without, as you requested. Also started to make a distinction between code and expression space, which is something we used very successfully in a recent internal project. For awk, it means warning on eg: 'if' or 'for' inside a bracket. This is intended more as a step towards handling special functions like getline, print and printf, which can be tricky because of the well-known syntax ambiguity with redirection (besides not requiring brackets.)

wrt the hardcoding of colours, they are close to the default colour, eg for a string, but slightly different so that the user knows that there /is/ a variation: as it is there are 4 styles derived from dsString, and 4 from dsNormal. I've often found in the past that unless there's a visual indication that they're different, it's easy to think the highlighter doesn't actually do much, and overlook it.

The user can still change those styles as they see fit: it just doesn't occur to you unless they are differently highlighted in some fashion or another. Obviously someone with more experience in kate, will know to see what settings are available: but then they already know how and where to change the colouring. My intent was to grab the attention of the new or inexperienced user, so that they see the merit in using kate. Additionally an experienced user who sees kate still highlighting awk the same way (since all the styles use the same default as before with no variation), will probably just assume it's still got the same setup, and that nothing's changed.

That was my take on it, in any case :)

I didn't want to change to using another default style, as it ties in better with indentation scripting. I guess we could use dsOthers for Regex, since that is not considered code either, by document.isCode(); atm I'm using that for "Regex Op" in a similar vein to C chars. But I guess dsString for String and Escape, and dsOthers for all regexen would work, even if it's not logically so apt eg for CharClass.
Comment 5 steveL 2013-04-27 23:26:18 UTC
Created attachment 79503
apps/katepart/syntax/awk.xml -- conformant

uncoloured variant.
Comment 6 Dominik Haumann 2013-10-16 21:16:51 UTC
SteveL: validating awk.xml against language.dtd reports several problems:

$ ./checkdtd awk.xml
awk.xml:97: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:97: element context: validity error : Element context does not carry attribute attribute
awk.xml:101: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:101: element context: validity error : Element context does not carry attribute attribute
awk.xml:106: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:106: element context: validity error : Element context does not carry attribute attribute
awk.xml:121: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:121: element context: validity error : Element context does not carry attribute attribute
awk.xml:129: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:129: element context: validity error : Element context does not carry attribute attribute
awk.xml:173: element context: validity error : Element context does not carry attribute lineEndContext
awk.xml:173: element context: validity error : Element context does not carry attribute attribute
Document awk.xml does not validate against language.dtd

Can you look into this and provide a new version?
Comment 7 steveL 2013-10-17 15:55:44 UTC
Hi Dominik,

Yeah I'm well aware of this: I use the XML Validation plugin in Kate to check all my syntax highlighters. Is this the reason my awk code suddenly looks crap in Kate? ;-) Was wondering about that hehe, but been too busy to investigate (and not doing much awk atm, just bits of bug-fixing.)

The problem I have with this is twofold: firstly the DTD is massively out of date and has been for several years. Hmm just checked it and it looks very different: I don't recall all those comments, but that may just be senility..

Checking git I see it has been worked on quite a bit over the last few years, so if you're telling me validation is mandatory, then I'll fix it. However let me first explain why it doesn't validate already.

Those errors are all included Contexts, so afaict Kate pays no attention to those attributes; certainly setting them has never had an effect, and their absence has never hurt (no warnings or errors on startup, and the highlighter works.) When I started on the highlighting work (2008 iirc) the DTD was pretty useless, since most things I found useful weren't even in it (which I think is why so many people copy and paste from the bash highlighter, which lovely as it is, is awfully inefficient.) I had to dig through lots of different highlighters to find what was available, and still missed stuff.

To my enduring shame, I hadn't found the documentation in the Handbook, and didn't see it til someone (I think it was you or ehamberg, after someone else in #kate asked for documentation) mentioned it a year or so ago. Even then ISTR there were things that weren't in the DTD; so my practice is just to validate against the DTD, and ignore those errors: they're the *only* ones I do ignore.

The reason for that is that I kept getting errors about unknown attributes and elements (iirc): so I'd skip those too, *after* checking the relevant line. In that context, it makes no sense to worry about trivial errors that don't affect anything, since you're going to get errors whatever you do.

I gave up on the make one because of that line pop snafu, and I have not had that issue with the awk highlighter at all; I thought that was because awk is a much simpler language, and doesn't require anything like as many contexts, since it doesn't nest another language within it, using completely different contexts to everything else available.

You actually have a doubling of the latter if you want to deal with gmake $(shell ..) properly. That's where I got my mania for included contexts; now I use them because they make the code cleaner, and afaict more efficient. If they weren't, I'd still use them, as speed has never been an issue, assuming the stacking works: and I tend to write what I'm told are quite efficient highlighters (now at least;) since I work with parsers and hand-written lexers.

I just scan the output and if there's anything that's not exactly as above, I dig in and fix it: you'll see those are all paired per-line, which is the pattern I look for. Then I try it out in kate; unless it's a trivial edit, in which case I just do it and use it: for those I'm always watching what's changed (which is where Debug and Warning styles come in handy, tho Error is the best for getting me upset;) since it's code I'm working on and the highlighter isn't cutting it, sufficiently for me to get annoyed and start my Syntax session.

Hmm that was a lot longer than I intended; executive summary: it's cleaner to work with (easier to spot an included Context when scanning), it doesn't affect anything in kate, and it loads more quickly.

But as I said, if it's mandatory I'll stick a load of empty attributes in; I'll do that with the newer one, since it's about 2.5 times longer, and *much* nicer to work with, *so long* as you tell me the above reasoning is not sufficient to allow it to be used as-is; the bigger they get the more cruft we can avoid, and personally I'd rather take the free optimisation; I'll even write you a script to filter out the above matched pairs for CMake verification (a bit like yacc expected errors: if strictness is required then I could just give it an expected number.)

Your call; ideally we'd have a way of telling it to ignore those two missing on included Contexts. Hmm that could be scripted easily enough..
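A filter along those lines could look roughly like this. It is a sketch only, assuming the validator output has exactly the shape quoted in comment 6; the function name `unexpected_lines` is made up for the example, and dropping the final "does not validate" summary line is a design choice of the sketch, not something the checkdtd script does.

```python
import re

# The two "expected" messages, as they appear in the validator output above.
EXPECTED = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+): element context: validity error : "
    r"Element context does not carry attribute (?P<attr>lineEndContext|attribute)\s*$"
)
# Summary line printed at the end of a failed validation (assumed to be noise here).
SUMMARY = re.compile(r"^Document \S+ does not validate against \S+$")

def unexpected_lines(output: str) -> list[str]:
    """Return validator lines that are not part of a complete expected pair."""
    lines = [ln.rstrip() for ln in output.splitlines() if ln.strip()]

    # Collect which of the two attributes were reported for each file:line.
    seen: dict[tuple[str, str], set[str]] = {}
    for ln in lines:
        m = EXPECTED.match(ln)
        if m:
            seen.setdefault((m["file"], m["line"]), set()).add(m["attr"])
    paired = {key for key, attrs in seen.items()
              if attrs == {"lineEndContext", "attribute"}}

    remaining = []
    for ln in lines:
        m = EXPECTED.match(ln)
        if m and (m["file"], m["line"]) in paired:
            continue  # one half of a matched pair on an included context
        if SUMMARY.match(ln):
            continue  # overall verdict; judged by what remains instead
        remaining.append(ln)
    return remaining
```

An empty result would mean the only complaints were the paired ones; anything left over (including half a pair) is worth looking at by hand.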

BTW, can we PLEASE have the git repo url: https://projects.kde.org/projects/kde/applications/kate/repository/revisions/master/show/kate
in the #kate /topic? I've mentioned it *so* many times in the channel over the last 2 years, and no-one ever says anything; the access list is empty, and just has freenode staff in it, so I have no idea who I should be asking. I've seen it work really well with other projects, and it means people who are interested can dive straight in and find something to patch.

I'd also like to think we (well, you guys) are proud of kate, and *want* to show the code off, as well as the end-result, which people tend to take for granted, since they don't see the work that goes into it, but just complain about how it doesn't read their minds;) I've lost track of the number of people who ask for things and don't try out the js or C++ that I indicate to them. Let's get the enthusiasts when they first arrive, and get them straight into code, rather than tedious discussion, essentially.

One last thing: I'd like some feedback on the points I raised about the conformant vs coloured highlighter, specifically:

The user can still change those styles as they see fit: it just doesn't occur to you unless they are differently highlighted in some fashion or another. Obviously someone with more experience in kate, will know to see what settings are available: but then they already know how and where to change the colouring. My intent was to grab the attention of the new or inexperienced user, so that they see the merit in using kate. Additionally an experienced user who sees kate still highlighting awk the same way (since all the styles use the same default as before with no variation), will probably just assume it's still got the same setup, and that nothing's changed.

I know I do as a rule: I don't have time to keep checking what's available, every time something changes, and so it just doesn't occur to me when I'm working, which is what I use kate for.
Granted I can try to change that habit, but I really don't think most users will: I still haven't and I've been working with the xml for 5 years, on and off.

So *not* colouring slightly differently is to my mind a bad idea: if the user hasn't set colours, everything works fine. If they have, they only see a different colour than the ones they have set up when they download a highlighter with a *new* style.

Certainly none of my pre-configured colours has ever changed, and I'd file a bug immediately if they did. If I'm wrong on the latter, due to downloaded files changing my setup, and my use of colours close to the defaults which I like (assuming there is a style that has a similar meaning), then fine: I'll file that bug.

I don't think there's a case for more default styles; the most I'd do is change what Document.isCode() says is code, which til now I've been doing with customised .js; I believe I gave you that code already.

Thanks for your time,
igli.
Comment 8 Dominik Haumann 2013-10-17 16:55:38 UTC
Git commit 4dfbfec3b10cf664cf442f5df23c8059fc5d9bca by Dominik Haumann.
Committed on 17/10/2013 at 16:55.
Pushed by dhaumann into branch 'KDE/4.11'.

XML tools plugin: update language.dtd.xml

This updates Kate's MetaDTD used by the XML completion plugin.
This file was outdated by 11 years (updated in 2002). That means
that people using this plugin worked with completely outdated
highlighting rules... This is totally awful..

Further infos:
http://new.linuxfocus.org/English/May2002/article201.shtml#201lfindex2

CCMAIL: kwrite-devel@kde.org

M  +1174 -495  addons/kate/xmltools/language.dtd.xml

http://commits.kde.org/kate/4dfbfec3b10cf664cf442f5df23c8059fc5d9bca
Comment 9 Dominik Haumann 2013-10-17 16:57:14 UTC
Git commit 5ce2aa29f17a0b6be35ebeb31ea82e475cc8ddc3 by Dominik Haumann.
Committed on 17/10/2013 at 16:55.
Pushed by dhaumann into branch 'master'.

XML tools plugin: update language.dtd.xml

This updates Kate's MetaDTD used by the XML completion plugin.
This file was outdated by 11 years (updated in 2002). That means
that people using this plugin worked with completely outdated
highlighting rules... This is totally awful..

Further infos:
http://new.linuxfocus.org/English/May2002/article201.shtml#201lfindex2

CCMAIL: kwrite-devel@kde.org

M  +1174 -495  addons/kate/xmltools/language.dtd.xml

http://commits.kde.org/kate/5ce2aa29f17a0b6be35ebeb31ea82e475cc8ddc3
Comment 10 Christoph Feck 2013-11-01 09:30:05 UTC
Should this be marked as fixed in 4.11.3?
Comment 11 Christoph Cullmann 2014-09-08 14:49:38 UTC
Yeah, fixed.