Bug 147413 - New syntax highlighting file for language D
Summary: New syntax highlighting file for language D
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: syntax
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-01 11:02 UTC by Jari-Matti Mäkelä
Modified: 2010-09-10 20:36 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
New d.xml file (17.74 KB, text/plain)
2007-07-01 11:03 UTC, Jari-Matti Mäkelä
Minor update (18.62 KB, text/plain)
2007-07-01 14:38 UTC, Jari-Matti Mäkelä
New version. DTD valid. (20.00 KB, text/html)
2007-07-21 16:05 UTC, Aziz K.
Another new version with fixes and improvements. (21.04 KB, text/html)
2007-07-24 12:18 UTC, Aziz K.
A few updates. (22.09 KB, text/plain)
2007-09-10 16:10 UTC, Diggory Hardy
An old test suite (10.43 KB, text/plain)
2007-09-11 22:10 UTC, Jari-Matti Mäkelä
New version (23.43 KB, text/xml)
2007-09-17 15:43 UTC, Aziz K.
new version, with ddoc highlighting (7.79 KB, application/x-tgz)
2007-10-09 13:09 UTC, Diggory Hardy
d.xml, ddoc.xml, sample highlighting file (.tgz archive) (7.53 KB, application/x-tgz)
2007-10-10 12:02 UTC, Diggory Hardy
Updated ddoc.xml to 1.1.1 (7.90 KB, text/plain)
2007-10-11 17:43 UTC, Diggory Hardy
ddoc.xml 1.1.2 - bugfix (7.98 KB, text/plain)
2007-11-05 16:00 UTC, Diggory Hardy
d.xml 1.47 & ddoc.xml 1.13 (7.28 KB, application/x-tgz)
2007-11-15 19:04 UTC, Diggory Hardy
d.xml 1.48 BETA & ddoc.xml 1.13 (9.04 KB, application/x-tgz)
2007-12-13 19:24 UTC, Diggory Hardy
d.xml 1.49 & ddoc.xml 1.13 (9.77 KB, application/x-tgz)
2008-01-06 16:38 UTC, Diggory Hardy
d.xml 1.51 & ddoc.xml 1.14 (10.36 KB, application/x-tgz)
2008-02-22 13:41 UTC, Diggory Hardy
Specify that variables can start with an underscore (32.72 KB, text/xml)
2008-09-17 08:53 UTC, Erlend Hamberg
d.xml 1.61 (ddoc highlighting rolled in) (40.58 KB, application/xml)
2008-09-22 12:28 UTC, Diggory Hardy

Description Jari-Matti Mäkelä 2007-07-01 11:02:33 UTC
Version:            (using KDE 3.5.7)

Some fixes and updates to the syntax highlighting. I'll attach an up-to-date version of the file here.
Comment 1 Jari-Matti Mäkelä 2007-07-01 11:03:38 UTC
Created attachment 20999 [details]
New d.xml file
Comment 2 Jari-Matti Mäkelä 2007-07-01 14:38:30 UTC
Created attachment 21002 [details]
Minor update

Please use this instead. Added some minor fixes.
Comment 3 Aziz K. 2007-07-21 16:05:38 UTC
Created attachment 21216 [details]
New version. DTD valid.

This is an updated version of the previous d.xml file.
Comment 4 Dominik Haumann 2007-07-21 19:34:50 UTC
Binary/Octal/Hex/Integer. There are dedicated rules to do that instead of regular expressions. Is that really necessary? Why not do it like in cpp.xml for the numbers? :)
Comment 5 Aziz K. 2007-07-21 20:24:09 UTC
They are necessary because in D number literals can have embedded underscores. If this wasn't the case then I would have used the predefined "HlCxxx" rules for matching numbers.
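
To illustrate the point about embedded underscores, here is a minimal Python sketch. Both patterns are hypothetical stand-ins written for this comparison (they are not copied from d.xml or from Kate's predefined rules): one models a C-style hex rule with no underscore support, the other a D-style variant that tolerates underscores.

```python
import re

# Assumption: a C-style hex rule accepts only hex digits after the prefix.
C_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
# Hypothetical D-style variant that also accepts embedded underscores.
D_HEX = re.compile(r"0[xX][0-9a-fA-F_]+")


def matched_prefix(pattern, text):
    """Return the part of `text` the pattern consumes from the start."""
    m = pattern.match(text)
    return m.group(0) if m else ""


c_result = matched_prefix(C_HEX, "0xFF_FF")  # stops before the underscore
d_result = matched_prefix(D_HEX, "0xFF_FF")  # consumes the whole literal
```

With the C-style pattern the literal would be split in two, which is why a custom regular expression is needed here.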
Comment 6 Aziz K. 2007-07-21 20:30:54 UTC
As for the multiline comments, you can delete <comment name="multiLine"  start="/*" end="*/" region="CommentBlock"/>. Currently kate uses /+ +/ to comment out multiline source code, and it's better if it stays that way, because those comments can be nested (in contrast to /* */).
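
The nesting difference can be sketched in a few lines of Python. The function name `nested_comment_end` is made up for this illustration; it only counts `/+` and `+/` pairs and ignores string literals and other comment styles.

```python
def nested_comment_end(text, start=0):
    """Return the index just past the '+/' that closes the '/+' at `start`.

    Returns -1 if `text` does not start a nesting comment at `start` or the
    comment is never closed.
    """
    if not text.startswith("/+", start):
        return -1
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "/+":
            depth += 1
            i += 2
        elif pair == "+/":
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


nesting = "/+ a /+ b +/ c +/ x"
block = "/* a /* b */ c */ x"
nesting_end = nested_comment_end(nesting)
block_end = block.find("*/") + 2  # a /* */ comment ends at the first */
```

The `/+ +/` form swallows the inner comment, while the `/* */` form ends early and leaves `c */ x` as live code. That is the reason `/+ +/` is the safer choice for commenting out a selection.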
Comment 7 Aziz K. 2007-07-21 20:33:34 UTC
This line can be deleted (in context ModuleName):
<!--         <DetectChar   attribute="Normal Text" context="#pop"          char=";"/> -->
Comment 8 Aziz K. 2007-07-24 12:18:53 UTC
Created attachment 21235 [details]
Another new version with fixes and improvements.
Comment 9 Diggory Hardy 2007-09-10 16:10:10 UTC
Created attachment 21588 [details]
A few updates.

I made some changes (to version 1.44).
I noticed 0_ was accepted by the octal regexp, weird since the octal and binary
versions checked against this type of thing.
I wrote some new floating-point regexps which are horribly complex but should
do the trick pretty accurately. I went by what gdc 0.24 supports since the spec
isn't quite accurate enough for this; hope it's not too different from dmd. So
0xp0, 0x.p0 are matched, 0xp_0 isn't. Also, as was previously the case, 0f is
matched, although it's not valid according to the spec. Contact me if you want
me to re-write any of this.
Comment 10 Diggory Hardy 2007-09-11 12:20:38 UTC
Forgot to change version in <language ...> tag.
Comment 11 Aziz K. 2007-09-11 21:42:01 UTC
Hello Diggory,

I've looked at your changes and I propose the following for matching floating point numbers:

// The One Regexp that will match all float numbers. Took me quite a while to figure out :)
(\d[\d_]*\.(?!\.)[\d_]*|\.\d[\d_]*)([eE][-+]?\d[\d_]*)?[fFL]?i?|\d[\d_]*([eE][-+]?\d[\d_]*[fFL]?i?|([fF]i?|[fFL]?i))|0[xX][\da-fA-F_]*\.?[\da-fA-F_]*[pP][-+]?\d[\d_]*[fFL]?i?

// Commented version
(
  \d[\d_]*\.(?!\.)[\d_]*|    # Float with dot after integer
  \.\d[\d_]*                 # Float starting with dot
)
([eE][-+]?\d[\d_]*)?[fFL]?i? # Optional exponent and suffix
|
\d[\d_]*
(
  [eE][-+]?\d[\d_]*[fFL]?i?| # Float with non-optional exponent
  ([fF]i?|[fFL]?i)           # Integer float suffixes
)
|
0[xX][\da-fA-F_]*\.?[\da-fA-F_]*[pP][-+]?\d[\d_]*[fFL]?i? # Hex float with decimal exponent
// End of commented version

I haven't tested if it really works. It would be really great if we had some sort of a "unit test" which could be used to see if the changes made to the regexps have caused any regressions or new errors.

Would you maybe be willing to write such test cases and include them in the xml file?

Here is something for the start:

0.0 // float
0..0 // int slice int
0...0 // int ellipses int
0....0 // int ellipses float (.0)
0.....0 // int ellipses slice int
0......0 // int ellipses ellipses int
0.......0 // int ellipses ellipses float (.0)
0ULi // unsigned long and identifier "i"
0_L // long octal
0_UL // unsigned long octal
0x0U // unsigned hex
0xp0 0x.p0 0x0p2 // Hex floats
0_e2 // Float
0u 0U 0uL 0UL 0L 0LU 0Lu // Integers
0_F 0_i 0_Fi 0Li 0f 0F 0fi 0Fi 0i // Int floats
0b_1_LU // unsigned long binary
0b1000u // unsigned binary
0x232Lu // unsigned long hex

Please try to think of more unusual and normal cases and don't forget to document them.
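
One possible shape for such a "unit test", sketched in Python rather than inside the xml file: the pattern is the one-line regexp from this comment, split over three string pieces for readability, and the sample lists are taken from the cases above. Python's `re` engine is used as a stand-in for Kate's regexp engine, which is an assumption; the two may differ in corner cases.

```python
import re

# The float regexp from this comment, split at its top-level alternations.
FLOAT_RE = re.compile(
    r"(\d[\d_]*\.(?!\.)[\d_]*|\.\d[\d_]*)([eE][-+]?\d[\d_]*)?[fFL]?i?"
    r"|\d[\d_]*([eE][-+]?\d[\d_]*[fFL]?i?|([fF]i?|[fFL]?i))"
    r"|0[xX][\da-fA-F_]*\.?[\da-fA-F_]*[pP][-+]?\d[\d_]*[fFL]?i?"
)

# Literals the comment labels as floats.
FLOATS = ["0.0", "0_e2", "0xp0", "0x.p0", "0x0p2",
          "0_F", "0_i", "0_Fi", "0Li", "0f", "0F", "0fi", "0Fi", "0i"]

# Literals the comment labels as integers; none should match as a whole.
NOT_FLOATS = ["0u", "0U", "0uL", "0UL", "0L", "0LU", "0Lu",
              "0_L", "0_UL", "0x0U", "0b_1_LU", "0b1000u", "0x232Lu"]


def failures():
    """Return every sample whose result disagrees with its expected label."""
    bad = [s for s in FLOATS if FLOAT_RE.fullmatch(s) is None]
    bad += [s for s in NOT_FLOATS if FLOAT_RE.fullmatch(s) is not None]
    return bad


# "0..0" is int, slice, int: the float rule must not match at the start.
slice_is_not_float = FLOAT_RE.match("0..0") is None
```

An empty list from `failures()` means the regexp agrees with every label; new cases can be appended to either list as they come up.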

You've changed the octal regexp. Please let it match 0_ as it is a valid octal number. Here is an improved, simplified version: "0[0-7_]+(L[uU]?|[uU]L?)?"
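
A quick check of the simplified octal regexp, again using Python's `re` module as a stand-in for Kate's engine (an assumption). The helper name `is_octal` is made up for this sketch.

```python
import re

# The simplified octal regexp proposed in this comment.
OCTAL_RE = re.compile(r"0[0-7_]+(L[uU]?|[uU]L?)?")


def is_octal(text):
    """True if the whole of `text` is matched by the octal rule."""
    return OCTAL_RE.fullmatch(text) is not None


accepted = [s for s in ["0_", "0_L", "0_UL", "017", "017u", "0_Lu"] if is_octal(s)]
rejected = [s for s in ["0", "08", "0LU", "17"] if not is_octal(s)]
```

Note that a bare `0` is not matched, because the pattern requires at least one octal digit or underscore after the leading zero; presumably another rule handles it.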

Regards,
Aziz
Comment 12 Jari-Matti Mäkelä 2007-09-11 22:10:46 UTC
Created attachment 21609 [details]
An old test suite

I forgot to mention I got this recently. It could be used as a basis for more
work.
Comment 13 Dominik Haumann 2007-09-12 17:06:17 UTC
SVN commit 711675 by dhaumann:

commit update from Comment #9.
If you have newer stuff, add it to this report and reopen.
BUG: 147413


 M  +216 -98   d.xml  


WebSVN link: http://websvn.kde.org/?view=rev&revision=711675
Comment 14 Aziz K. 2007-09-17 15:43:53 UTC
Created attachment 21638 [details]
New version

This is a new version which solves the issues with number literal matching. It
also fixes some other problems.
Comment 15 Diggory Hardy 2007-09-18 13:08:43 UTC
Well, I was really experimenting with my changes and seeing what people thought.

I just had a brief look at the regexp differences; but I only looked at the hex regexp thoroughly: use the following; it's a merge of yours and mine (if there isn't a . then all [\da-fA-F_] characters have been matched, so don't waste time trying to match more).
0[xX][\da-fA-F_]*(\.[\da-fA-F_]*)?[pP][-+]?\d[\d_]*[fFL]?i?
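
As a sanity check, the merged regexp can be compared against the earlier hex-float branch on a handful of samples. This is a sketch using Python's `re` module in place of Kate's engine (an assumption); the sample list and the name `verdicts` are illustrative.

```python
import re

# Hex-float branch from comment 11.
EARLIER_HEX = re.compile(
    r"0[xX][\da-fA-F_]*\.?[\da-fA-F_]*[pP][-+]?\d[\d_]*[fFL]?i?")
# Merged version from this comment.
MERGED_HEX = re.compile(
    r"0[xX][\da-fA-F_]*(\.[\da-fA-F_]*)?[pP][-+]?\d[\d_]*[fFL]?i?")

SAMPLES = ["0xp0", "0x.p0", "0x0p2", "0x_._p0", "0x1.8p-3L",
           "0xp_0", "0x0U", "0x1.8"]


def verdicts(pattern):
    """Return, per sample, whether the pattern matches the whole string."""
    return [pattern.fullmatch(s) is not None for s in SAMPLES]


merged = verdicts(MERGED_HEX)
earlier = verdicts(EARLIER_HEX)
```

Both patterns give the same verdicts on these samples; the merged form just avoids retrying the second digit run when there is no dot.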

As for the non-hex floats, I spent quite a while designing my regexps and they should be both highly-accurate and fairly optimised. (They could have been shorter by using look-aheads instead of branching with | symbols, but that would have meant more checking than necessary.) So I'd recommend you use those (from 1.44) if you'll trust me there (or them together if you like; shouldn't really make a difference although I don't know why you had program lock-ups).

Didn't realise 0_ would be octal. I guess the spec allows it to be, but it doesn't really make a difference.

That test file you uploaded has a few oddities:
  character literals containing strings (meant to catch these out?)
  using l as a suffix for numbers (according to the spec, this specific suffix should only be capital; gdc gives a warning about it too)
  a float starting ._6 - the compiler cannot accept this as a float because it is recognised as an identifier (_6...) at module scope first. Note that this is the only reason numbers can't start with an _ AFAIK, so numbers with a prefix can start with an _ following the prefix
  array ranges where both values have the same base - surely the compiler doesn't care whether or not each value has the same base anyway?
And it misses:
  hex numbers/floats can legally start 0x_ or 0x._, I think even 0x_._p_ is a valid float.

Hope you don't mind my rambling on...
Comment 16 Jari-Matti Mäkelä 2007-09-18 14:19:09 UTC
> That test file you uploaded has a few oddities: 

The file is a bit old and probably buggy. I just thought it might have some reusable parts for someone who wants to improve it.

> That test file you uploaded has a few oddities: 
>   character literals containing strings (meant to catch these out?) 

It should probably use the error style for these.

> using l as a suffix for numbers
> a float starting ._6

These seem to be bugs (l is a deprecated feature according to DMD - I don't remember when it was deprecated though, but it makes sense).

> hex numbers/floats can legally start 0x_ or 0x._,

Yes.

> I think even 0x_._p_ is a valid float. 

Yes, according to the spec. DMD however doesn't accept _ in the exponent part of any numbers.

> Hope you don't mind my rambling on... 

Not at all. I at least think decisions should be based on technical merits. What makes this a bit difficult is that the official spec has some oddities and bugs, and behaves differently than the compiler implementations in some cases. I thank you for your work so far and reopen this now.
Comment 17 Diggory Hardy 2007-09-21 21:02:05 UTC
Suggestion:
Within comments, highlight keywords such as FIXME, NOTE, FUTURE (those are suggestions), as the C++ highlighting does (using the Alert style).
Also, possibly try matching comment lines used to separate regions of code. Trouble is, different methods are used, maybe such as these:
// --- section ---
// BEGIN section
// END section
// FUNCTIONS
// DATA
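
The keyword part of this suggestion could look roughly like the following Python sketch. The keyword list is just the three suggestions above, and the helper `alerts_in_comment` is a made-up name; it does not handle `//` inside string literals.

```python
import re

# Whole-word match on the suggested alert keywords.
ALERT_RE = re.compile(r"\b(FIXME|NOTE|FUTURE)\b")


def alerts_in_comment(line):
    """Return alert keywords found after the first '//' on the line."""
    marker = line.find("//")
    if marker == -1:
        return []
    return ALERT_RE.findall(line[marker + 2:])


found = alerts_in_comment("int x; // FIXME: overflow, NOTE this")
```

The word boundaries keep identifiers such as `NOTES` from being flagged, and text outside a comment is ignored.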
Comment 18 Diggory Hardy 2007-10-09 13:09:28 UTC
Created attachment 21780 [details]
new version, with ddoc highlighting

OK, I've made a few small changes to d.xml and created a ddoc.xml syntax
highlighting file.

About the regexps: I just applied your improvements to my regexps basically.
I'm not going to create a full test file, but I did a fair bit of real-time
testing when I designed the regexps (using kregexp and testing in kate).

About the Ddoc highlighting: let me know what you think please, and read my
comments at the top of the file. Also, do the comment parts of the Ddoc comments
(the opening /++, /// or /**, closing */ or +/, leading + or * and --- code
line separators) come out standard-comment gray or doxygen-comment blue?
They're meant to be gray but I don't know if they initially weren't on my
system because of kate/katepart colour memory or some other reason.

About the ddoc embedded code: I could probably just use the standard D
formatting with a few extra rules to do this. Actually I did; it was much
easier than I'd expected. Version 1.1 includes this, 1.0 doesn't (that's the
only difference). I guess we might as well just use v1.1, unless I've missed
some problems or you think it could be too confusing. Trouble is, slightly
changing it (such as italicising it all or changing the background colour)
would mean re-implementing everything, unless katepart has some extra
highlighting functionality I've overlooked.

Well?
Comment 19 Diggory Hardy 2007-10-09 13:10:23 UTC
Uh, file uploading here's a bit bad for non-text files. That was a .tar.gz archive btw.
Comment 20 Diggory Hardy 2007-10-09 16:27:00 UTC
Bugfix: anywhere within the primary context "normal" (in d.xml) which has context="#pop" must be changed to context="#stay" - this didn't matter before when normal was only used as the root context, but it does now for Ddoc code highlighting.
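
Why this matters can be shown with a toy model of a context stack in Python. This is a simplified sketch, not Kate's actual implementation: it assumes the root context is never popped, and the context name `ddoc_comment` is hypothetical.

```python
def apply_switch(stack, switch):
    """Apply a simplified context switch and return the new stack."""
    stack = list(stack)
    if switch == "#stay":
        return stack
    if switch == "#pop":
        if len(stack) > 1:  # assumption: the root context is never popped
            stack.pop()
        return stack
    stack.append(switch)  # anything else names a context to enter
    return stack


# "normal" as the root context: #pop and #stay behave the same.
root_pop = apply_switch(["normal"], "#pop")
root_stay = apply_switch(["normal"], "#stay")

# "normal" entered from a Ddoc comment: #pop leaves the embedded code early.
embedded_pop = apply_switch(["ddoc_comment", "normal"], "#pop")
embedded_stay = apply_switch(["ddoc_comment", "normal"], "#stay")
```

In this model the two switches only diverge once "normal" sits on top of another context, which matches the behaviour described above.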

Simple change so I won't bother uploading a new version.
Comment 21 Dominik Haumann 2007-10-10 08:48:07 UTC
Please upload a new version. Usually developers don't go through comments to fix stuff in attached files. Thanks.
Comment 22 Diggory Hardy 2007-10-10 12:02:11 UTC
Created attachment 21787 [details]
d.xml, ddoc.xml, sample highlighting file (.tgz archive)

This simply includes the above fix, and is a re-upload of ddoc.xml highlighting
file and a sample d code file. If you want the ddoc v1.0 file without embedded
code highlighting get it from my previous attachment.
Comment 23 Dominik Haumann 2007-10-10 19:53:19 UTC
There are d.xml and ddoc.xml. Are both files in a state so that they can be committed? :)
Comment 24 Diggory Hardy 2007-10-11 11:40:36 UTC
Yes, I think so. They haven't had a massive amount of testing, but otherwise yes.
Comment 25 Diggory Hardy 2007-10-11 17:43:10 UTC
Created attachment 21799 [details]
Updated ddoc.xml to 1.1.1

Just a couple of small changes. Mostly just to catch embedded code sections in
ddoc comments which aren't ended at the end of the comment (although this is
probably bad syntax/semantics anyway).

I've given this a little testing and it looks OK, so I reckon it's OK to
commit. There's not a big difference between this and version 1.1.

On a different note, the highlighting of "deprecated" could do with being
improved. What do people reckon?
- Only apply strikeout effect to that word, so everything else is highlighted
properly (after all it's still valid code).
- No longer drop the effect on '(' bracket (this doesn't look good) but
otherwise leave as-is.
- Just apply to the line. Probably much less accurate than above.
I suggest only highlighting the keyword, but has anyone got any reasons against
this?
Comment 26 Diggory Hardy 2007-11-05 16:00:50 UTC
Created attachment 22004 [details]
ddoc.xml 1.1.2 - bugfix

Small bugfix to make sure things like /***/ are matched correctly. Tiny change
so should be good to go.
Comment 27 Dominik Haumann 2007-11-11 23:07:54 UTC
Can you have a look at bug #151989 ? Thanks :)
Comment 28 Diggory Hardy 2007-11-14 17:53:38 UTC
OK I replied to that bug. Might help if Dominik could upload the latest d.xml and ddoc.xml files so that people can use them; they seem stable to me. And if you want me to reply to d/ddoc highlighting bugs that's OK but it would be useful to get an email about new bugs.
Comment 29 Dominik Haumann 2007-11-15 11:41:41 UTC
Both files' syntax is not correct, they do not validate:

$ ./checkdtd d.xml
d.xml:337: element context: validity error : Element context does not carry attribute attribute
d.xml:356: element context: validity error : Element context does not carry attribute attribute
d.xml:451: element context: validity error : Element context does not carry attribute attribute
Document d.xml does not validate against language.dtd

$ ./checkdtd ddoc.xml
ddoc.xml:48: element language: validity error : Element language does not carry attribute extensions
ddoc.xml:137: element itemData: validity error : Value "dsOther" for attribute defStyleNum of itemData is not among the enumerated set
Document ddoc.xml does not validate against language.dtd

Also, please use version="x.xx" instead of "x.x.x" in the <language...> tag.
Comment 30 Diggory Hardy 2007-11-15 19:04:21 UTC
Created attachment 22082 [details]
d.xml 1.47 & ddoc.xml 1.13

OK, fixed according to your suggestions.

Found the online kate manual much more up-to-date and useful than what I've got
installed. Also found checkdtd eventually but still not xmllint so no good
there.
Comment 31 Diggory Hardy 2007-12-13 19:24:53 UTC
Created attachment 22532 [details]
d.xml 1.48 BETA & ddoc.xml 1.13

This includes quite a few updates. Checkdtd validated, but there's a lot of
changes and new rules so it's beta status for now. (d.xml 1.47 seems stable
though.)

Things are rather more colourful with this update, but to get all the colour
changes it may be necessary to delete
~/.kde/share/config/katesyntaxhighlightingrc (will also clear all custom
colours).
Comment 32 Diggory Hardy 2008-01-06 16:38:09 UTC
Created attachment 22869 [details]
d.xml 1.49 & ddoc.xml 1.13

Bugfix for d.xml 1.49. Been using for a little while, seems to be stable.
Checkdtd verified.
Comment 33 Diggory Hardy 2008-02-22 13:41:55 UTC
Created attachment 23668 [details]
d.xml 1.51 & ddoc.xml 1.14

Small updates.

checkdtd validated.

Any comments (since no-one else has posted for a while)?
Comment 34 Diggory Hardy 2008-04-09 11:59:55 UTC
I give up posting here, all updates are going here: http://dsource.org/forums/viewtopic.php?p=19226
Comment 35 Anders Lund 2008-04-09 20:19:15 UTC
Is there a reason this file isn't developed/maintained within kde svn?
Comment 36 Diggory Hardy 2008-05-22 13:21:25 UTC
Yes. I don't have an svn account (I did send the admin an email without response).

The files haven't been changed in a long time now, so please just upload the latest version (from the dsource link) and maybe we can be done with it.
Comment 37 Christoph Cullmann 2008-08-13 11:22:02 UTC
Latest ones are in /trunk now.
Comment 38 Aziz K. 2008-09-16 02:20:14 UTC
There is a bug in matching identifiers, e.g.: _id123
The digits are not matched as part of the id because it starts with an underscore.
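
A minimal Python illustration of this kind of bug. Both patterns are hypothetical and ASCII-only; they are not the actual rules from d.xml.

```python
import re

# Hypothetical rule that forgets the underscore as a valid first character.
LETTER_FIRST = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Hypothetical corrected rule that allows a leading underscore.
UNDERSCORE_OK = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def covers(pattern, text):
    """True if the pattern matches the whole of `text` as one identifier."""
    return pattern.fullmatch(text) is not None


broken = covers(LETTER_FIRST, "_id123")
fixed = covers(UNDERSCORE_OK, "_id123")
```

With the first pattern the identifier is not recognised as a unit, so its digits are free to be picked up by a number rule; the second pattern keeps `_id123` together.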
Comment 39 Erlend Hamberg 2008-09-17 08:53:03 UTC
Created attachment 27445 [details]
Specify that variables can start with an underscore

Aziz: does the attached patch work?
[Disclaimer: I don't know anything about D]
Comment 40 Diggory Hardy 2008-09-22 12:28:51 UTC
Created attachment 27507 [details]
d.xml 1.61 (ddoc highlighting rolled in)

Rolled a fix into latest version. From 1.60, d.xml contains the ddoc highlighting too so ddoc.xml is no longer needed. Any chance someone can commit it?
Comment 41 Erlend Hamberg 2008-09-23 12:18:10 UTC
Committed. (r863834)
Comment 42 Dominik Haumann 2010-09-10 20:36:39 UTC
Can you all please have a look at bug #250652, there is a new D syntax file.