Bug 62702

Summary:	asterisks in matlab files are not always highlighted properly
Product:	[Applications] kate	Reporter:	Joseph Nievelt <jjnievel>
Component:	syntax	Assignee:	KWrite Developers <kwrite-bugs-null>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	NOR
Version:	unspecified
Target Milestone:	---
Platform:	unspecified
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	new matlab.xml file

Description Joseph Nievelt 2003-08-15 04:08:50 UTC

Version:           3.0.0a5 (using KDE KDE 3.1.3)
Compiler:          gcc 3.2.3 
OS:          Linux

Matlab (*.m files) uses the asterisk (') both as the string delimiter and as the matrix/vector complex conjugate transposition operator, but gideon only treats it as a string delimiter.

You may want to check matlab docs on this, but I think that ' only begins a string literal when there is not an identifier/expression before it.  Also note that .' (period asterisk) is non-conjugate transposition, so that needs to be handled the same way.
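
The rule described above can be roughed out in a few lines. This is only a Python sketch of the heuristic, not how katepart works: the helper name `classify_apostrophes` and the `OPERAND_END` set are made up for illustration, and it assumes that an apostrophe directly after an identifier character, a closing bracket, a period, or another apostrophe is a transpose. It ignores comments and the case where whitespace separates an operand from its transpose.

```python
# Hypothetical helper (not part of Kate or katepart): classify each
# apostrophe on a line by looking at the character just before it.
OPERAND_END = set(")]}.'")  # closers, the dot of .', or an earlier transpose


def classify_apostrophes(line):
    """Return a list of (index, kind); kind is 'transpose' or 'string'."""
    found = []
    i = 0
    while i < len(line):
        if line[i] != "'":
            i += 1
            continue
        prev = line[i - 1] if i > 0 else ""
        if prev.isalnum() or prev == "_" or prev in OPERAND_END:
            found.append((i, "transpose"))
            i += 1
            continue
        found.append((i, "string"))
        i += 1
        # Skip the string body; a doubled apostrophe is an escaped quote.
        while i < len(line):
            if line[i] == "'":
                if line[i + 1:i + 2] == "'":
                    i += 2
                    continue
                break
            i += 1
        i += 1  # step past the closing apostrophe (or the end of the line)
    return found


for sample in ["[1 2; 3 4]'", "A.'", "foo('bar')'", "x = 'abc'"]:
    print(sample, classify_apostrophes(sample))
```

Because the decision depends on what came before the apostrophe, a single regular expression has a hard time expressing it, which is presumably why a state-based (context) approach fits better.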

Comment 1 Jens Dagerbo 2003-08-15 05:36:39 UTC

Syntax highlighting is handled by the loaded editor part. In this case you are 
almost certainly referring to katepart. Reassigning.

Comment 2 Jesse 2003-08-15 15:52:33 UTC

Ok. You most likely are using an old version of the highlighting file as I 
believe this has been taken care of in CVS.  Also the (') is an apostrophe and 
the (*) is an asterisk.  Which did you mean?

Could you possibly try the highlighting file found at 
http://webcvs.kde.org/cgi-bin/cvsweb.cgi/kdelibs/kate/data/matlab.xml?rev=1.10&content-type=text/x-cvsweb-markup

After you've tried this do you have an example of where the highlighting fails 
so I can easily fix it?

Comment 3 Joseph Nievelt 2003-08-16 18:52:04 UTC

Oh oops sorry about the asterisk/apostrophe :P.  In any case it looks like the
new matlab.xml still fails (at least) on a literal matrix:

[1 2; 3 4]'

and also it looks like the non-conjugate transpose .' is still broken.

Comment 4 Joseph Nievelt 2003-08-16 20:10:08 UTC

Changing the line

<RegExpr String="\b([\w]+[\d\w]*)?[\s]*([([][]-,:[+*/\d\s\w]*[])])?[']"
attribute="Transpose" context="#stay"/>

to

<RegExpr String="(\b([\w]+[\d\w]*)|[\s]*([([][]-,:[+*/\d\s\w]*[])]))[.]?[']"
attribute="Transpose" context="#stay"/>

Ie, 'A or B' instead of 'maybe A, then maybe B', seems to work.  However, the
numbers in the expression lost their coloring now.
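
To see why 'A or B' behaves differently from 'maybe A, then maybe B', here is a small Python sketch. The two sub-patterns `IDENT` and `BRACKETED` are simplified stand-ins chosen for illustration, not the expressions from matlab.xml, and Python's `re` is used instead of Kate's regex engine, so details may differ. The helper `matches_at` is hypothetical; it tries a pattern at one fixed position, roughly the way a highlighting rule is tried at the current position.

```python
import re

# Simplified stand-ins for the two terms (assumptions, not the Kate rule).
IDENT = r"\b\w+"           # A: an identifier
BRACKETED = r"\[[^\]]*\]"  # B: a bracketed literal such as [1 2; 3 4]

# 'maybe A, then maybe B': both terms are optional before the apostrophe.
maybe_a_then_maybe_b = re.compile(rf"(?:{IDENT})?\s*(?:{BRACKETED})?\.?'")
# 'A or B': exactly one operand is required before the apostrophe.
a_or_b = re.compile(rf"(?:{IDENT}|{BRACKETED})\.?'")


def matches_at(pattern, text, pos):
    """Return the text matched starting exactly at pos, or None."""
    m = pattern.match(text, pos)
    return m.group(0) if m else None


line = "disp('hi')"
quote = line.index("'")  # the apostrophe that opens the string

# With both terms optional, a bare string-opening apostrophe matches.
print(matches_at(maybe_a_then_maybe_b, line, quote))
# With a required operand, the same position no longer matches.
print(matches_at(a_or_b, line, quote))
# A real transpose still matches, including the non-conjugate form.
print(matches_at(a_or_b, "[1 2; 3 4].'", 0))
```

Note that in both forms the match consumes the operand along with the apostrophe, which would be consistent with the operand taking on the Transpose attribute and losing its own coloring.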

Comment 5 Jesse 2003-08-16 20:42:59 UTC

Can you try this beast: 
 
<RegExpr String="(([([][]:+*/[-,\d\s\w]*[])])+|[.]|\b([\w]+[\d\w]*)?[\s]*([([][]:+*/[-,\d\s\w]*[])])?)[']" attribute="Transpose" context="#stay"/>

Comment 6 Joseph Nievelt 2003-08-17 00:39:23 UTC

This gives similar behavior, but I have a few questions about it:

1. What is the \b for?  "man re_syntax" says it's a backspace, but I don't
understand what the function is.

2. Why the '+' on the first choice?  Don't we want to only match the term that
is actually transposed (ie the last one)?

3. Why is [.] a choice instead of optional at the end?  This will prevent the re
from matching the actual expression that is transposed.

4. Why [\s]* outside of the parentheses?  Actually, why is the [\s]*(...)? term
in the third choice at all?  Shouldn't that case be handled by the first choice?
 I think this is bad for the same reason as #1.

Unless the transposed term is highlighted or something (unlikely), these are just
a matter of style, but it seems simpler to do them than not.

I did notice that a lot of operators/characters that could legitimately appear
inside parentheses weren't handled.  Also, expressions may be transposed
multiple times.  Right now I have:

<RegExpr
String="(([\{([][]:;+*/[,\.{}()~&lt;&gt;^=&amp;\d\s\w-]*[])\}])|\b[\w]+[\d\w]*)([\.]?['])+"
attribute="Transpose" context="#stay"/>

But eep I just found something else.  A numerical literal can also be
transposed, even if it doesn't make much sense to do so.  I can't figure out how
to get the re to match.  I've tried adding "|[\d]+" but they keep coming out red.

By the way, it looks like everything in these expressions loses its original
coloring, ie:

foo('bar')'
[1 2]'

Comment 7 Joseph Nievelt 2003-08-17 00:40:35 UTC

Ugh, sorry forgot to update that re right before I sent that one :|.  I did put
an apostrophe in there:

<RegExpr
String="(([\{([][]:;+*/[,\'\.{}()~&lt;&gt;^=&amp;\d\s\w-]*[])\}])|\b[\w]+[\d\w]*)([\.]?['])+"
attribute="Transpose" context="#stay"/>

Comment 8 Jesse 2003-08-17 01:00:45 UTC

The \b stands for word boundary.  If you haven't discovered the awesome tool
KRegExpEditor in kdeutils, I strongly suggest you check it out, as it will give you a
visualization of the RegExpr and a GUI for editing it.  It takes a while to get used to,
but it also has a text area where the regexpr is applied so you can see what it
matches.
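
A quick way to see what \b does, sketched with Python's `re` module as a stand-in (Kate's own regex engine may differ in details): \b matches the empty position between a word character and a non-word character, so `\b\w+` only starts matching at the beginning of a word. The sample line is taken from MATLAB-style code; the pattern name `ident` is just for illustration.

```python
import re

# \b matches the empty position between a word and a non-word character.
ident = re.compile(r"\b\w+")

text = "K = Pminus * H'"
print(ident.findall(text))         # ['K', 'Pminus', 'H']

# In the middle of a word there is no boundary, so nothing matches there.
print(ident.match("Pminus", 2))    # None

# The "backspace" reading only applies inside a character class.
print(bool(re.match(r"[\b]", "\x08")))  # True
```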
 
The [.] is an alternative because there are now 3 (!!) separate rules I've combined into
one to try to take care of the mess.  Gonna hurt the person who decided it was a good
idea to make ' both string and transpose delimiter ... Also, as is sometimes the case
with languages, there's not much I can do about just plain wrong syntax.  It's
hard because I don't have matlab, so I don't know the 1001 different ways a matrix
can be defined.   For your foo('bar')' problem, is that valid matlab syntax?  I'm
curious to know the result :)
 
I'm gonna have to sit down with this problem for a while, methinks.  It handles 90%
of my test cases from matlab files on the web, but those last 10% make matlab
unusable because everything after ' appears as a string :(.  Gimme a while for this, but
thank you very, very much for the investigating you've done too.  That's a rare
occurrence :)

Comment 9 Joseph Nievelt 2003-08-17 03:24:51 UTC

Mathworks has a great reference site, with docs on pretty much everything matlab:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/ref.shtml

Most notably, check out the "Programming and Data Types->Operators and
Operations".  You can also check out the open source project octave:

http://www.octave.org/

Which does a pretty good job of mimicking matlab.  Of course it's not 100%
guaranteed, but the syntax/semantics should be the same.

In any case, the semantics of foo('bar')' is actually pretty simple.  In this
case, foo could be a function that takes a single string argument and returns a
matrix.  There are also a lot of operations that matlab does with entire
matrices at a time.  For example, A > B does element-wise comparison, and you
end up with a matrix of 1's and 0's.

I hope this helps, and I understand that there's not always something to be
done.  If this is as much as the current highlighting system allows, this might
as well be resolved.  Oh and thanks for the info on KRegExpEditor.

Comment 10 Anders Lund 2003-08-17 09:55:58 UTC

Subject: Re:  asterisks in matlab files are not always highlighted properly

On Sunday 17 August 2003 01:00, Jesse Yurkovich wrote:

Comment 11 Jesse 2003-08-17 17:16:50 UTC

Yes, that's my current approach right now with a separate context.  Hopefully by later
today I'll have something that's sane, although I still probably won't be able to get a
scenario like foo('bar')' + H' correct.

Comment 12 Joseph Nievelt 2003-08-18 19:12:28 UTC

Hey this is just a heads up.  I realize this probably won't reach perfection
with the current system, but I thought I'd let you know about some interesting
new string semantics I discovered recently:

Apparently a string 'abcd' can be interpreted as a row vector of numbers
(corresponding to the ASCII codes for each character).  Thus, the following are
equivalent:

'abc'
['a' 'b' 'c']

The following are also equivalent:

'abc'.'
['a';'b';'c']

ie a column of letters.  However, 'abc'' is invalid syntax because in true BASIC
style, two apostrophes are used inside a string literal to indicate a literal
apostrophe (like \' inside C strings).  On the other hand, ('abc')' is valid,
and behaves just like 'abc'.' (ASCII codes are real so conjugate transpose
reduces to the regular transpose).
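
These string rules can be mimicked in a short Python sketch. The function `read_string_literal` is a hypothetical helper written for this illustration (it is not MATLAB's or Kate's code); it assumes a literal must close on the same line and that a doubled apostrophe inside it stands for one literal apostrophe.

```python
def read_string_literal(text, start):
    """Scan a single-quoted literal beginning at text[start].

    Returns (value, end), where end is the index just past the closing
    apostrophe, or None if the literal is not terminated on this line.
    """
    if start >= len(text) or text[start] != "'":
        raise ValueError("no apostrophe at start")
    chars = []
    i = start + 1
    while i < len(text):
        if text[i] == "'":
            if text[i + 1:i + 2] == "'":  # doubled apostrophe: literal '
                chars.append("'")
                i += 2
                continue
            return "".join(chars), i + 1
        chars.append(text[i])
        i += 1
    return None


print(read_string_literal("'abc'.'", 0))   # string ends before the .'
print(read_string_literal("'abc''", 0))    # None: the '' is an escaped quote
print(read_string_literal("('abc')'", 1))  # string ends before the )'

# A string viewed as a row of character codes, and its transposed column.
row = [ord(c) for c in "abc"]
column = [[code] for code in row]
print(row, column)
```

In the second call the trailing pair of apostrophes is consumed as an escaped quote, so the literal never closes, which matches 'abc'' being invalid while ('abc')' is fine.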

I don't think anyone will cry if these aren't highlighted properly, but it's
maybe something to test just for the sake of testing, and something you'd want
to put on a 'known not to work' list if applicable.

Comment 13 Jesse 2003-09-04 22:48:02 UTC

Created attachment 2372 [details]
new matlab.xml file

This one is a bit better.  It's been tested on the following chunk of code. 
The transpose operations and strings are no longer confused. This seems a lot
better than before.

code test:
semilogx(logspace(-6,-1),br(1:5,:,1+3)');
foo('bar')' + H'
[1 2]'something

disp(['p = ' int2str(p) '; q = [' int2str(q(1)) ',' int2str(q(2)) ']; UC = ' num2str(UC)]);
K = Pminus * H' / (H * Pminus * H' + R);
K = Pminus * H' / (H * Pminus * H' + R);
b[1 2; 3 4]' - H' + brrrr[2]' + [3 4; 5 6] * [5; 7]' + foo('bar')'
b[1 2; 3 4]' - H' + brrrr[2]' + [3 4; 5 6]' * [5; 7] + foo('bar')'
b[1 2; 3 4]' - H' + brrrr[2]' + foo('foobar')' + H'
b[1 2; 3 4]' - H' + brrrr[2]' + foo('foo + bar - 2')' + H'

b .' 'someting else'

Comment 14 Jesse 2003-09-07 00:03:57 UTC

Closing.  Tested with a few other cases and things seem OK.  If any cases are left, we
might have to let them be highlighted slightly wrong.  The string rule is only per line
anyhow, so at most only the rest of a line would be wrong.

Comment 15 Jesse 2003-09-07 02:47:20 UTC

I've been had by broken konq ...