Bug 188115 - wrong python doc string highlighting
Summary: wrong python doc string highlighting
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: syntax (show other bugs)
Version: unspecified
Platform: unspecified Linux
: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-25 23:31 UTC by Federico Muciaccia
Modified: 2013-05-03 18:28 UTC (History)
8 users (show)

See Also:
Latest Commit:
Version Fixed In: 4.10.0
Sentry Crash Report:


Attachments
docstring highlighting fix (1.70 KB, patch)
2012-10-12 23:53 UTC, Marcel Martin
Details
patch for python.xml version 2.13 (1.95 KB, patch)
2012-12-08 23:36 UTC, Tomasz Narloch
Details
Comparison of the different patches/versions (67.47 KB, image/png)
2012-12-09 14:33 UTC, Philipp A.
Details
second patch for python.xml in version 2.13 (14.82 KB, patch)
2012-12-21 22:24 UTC, Tomasz Narloch
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Federico Muciaccia 2009-03-25 23:31:10 UTC
Version:           3.2.1 (using 4.2.1 (KDE 4.2.1), Arch Linux)
Compiler:          gcc
OS:                Linux (i686) release 2.6.28-ARCH

code example:
---------------------

class Example1():
  """doc string of the Example1 class""" # could also be multiline
  #comment1
  a = """other multiline string"""

class Example2():
  "doc string of the Example2 class"
  #comment2
  a = "other string"

class Example():
  'doc string of the Example3 class'
  #comment3
  a = 'another string'

---------------------

As you can notice, in the first example the doc string is well recognized as a comment and is highlighted of the same colour of the following comment;so the first and the second multiline string of the fist example are correctly highlighted in a different way.
But in the second and third example the highlighter seems not to recognize that the first string of the class is a python doc string and highlights it in the same way of the following string assignement

sorry for my bad english

Federico Muciaccia
Comment 1 Federico Muciaccia 2009-03-25 23:33:18 UTC
the third is 
class Example3():
  ...


little copy-and-paste mistake :-)
Comment 2 Christoph Cullmann 2010-02-20 16:35:37 UTC
Moving to wish list status. All highlightings works with heuristics and will
never be perfect. Perhaps somebody can provide a patch, therefor not closing
it.
Comment 3 Federico Muciaccia 2011-02-19 00:11:57 UTC
Can you point me to the right file?
I can try to solve it
Comment 4 Marcel Martin 2012-10-12 23:53:19 UTC
Created attachment 74512 [details]
docstring highlighting fix

The attached patch fixes the problem. What is the correct way to submit (further) patches? Reviewboard?
Comment 5 Dominik Haumann 2012-10-24 16:46:53 UTC
Git commit 369b7acc0c99afd1ab508dbd0aea348d7396e133 by Dominik Haumann.
Committed on 24/10/2012 at 18:48.
Pushed by dhaumann into branch 'master'.

attempt to fix pyhton doc scrings. please test.

M  +6    -3    part/syntax/data/python.xml

http://commits.kde.org/kate/369b7acc0c99afd1ab508dbd0aea348d7396e133
Comment 6 Dominik Haumann 2012-10-24 16:48:31 UTC
@Marcel: Yes, either reviewboard, the mailing list, or as attachement to bug reports. Reviewboard is best, though.
Comment 7 Philipp A. 2012-11-10 17:25:06 UTC
umm, i don’t know if this change has introduced it, but suddenly python treats far too many strings as docstrings, i.e. all strings that occur at the beginning of a line with nothing but whitespace to the left of them.

this kind of syntax is extremely common in literal lists or dicts:

mydict = {
    'a': 'b',
    'c': 'd',
}

both 'a' and 'c' are now wrongly identified as comments /docstrings.

----------

another problem: the regex “u?r?” isn’t enough for a string-start: they can also be switched.

i’d propose “(u?r|r?u)?”. this matches all valid string-modifiers: u, ur, r, ru.

----------

PS: should i open one or two new bug reports or, alternatively: how to do a pull request for KDE? i’m positive i can fix both issues and more (missing python3 buitins like bytes and ascii, new format string syntax highlighting)
Comment 8 Philipp A. 2012-11-24 17:48:52 UTC
https://git.reviewboard.kde.org/r/107450/

hi, i tried to fix it, but there is one last bug preventing it from detecting the initial docstring.

but the rest is far better, and we don’t get docstrings detected everywhere anymore.

citation me from reviewboard:

> docstrings are now only detected in a certain context,
> but as soon as i try to put that on top, everything breaks.
> 
> so we either need a special initial context that detects a shebang and docstring,
> or someone has to explain to me why putting the docstring-detecting one
> on top and hoping for it to fall through to “Normal” doesn’t work.
Comment 9 Philipp A. 2012-12-08 15:53:47 UTC
guys, could you please help me with the highlighting?

my version of the highlighting is clearly better than the one in the repos right now, as it only detects the initial doctstring wrongly instead of e.g. every key in a multiline dict. and i write more multiline dicts than initial docstrings, which means that the highlighting version in the repos right now is unacceptable.

so we should either accept my regression for the time being or come up with a solution fast.
Comment 10 Tomasz Narloch 2012-12-08 23:36:12 UTC
Created attachment 75734 [details]
patch for python.xml version 2.13

Fix a bug for single "/' comment.

a = {
  'this was hightlighted as comment': 'this was correct',
  'after patch this is correct': 'this is correct',
}

and for:
> - means folding mark

Before:
----------
  if True:
>  # some comment with wird folding mark that should be line above 
   pass
----------
After:
----------
> if True:
  # some comment without folding mark 
   pass
----------

Best Regards,
Comment 11 Philipp A. 2012-12-09 14:33:56 UTC
Created attachment 75750 [details]
Comparison of the different patches/versions

sorry, Thomasz, but your patch introduces other regressions, because it basically reverts 2.13 to 2.12.

here is a comparison where you can see why my version is the best out of all available ones: a mockup of a perfect highlighing is on the bottom right, but there’s no code achieving that one yet.

my version is the only one that has guaranteed maximally only one wrongly highlighted string, while all others have several (unless you don’t use dicts or docstrings :D)
Comment 12 Tomasz Narloch 2012-12-20 15:13:37 UTC
Hi flying-sheep@web.de, 
I've checked your comparision  and have some suggestion, opinions:
1) Please change that lines:

-			<emptyLine regexpr="(?:\s+|\s*#.*)"/>
+			<emptyLine regexpr="(?:\s+|\s*)"/>

Explanation:
Every # comment should be intended.
http://www.python.org/dev/peps/pep-0008/#block-comments

2) About single " or ' comments its rarely used, especially as a docstring.
I've thought that docstring is always in 3 quotes """docstring""".

I think that syntax should support more  standards and less rarely used syntax if that is a problem with ideal solution. 
Explanation: 
I do not use single ' or " comment but use # comment.

3) Your comparision and example of dictionary key:

a = {
    """key""": "value",
}

This is rarely used syntax: """key""". Multiline key is weird for me.

Best Regards
Comment 13 Philipp A. 2012-12-20 17:21:55 UTC
1) “no indentation” is also an indentation level. the cited doc just says that each comment should be indented the same as surrounding code; if the surrounding code isn’t indented, why indent the comment? example:

this("is undindented code")
#this is a comment about the following line of unindented code
this("is another line of unindented code")

why should the comment in above example be indented?

2) just because you don’t use "single quoted docstrings" it doesn’t mean that they are not out in the wild. and just because """triple quoted strings""" are mostly docstrings, that doesn’t mean all are. in fact, triple quoted strings are commonly used for multiline strings.

single-quoted docstrings should be highlighted as docstrings if python interprets them so. and multiline strings shouldn’t be highlighted as docstrings if they are simple strings. it’s simply the right thing to do.

3) it’s still correct. highlighting should help the developer, not confuse him. if the code is confusing, doubly so. so you should be glad that my code correctly identifies the above as normal string (which it is).

———————————————

please, thomasz: if you 1) have no solution to the sole problem with my highlighting code, and 2) don’t have a  more correct solution than my code, don’t argue. my code is still the most correct solutions, and makes at most 1 error per file, while all other solutions make several.
Comment 14 Tomasz Narloch 2012-12-21 22:24:27 UTC
Created attachment 75965 [details]
second patch for python.xml in version 2.13

Hi, I think I found an idea solution:)

1) this was misunderstanding. My fault.
I had in my mind: “intended” as  "indented the same as surrounding code."
2) You have right, but my last patch is  better solution:) Take a try
3) You have right:)

About patch:

1) There is no string like: 
    ru"""Not validated by python 2.7 at least""", 
only
    ur"""validated"""
    u"""validated"""
    r"""validated"""

2) weird by validated by python and my solution:

u"""Start...""" \
\
\
u"""...end"""

3)  <RegExpr attribute="Normal" String="[a-zA-Z_][a-zA-Z_0-9]{2,}" context="#stay"/>
I had to change that line for better highlight for
ur"""comment"""
I do not see any bad effect about that.

4)  I found only 1 bad hightlight for following lines:
# code
"sdfsdf".upper()
# code

that probably should be a string not a comment.

5) Can I ask you to check my solution.

Best Regards
Comment 15 Philipp A. 2012-12-23 23:37:17 UTC
i’ll check it as soon as possible, but i can’t right now. looks promising at first glance!
Comment 16 Philipp A. 2013-01-04 14:52:22 UTC
you’re right, your patch is awesome!

ship it, people.
Comment 17 Tomasz Narloch 2013-01-25 09:16:03 UTC
I haven't seen any negative opinions.
Could someone add that patch to repo. 
Maybe some sign "Ship it" on https://git.reviewboard.kde.org/r/107450/
Comment 18 Christoph Cullmann 2013-02-25 19:26:38 UTC
Git commit b7e146284f724a50591bba7d69192cc33704cdc7 by Christoph Cullmann, on behalf of Phil Schaf.
Committed on 25/02/2013 at 20:25.
Pushed by cullmann into branch 'master'.

fix problem with docstrings
review request "thomasz fixed all docstrings"
REVIEW: 107450

M  +95   -39   part/syntax/data/python.xml

http://commits.kde.org/kate/b7e146284f724a50591bba7d69192cc33704cdc7
Comment 19 Matthew Woehlke 2013-03-24 22:56:41 UTC
This appears to have introduced a regression. Python scripts like:

my_global = {
  'string',
  ...
}

...used to be highlighted as strings. They are now highlighted as comments. I am admittedly not a Python expert, but this doesn't seem correct...?
Comment 20 Philipp A. 2013-03-25 13:17:55 UTC
that was the state *after* the commit from comment #5 and *before* the commit from comment #19.

the latter specifically was done in order to fix said regression. as it wasn’t reverted afterwards[1], this is fixed.

[1]: https://projects.kde.org/projects/kde/applications/kate/repository/diff/part/syntax/data/python.xml?rev=0f5fa0026a9474bf732a845596e10fa1c7bf7753&rev_to=b7e146284f724a50591bba7d69192cc33704cdc7
Comment 21 Matthew Woehlke 2013-03-25 16:03:43 UTC
I will try to remember to check with the python.xml that is currently in master, but I was getting what I believe to be correct behavior until I updated to 4.10.1.
Comment 22 Philipp A. 2013-03-27 01:07:23 UTC
the newest version certainly highlights your sample code correctly. what version is your python highlighting file? the newest one is 2.16.

You can see the installed version (and upgrade) using

Settings → Configure Kate → Open/save → Modes & filetypes → Download highlighting files
Comment 23 Matthew Woehlke 2013-03-27 15:47:42 UTC
(In reply to comment #20)
> that was the state *after* the commit from comment #5 and *before* the
> commit from comment #19.

(...comment #18, of course :-).)

Confirming this seems to be the case. I'm going to blame the confusion on the broken version apparently not hitting Fedora until mid-March 2013, a few weeks *after* it was fixed upstream and quite some months since the broken version has been in git.
Comment 24 annalee 2013-05-03 17:59:56 UTC
I'm on 3.10.2 (using KDE development platform 4.10.2) and it's still having the same problem (highlighting strings at the beginning of a line as if they're comments). Has the fix not been pushed to the relevant repos for kubuntu 13.04 yet?
Comment 25 annalee 2013-05-03 18:28:01 UTC
I spoke too soon. Updating to the latest version of python.xml fixed it.