Bug 441603 - Regression in Ruby indentation with regular expressions
Summary: Regression in Ruby indentation with regular expressions
Status: CONFIRMED
Alias: None
Product: kate
Classification: Applications
Component: indentation (show other bugs)
Version: 20.12.2
Platform: Debian stable Linux
: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-27 12:10 UTC by jaroslav
Modified: 2021-10-04 22:01 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Hacky ruby.js diff which fixes the issue (638 bytes, patch)
2021-09-02 10:15 UTC, Jan Paul Batrina
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jaroslav 2021-08-27 12:10:43 UTC
After upgrading from Debian Buster to Debian Bullseye Ruby indentation stopped working properly when there are regular expressions involved in the source


STEPS TO REPRODUCE
1. open new document, switch its indentation to Ruby
2. type

if v =~ /re/<enter>

OBSERVED RESULT
cursor indents two tabs to the right on the next line. After that the indentation becomes "confused" and doesn't work properly. The indentation works fine for case-insensitive regexp though. (Additional observations below.)

EXPECTED RESULT
cursor indents one tab to the right (being in the condition block). 


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian Bullseye
KDE Plasma Version: not being used (running with Xfce)
KDE Frameworks Version: 5.78.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION

(1) The bug is present in version 21.07.90 as found in Kubuntu live image (http://cdimage.ubuntu.com/kubuntu/daily-live/current/)

(2) The editor works correctly when the regexp is case-insensitive (or has any other modifier)

if v =~ /re/i<enter>

- cursor correctly indents one tab to the right. All subsequent lines are indented as well until you type "end", which is immediatelly and automatically de-indented as the end of the "if" block:

if v =~ /re/i
    statement
end

However, when the regexp has no modifier, things go like this:

if v =~ /re/<enter>

- cursor indents two tabs to the right, my guess would be the indenter doesn't consider that regexp to be complete (expecting at least one more letter). Next line is de-indented to a single tab indentation, which is correct, but indentation in most of the code after such block does not work correctly.

Final code looks like this:

if v =~ /re/
        statement
    statement
end

(Should have both statements on the same level, that level being one tab, not two)

The following is more elaborate example of indentation errors caused by this regression. All indentation shown is done by the editor, nothing is adjusted manually:

statement =~ /re/
        <two tabs, cursor is here, no text, just newline>
        begin
if v =~ /re/
        statement
    statement
end
rescue
    statement
    end
    <cursor stays here>

The following is the correct behaviour observed when the regexp is case-insensitive (again, no manual indentation adjustment):

statement =~ /re/i

begin
    if v =~ /re/i
        statement
        statement
    end
rescue
    statement
end

Any help or advice would be appreciated. I have basic understanding of C/C++ code so should be able to apply patches to Debian's version of Kate.
Comment 1 Jan Paul Batrina 2021-09-02 10:15:47 UTC
Created attachment 141241 [details]
Hacky ruby.js diff which fixes the issue

Hello, thanks for the bug report + analysis!

The rules for indentation are written in javascript. Particularly, the behavior you describe is due to https://invent.kde.org/frameworks/ktexteditor/-/blob/master/src/script/data/indentation/ruby.js#L335 since the closing / in the regex is being treated as an incomplete division operator (see https://invent.kde.org/frameworks/ktexteditor/-/blob/master/src/script/data/indentation/ruby.js#L113).

git blame shows that both lines have been unchanged since the beginning of the git repo. Do you remember the specific version where it previously worked? Maybe there were Debian-specific patches for this?

The attached diff fixes the issue, but it relies on the syntax highlighting enging, which is a very hacky way especially if we need to consider other filetypes that use the ruby indentation style. If you find a cleaner way, feel free to submit an MR to https://invent.kde.org/frameworks/ktexteditor/-/merge_requests !
Comment 2 jaroslav 2021-09-03 10:15:37 UTC
Hello,

thanks for your time looking into this. It took some digging - there is no ruby.js file anywhere in /usr on my system anymore, guess that's something that changed with new Debian - but eventually I managed to find out that the file is contained in ktexteditor Debian source package and patch it there. Everything compiled well, I installed libkf5texteditor5 library the compilation produced and can now confirm that the patch fixes the problem.

I tried to look for any Debian-specific patches but found none. As far as I can see, kate package has no patches at all and ktexteditor has a single patch which seems to replace some underscore function definitions with a require of underscore.js library. This patch was present in previous version of ktexteditor shipped by Debian.

I also found no change in the ruby.js file between previous (5.54) and current (5.78) Debian version, the only exception being removal of license text at the beginning of the file (replaced by "SPDX-License-Identifier: LGPL-2.0-only").

The indentation did work in previous Debian version so I guess something else must've changed. Since you mentioned that the patch relies on it, syntax highlighting did change between the versions. The change is pretty minimal though as far as I can see - regular expressions used to be in green text and are orange now - so doesn't seem like anything able to cause this regression.

As for non-hacky way to fix this, I am fresh out of ideas. One solution that comes to mind would be to split the check into two regular expressions with the first one being the current without the division operator and the second being the division operator not preceded by another division operator. Which obviously runs afoul on code like

if 1 == i / 5 and 2 == j /
if s == "I/O" or 3 == i /

That moves us into "second slash after =~ or !~ operator, but don't count slashes escaped by backslash", which feels like something well beyond my regexp-fu.
Comment 3 Jan Paul Batrina 2021-09-05 02:35:51 UTC
Just tested with a Linux Mint 20.2 (based on Ubuntu 20.04) live usb I have lying around, and there indeed the indentation worked properly with Kate 19.12 and KTextEditor 5.68.

In fact, the behavior there seems to be "smart" as it does not only take into account the last /. For example, `if v =~ /re/` will only have one indent, but `if  v =~ re/` and `if v =~ /re//` will both have two indentations as the last / is correctly identified as an unfinished division operator.

Since linuxmint is still somewhat a derivative of debian, it's still possible that this correct behavior is limited to debian. I'll try to compile the same versions later to see if it works there too, then maybe some other patch (not directly to ruby.js) caused the behavior change.