395259 – Non-ascii text shifts identifier locations

Status:	CONFIRMED

Alias:	None

Product:	kdev-python
Classification:	Developer tools
Component:	Language support
Version:	5.2.3
Platform:	Other Linux

Importance:	NOR minor
Target Milestone:	---
Assignee:	Sven Brauch

Reported:	2018-06-11 23:45 UTC by Nicolás Alvarez
Modified:	2018-08-22 15:33 UTC
CC List:	1 user

Description Nicolás Alvarez 2018-06-11 23:45:02 UTC

If a line contains non-ASCII characters (e.g. in a string literal), the highlighted ranges of declarations and uses later on that line are shifted away from their correct locations.

Example:

import sys
code = 502
print("Código: %d" % code, file=sys.stderr)

"ode," is highlighted as a use of the "code" variable. "ys.std" is highlighted as a use of "stderr". "ys." is highlighted as a use of "sys", which overlaps with the "stderr" range so you can't actually hover it.

Deleting the letter "ó" and waiting a second for the auto-reparse makes the ranges become correct.

Comment 1 Francis Herne 2018-08-22 15:33:06 UTC

I spent a little while looking at this.

The cause is that the CPython parser (used by KDevelop) returns all offsets in UTF-8 bytes, while the KTextEditor API uses actual characters.

Any character that takes more than one byte in UTF-8 therefore shifts every later offset on that line by the number of extra bytes.
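As a rough illustration of the mismatch (this is not kdev-python code, which is C++), the sketch below runs Python's standard ast module on the line from the description. The ast documentation describes col_offset as a UTF-8 byte offset. The helper name byte_col_to_char_col is made up for this example, and it converts to code points, which should only match editor columns for text inside the Basic Multilingual Plane (assuming the editor counts columns in UTF-16 code units, as Qt strings do).

```python
import ast

source = 'code = 502\nprint("Código: %d" % code, file=sys.stderr)\n'


def byte_col_to_char_col(line: str, byte_col: int) -> int:
    """Convert a UTF-8 byte offset within `line` to a code-point offset."""
    # Take the bytes before the offset and count how many characters they decode to.
    return len(line.encode("utf-8")[:byte_col].decode("utf-8"))


lines = source.splitlines()
tree = ast.parse(source)

# Find the use (load) of the `code` variable on the second line.
use = next(
    node
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
    and node.id == "code"
    and isinstance(node.ctx, ast.Load)
)

line = lines[use.lineno - 1]
byte_col = use.col_offset                        # offset as reported by the parser
char_col = byte_col_to_char_col(line, byte_col)  # offset in characters

# Treating the byte offset as a character column selects the wrong text.
wrong = line[byte_col:byte_col + len(use.id)]
right = line[char_col:char_col + len(use.id)]
print(byte_col, char_col, repr(wrong), repr(right))
```

The single "ó" occupies two bytes, so the byte offset is one larger than the character column, which matches the "ode," range in the description.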

The only way I see to fix this would be to scan each line for multi-byte characters and do yet another set of range fixups, which would be quite expensive while benefiting very few scenarios.

(We can't remove such characters before feeding the parser, because they can appear in docstrings or even identifiers.)
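For a sense of what such a fixup pass might involve, here is a minimal sketch in Python rather than the plugin's C++. The function name fix_columns and its interface are hypothetical and not taken from kdev-python. It skips pure-ASCII lines, where byte and character offsets coincide, and otherwise builds a byte-to-character map in one pass over the line. It says nothing about the real cost inside the plugin, and it maps to code points, so characters outside the Basic Multilingual Plane would need extra handling if the editor counts UTF-16 code units.

```python
from typing import Dict, List


def fix_columns(line: str, byte_cols: List[int]) -> List[int]:
    """Map UTF-8 byte offsets within `line` to code-point offsets."""
    # Fast path: in a pure-ASCII line every character is exactly one byte.
    if line.isascii():
        return list(byte_cols)

    # Slow path: record the byte offset at which each character starts.
    byte_to_char: Dict[int, int] = {}
    byte_pos = 0
    for char_idx, ch in enumerate(line):
        byte_to_char[byte_pos] = char_idx
        byte_pos += len(ch.encode("utf-8"))
    byte_to_char[byte_pos] = len(line)  # offset just past the end of the line

    # Offsets that fall inside a multi-byte character raise KeyError here.
    return [byte_to_char[col] for col in byte_cols]


line = 'print("Código: %d" % code, file=sys.stderr)'
# Byte offsets 22 and 33 are where `code` and `sys` start in the UTF-8 encoding.
fixed = fix_columns(line, [0, 22, 33])
print(fixed, line[fixed[1]:fixed[1] + 4], line[fixed[2]:fixed[2] + 3])
```

The ASCII check keeps the common case cheap, so the per-character scan only runs on lines that actually contain multi-byte characters.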

The other alternative would be to have our own parser (again). That's clearly not worthwhile for this alone, but there's already a lot of ugly code to work around various limitations and lossiness, and statements by the CPython devs (e.g. https://bugs.python.org/issue32911#msg313698) suggest it's only likely to get worse.