Bug 504749 - "<" counts as start of tag, so colours following text, regardless of the text that follows
Status: RESOLVED FIXED
Alias: None
Product: lokalize
Classification: Applications
Component: editor (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Simon Depiets
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-05-24 22:52 UTC by Finley Watson
Modified: 2025-07-09 13:57 UTC

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Lokalize editor window showing translation source and target, with a <= incorrectly marked as the start of a tag, and all text following that coloured as a tag (114.78 KB, image/png)
2025-05-24 22:52 UTC, Finley Watson
Description Finley Watson 2025-05-24 22:52:27 UTC
Created attachment 181718
Lokalize editor window showing translation source and target, with a <= incorrectly marked as the start of a tag, and all text following that coloured as a tag

I realise this is not an easy thing to fix, but we have tag colouring (HTML, etc.) in the editor panes in Lokalize. The "<" character counts as the beginning of a tag.

However, it is also a valid character in normal text you might translate (see the attached screenshot for an example). In these cases text can be coloured as a tag even though it is not a real or valid tag.

One way to fix this is to be more clever with the string parsing: after < there should (very probably) only ever be a character matching [a-zA-Z], so for example < em > is not a valid HTML tag while <em> is. We could also give up on marking a section of the string as a tag when the < and > are mismatched, i.e. walk the string looking for the matching >, and if it is not found by the end of the string, or another < is found first, reject that first < and don't colour it as a tag.
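
For illustration only, the two heuristics suggested above could be sketched roughly as follows. This is not Lokalize code: findTagRanges and isAsciiLetter are hypothetical names, plain std::string is used instead of Qt types to keep the sketch self-contained, and allowing a single "/" after "<" (so that closing tags such as </em> still qualify) is an added assumption that the description does not spell out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true for characters matching [a-zA-Z].
static bool isAsciiLetter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Hypothetical helper, not taken from Lokalize: returns the [start, end)
// ranges of the input that look like tags under the two heuristics.
std::vector<std::pair<std::size_t, std::size_t>> findTagRanges(const std::string &text)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '<') {
            ++i;
            continue;
        }
        // Assumption: allow one '/' so that closing tags like </em> qualify.
        std::size_t name = i + 1;
        if (name < text.size() && text[name] == '/')
            ++name;
        // Heuristic 1: the tag name must start with [a-zA-Z].
        if (name >= text.size() || !isAsciiLetter(text[name])) {
            ++i; // e.g. "<=" or "< em": treat this '<' as plain text
            continue;
        }
        // Heuristic 2: walk forward to the matching '>', giving up if
        // another '<' or the end of the string is reached first.
        std::size_t j = name + 1;
        while (j < text.size() && text[j] != '>' && text[j] != '<')
            ++j;
        if (j < text.size() && text[j] == '>') {
            ranges.emplace_back(i, j + 1);
            i = j + 1;
        } else {
            ++i; // unmatched '<': reject it and keep scanning
        }
    }
    return ranges;
}
```

With this sketch, "a <= 3 and b > 2" and "< em >" yield no ranges, while "<em>hi</em>" yields one range for each of the two tags.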
Comment 1 Bug Janitor Service 2025-07-02 22:47:37 UTC
A possibly relevant merge request was started @ https://invent.kde.org/sdk/lokalize/-/merge_requests/246
Comment 2 Finley Watson 2025-07-09 10:23:07 UTC
Git commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'master'.

Improve HTML tag matching to reduce the false-positive colouring

By searching for tags with a regex, we can reduce the number of false-
positive matches where text in the translation source / target is
coloured as though it were an HTML tag when it isn't. Previously, any
text with a `<` char in it would be coloured as HTML from that char
onwards; now colouring only starts from a `<` char followed by an
alphabetic char, potentially with a `/` char between, i.e. it matches
`<s` in `<strong>` and `</s` in `</strong>` but not something like
`< b` or `<= 3`. In my experience this is in line with how web engines
parse HTML files.

Before:

![image](/uploads/5b7909579002204a6a94f3e798928831/image.png){width=675 height=590}

After:

![image](/uploads/ca5dfdde0decc98603231d41807439a7/image.png){width=681 height=557}

Original HTML tag highlighting is not changed:

![image](/uploads/910835acca0f003710547f87d6cc8585/image.png){width=232 height=92}

M  +5    -2    src/syntaxhighlighter.cpp

https://invent.kde.org/sdk/lokalize/-/commit/ee40ecd67e3ba95ab75aac433a81663f958e2d6f
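
As a rough illustration of the matching rule described in the commit message above (a `<`, optionally a `/`, then an alphabetic char), a minimal sketch might look like this. It is not the actual change to src/syntaxhighlighter.cpp: the pattern `</?[A-Za-z]` is inferred from the commit message, firstTagStart is a hypothetical name, and std::regex is used here only to keep the example free of Qt dependencies.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not taken from Lokalize: returns the index at which
// tag colouring would start, or std::string::npos if nothing in the text
// looks like the start of a tag.
std::size_t firstTagStart(const std::string &text)
{
    // Assumed pattern: '<', an optional '/', then one ASCII letter.
    static const std::regex tagStart("</?[A-Za-z]");
    std::smatch match;
    if (std::regex_search(text, match, tagStart))
        return static_cast<std::size_t>(match.position(0));
    return std::string::npos;
}
```

Under these assumptions, firstTagStart returns 0 for `<strong>` and std::string::npos for `< b` or `x <= 3`, which mirrors the examples given in the commit message.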
Comment 3 Finley Watson 2025-07-09 13:57:32 UTC
Git commit c05bb1d914725d894c4a60c5c159375f1e6d94b2 by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'release/25.08'.

Improve HTML tag matching to reduce the false-positive colouring

(cherry picked from commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f)

M  +5    -2    src/syntaxhighlighter.cpp

https://invent.kde.org/sdk/lokalize/-/commit/c05bb1d914725d894c4a60c5c159375f1e6d94b2