| Summary: | "<" counts as start of tag, so colours following text, regardless of the text that follows | ||
|---|---|---|---|
| Product: | [Applications] lokalize | Reporter: | Finley Watson <fin-w> |
| Component: | editor | Assignee: | Simon Depiets <sdepiets> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | aacid, shafff |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | Linux | ||
| Latest Commit: | https://invent.kde.org/sdk/lokalize/-/commit/c05bb1d914725d894c4a60c5c159375f1e6d94b2 | Version Fixed/Implemented In: | |
| Sentry Crash Report: | |||
| Attachments: | Lokalize editor window showing translation source and target, with a <= incorrectly marked as the start of a tag, and all text following that coloured as a tag | ||

A possibly relevant merge request was started @ https://invent.kde.org/sdk/lokalize/-/merge_requests/246

Git commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'master'.
Improve HTML tag matching to reduce the false-positive colouring
By searching for tags with a regex, we can reduce the number of
false-positive matches where text in the translation source / target is
coloured as though it were an HTML tag when it isn't. Previously, any
text containing a `<` char was coloured as HTML from that char onwards.
Now colouring only starts at a `<` char followed by an alphabetic char,
optionally with a `/` char in between, i.e. it matches `<s` in `<strong>`
and `</s` in `</strong>` but not something like `< b` or `<= 3`. In
my experience this is in line with how web engines parse HTML files.
Before: [screenshot]
After: [screenshot]
Original HTML tag highlighting is not changed: [screenshot]
M +5 -2 src/syntaxhighlighter.cpp
https://invent.kde.org/sdk/lokalize/-/commit/ee40ecd67e3ba95ab75aac433a81663f958e2d6f
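
The matching rule described in the commit message above can be illustrated with a small standalone sketch. This is not the code from src/syntaxhighlighter.cpp: it assumes plain `std::regex` instead of whatever Lokalize uses internally, and `firstTagStart` is a hypothetical helper name chosen for this example. It only shows the rule "a `<`, optionally followed by `/`, then an ASCII letter".

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of Lokalize): returns the offset of the first
// '<' at or after `from` that plausibly opens an HTML tag, i.e. a '<' followed
// by an optional '/' and then an ASCII letter. Returns std::string::npos if
// there is no such position.
std::size_t firstTagStart(const std::string &text, std::size_t from = 0)
{
    assert(from <= text.size());
    static const std::regex tagStart("</?[a-zA-Z]");

    std::smatch match;
    const auto begin = text.begin() + static_cast<std::ptrdiff_t>(from);
    if (!std::regex_search(begin, text.end(), match, tagStart)) {
        return std::string::npos;
    }
    // match.position() is relative to `begin`, so shift it back by `from`.
    return from + static_cast<std::size_t>(match.position(0));
}
```

With this rule, `firstTagStart("x <= 3 and y < b")` finds nothing, while `firstTagStart("a <= 3 <em>x</em>")` skips the `<=` and reports the `<` of `<em>`. A highlighter would then start tag colouring at the returned offset rather than at the first `<` in the string.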
Git commit c05bb1d914725d894c4a60c5c159375f1e6d94b2 by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'release/25.08'.
Improve HTML tag matching to reduce the false-positive colouring
(cherry picked from commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f)
M +5 -2 src/syntaxhighlighter.cpp
https://invent.kde.org/sdk/lokalize/-/commit/c05bb1d914725d894c4a60c5c159375f1e6d94b2
Created attachment 181718
Lokalize editor window showing translation source and target, with a <= incorrectly marked as the start of a tag, and all text following that coloured as a tag

I realise this is not an easy thing to fix, but we have (HTML, etc.) tag colouring in the editor panes in Lokalize. The "<" character counts as the beginning of a tag. However, it is also a valid char in normal text you might translate (example situation attached). In these cases text can be coloured as a tag without being a real or valid tag.

One way to fix this is to be more clever with the string parsing: after < there (very probably) should only ever be a char matching [a-zA-Z], so for example < em > is not a valid HTML tag while <em> is. We could also give up on marking a section of the string as a tag if we have a < and > mismatch, i.e. walk the string looking for the matching >; if it's not found by the end of the string, or another < is found first, reject that first < and don't colour it as a tag.
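
The second idea in the report, rejecting a `<` that has no matching `>`, could look roughly like the sketch below. It is only an illustration of the proposal, not what was eventually committed, and `findTagEnd` is a hypothetical name introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Lokalize): given the offset of a '<',
// walk forward looking for its closing '>'. Returns the offset of that '>',
// or std::string::npos if the string ends first or another '<' is seen
// before any '>' (in which case the first '<' should not be coloured).
std::size_t findTagEnd(const std::string &text, std::size_t open)
{
    assert(open < text.size() && text[open] == '<');

    for (std::size_t i = open + 1; i < text.size(); ++i) {
        if (text[i] == '>') {
            return i; // matching close found
        }
        if (text[i] == '<') {
            return std::string::npos; // a new '<' before any '>': reject
        }
    }
    return std::string::npos; // ran off the end of the string
}
```

On its own this check still accepts text such as `a < b > c`, because a `>` does follow the `<`. It would therefore need to be combined with the "letter after `<`" rule from the first idea to reject that case.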