Created attachment 181718 [details]
Lokalize editor window showing translation source and target, with a <= incorrectly marked as the start of a tag, and all text following it coloured as a tag

I realise this is not easy to fix, but we have tag colouring (HTML, etc.) in the editor panes in Lokalize. The "<" character counts as the beginning of a tag. However, it is also a valid character in normal text you might translate (example situation attached). In these cases text can be coloured as a tag without being a real or valid tag.

One way to fix this is to be more clever with the string parsing: after < there should (very probably) only ever be a character matching [a-zA-Z], so for example < em > is not a valid HTML tag while <em> is.

We could also give up on marking a section of the string as a tag if there is a < and > mismatch, i.e. walk the string looking for the matching >; if it is not found by the end of the string, or another < is found first, reject that first < and don't colour it as a tag.
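To make the two checks proposed above concrete, here is a minimal sketch in standard C++. It is not taken from Lokalize: `looksLikeTagStart` and `isAsciiLetter` are hypothetical names, and allowing an optional `/` so that closing tags such as `</em>` still qualify is an added assumption that the report does not mention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the ASCII letters [a-zA-Z] only.
static bool isAsciiLetter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Hypothetical helper, not Lokalize code: returns true if the '<' at index
// `pos` plausibly opens a tag according to the two checks in the report.
bool looksLikeTagStart(const std::string &text, std::size_t pos)
{
    if (pos >= text.size() || text[pos] != '<')
        return false;

    std::size_t next = pos + 1;
    // Assumption: accept an optional '/' so closing tags still count.
    if (next < text.size() && text[next] == '/')
        ++next;

    // Check 1: the character after '<' (or "</") must be a letter.
    if (next >= text.size() || !isAsciiLetter(text[next]))
        return false;

    // Check 2: walk forward looking for the matching '>'. Reject the
    // candidate if another '<' appears first or the string ends.
    for (std::size_t i = next + 1; i < text.size(); ++i) {
        if (text[i] == '>')
            return true;
        if (text[i] == '<')
            return false;
    }
    return false;
}
```

With this sketch, the `<` in `a <= 3` and in `< em >` is rejected by the first check, while an unterminated `<b` is rejected by the second.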
A possibly relevant merge request was started @ https://invent.kde.org/sdk/lokalize/-/merge_requests/246
Git commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'master'.

Improve HTML tag matching to reduce the false-positive colouring

By searching for tags with regex, we can reduce the number of false-positive matches where text in the translation source / target is coloured as though it were an HTML tag when it isn't. Previously, any text with a `<` char in it would be coloured as HTML from that char. Now it is coloured only from a `<` char followed by an alphabetic char, potentially with a `/` char between, i.e. it matches `<s` in `<strong>` and `</s` in `</strong>` but not something like `< b` or `<= 3`. In my experience this is in line with how web engines parse HTML files.

Before: [image]
After: [image]
Original HTML tag highlighting is not changed: [image]

M +5 -2 src/syntaxhighlighter.cpp

https://invent.kde.org/sdk/lokalize/-/commit/ee40ecd67e3ba95ab75aac433a81663f958e2d6f
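The matching rule described in the commit message can be illustrated with a small standalone sketch. The pattern `</?[a-zA-Z]` and the function name `firstTagStart` are assumptions made for illustration. The actual expression in src/syntaxhighlighter.cpp is not quoted in this report and may differ, and this sketch uses `std::regex` only to stay self-contained.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical function: returns the index of the first '<' that is followed
// by an ASCII letter, optionally with a '/' in between, or -1 if none exists.
std::ptrdiff_t firstTagStart(const std::string &text)
{
    // Assumed pattern; the real one in Lokalize may be written differently.
    static const std::regex tagStart("</?[a-zA-Z]");

    std::smatch match;
    if (std::regex_search(text, match, tagStart))
        return match.position(0);
    return -1;
}
```

Under this pattern, colouring would begin at the `<` of `<strong>` or `</strong>`, while `< b` and `<= 3` produce no match and stay uncoloured.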
Git commit c05bb1d914725d894c4a60c5c159375f1e6d94b2 by Finley Watson.
Committed on 09/07/2025 at 10:23.
Pushed by finw into branch 'release/25.08'.

Improve HTML tag matching to reduce the false-positive colouring

By searching for tags with regex, we can reduce the number of false-positive matches where text in the translation source / target is coloured as though it were an HTML tag when it isn't. Previously, any text with a `<` char in it would be coloured as HTML from that char. Now it is coloured only from a `<` char followed by an alphabetic char, potentially with a `/` char between, i.e. it matches `<s` in `<strong>` and `</s` in `</strong>` but not something like `< b` or `<= 3`. In my experience this is in line with how web engines parse HTML files.

Before: [image]
After: [image]
Original HTML tag highlighting is not changed: [image]

(cherry picked from commit ee40ecd67e3ba95ab75aac433a81663f958e2d6f)

M +5 -2 src/syntaxhighlighter.cpp

https://invent.kde.org/sdk/lokalize/-/commit/c05bb1d914725d894c4a60c5c159375f1e6d94b2