Bug 274430 - KDevelop syntax highlighting wrong on lines containing unicode characters
Summary: KDevelop syntax highlighting wrong on lines containing unicode characters
Status: CONFIRMED
Alias: None
Product: kdevelop
Classification: Applications
Component: Language Support: CPP (Clang-based)
Version: 5.4.4
Platform: unspecified Linux
Importance: VHI normal
Target Milestone: ---
Assignee: kdevelop-bugs-null
URL:
Keywords:
Duplicates: 321965 382465 448222 453742
Depends on:
Blocks:
 
Reported: 2011-05-29 15:48 UTC by Cyp
Modified: 2022-05-14 12:36 UTC
CC List: 10 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Wrong word highlighted after adding comment in UTF-8 (241.67 KB, image/png)
2011-11-18 17:53 UTC, Pavel Punegov
Placing UTF-8 string in stream output leads to the same bug (232.08 KB, image/png)
2011-11-18 17:59 UTC, Pavel Punegov

Description Cyp 2011-05-29 15:48:05 UTC
Version:           4.2.1 (using KDE 4.6.2) 
OS:                Linux

int xyz; printf("%dψ\n", xyz);

In the above, the "yz)" is highlighted instead of the "xyz". Hovering the mouse over the "yz)" displays the variable popup for "xyz", hovering over the unhighlighted "x" doesn't.

Each additional multibyte Unicode character on the line shifts the highlighting further to the right.

I'm using UTF-8 everywhere I can and haven't tried setting the editor to other encodings.

(Kdevelop version is 4.2.2, not 4.2.1, but couldn't select that when reporting.)

Reproducible: Always
Comment 1 Milian Wolff 2011-05-30 15:42:44 UTC
ugh indeed
Comment 2 Milian Wolff 2011-11-18 16:57:09 UTC
Git commit 0631a23f90edcf36c819452930c6134fdf449081 by Milian Wolff.
Committed on 18/11/2011 at 17:56.
Pushed by mwolff into branch 'master'.

reenable unit tests for breakage on multibyte cstrings

CCBUG: 274430

M  +6    -8    languages/cpp/cppduchain/tests/test_duchain.cpp
M  +21   -0    languages/cpp/parser/tests/test_parser.cpp
M  +2    -0    languages/cpp/parser/tests/test_parser.h

http://commits.kde.org/kdevelop/0631a23f90edcf36c819452930c6134fdf449081
Comment 3 Milian Wolff 2011-11-18 17:26:23 UTC
Git commit 18f67d95b92160f7a1a0c3c9f8ade94398f01c5b by Milian Wolff.
Committed on 18/11/2011 at 18:26.
Pushed by mwolff into branch 'master'.

add unit test showing that multibyte chars in comments also break our parser

CCBUG: 274430

M  +16   -0    languages/cpp/parser/tests/test_parser.cpp
M  +1    -0    languages/cpp/parser/tests/test_parser.h

http://commits.kde.org/kdevelop/18f67d95b92160f7a1a0c3c9f8ade94398f01c5b
Comment 4 Pavel Punegov 2011-11-18 17:53:46 UTC
Created attachment 65823
Wrong word highlighted after adding comment in UTF-8
Comment 5 Pavel Punegov 2011-11-18 17:59:50 UTC
Created attachment 65824
Placing UTF-8 string in stream output leads to the same bug
Comment 6 Gaël Le Baccon 2015-02-06 13:49:39 UTC
The same bug is present in 4.7.0 and again in the new 4.7.1.
Comment 7 Milian Wolff 2015-11-15 13:39:32 UTC
Still affects the clang-based C++ language plugin.
Comment 8 OlafLostViking 2016-12-01 12:48:01 UTC
I just tried

	std::cout << "OlafLostViking ❤ KDevelop!" << std::endl;

in KDevelop 5.0.2 and encountered the very same problem.
Comment 9 OlafLostViking 2016-12-02 16:11:42 UTC
Just tried with 5.0.3 (still valid), and I'd like to ask the KDevelop developers a question:

Since this is quite an old bug report, originally filed against KDevelop 4.x, would you prefer to keep tracking the problem here or to have a new report opened with an up-to-date version number? I'd hate for it to be overlooked when someone scans the bug lists, and eventually closed, just because it's filed against the 4.x branch.
Comment 10 Sakuraba Amane 2018-12-28 04:51:00 UTC
The problem still exists in version 5.3.1.
Comment 11 Vitaliy 2020-03-25 17:56:56 UTC
Still present in 5.4.4. I feel the developers are Americans only. But even in English, there are words that don’t fit in ASCII (e.g. “naïve”). Characters like *real* quotes and apostrophes are non-ASCII as well.
Comment 12 Milian Wolff 2020-03-25 20:03:27 UTC
The problem is that our editor (KTextEditor) operates in UTF-16, while clang/parsing operates in UTF-8. We will need to find a way to quickly translate from one to the other to create the highlight ranges.
Comment 13 Igor Kushnir 2022-01-11 09:11:17 UTC
*** Bug 448222 has been marked as a duplicate of this bug. ***
Comment 14 Igor Kushnir 2022-01-13 14:58:03 UTC
*** Bug 382465 has been marked as a duplicate of this bug. ***
Comment 15 Igor Kushnir 2022-01-13 15:00:20 UTC
*** Bug 321965 has been marked as a duplicate of this bug. ***
Comment 16 Igor Kushnir 2022-05-14 12:36:37 UTC
*** Bug 453742 has been marked as a duplicate of this bug. ***