Bug 433467 - doxygen.xml [and others] have invalid "[]" in regex
Summary: doxygen.xml [and others] have invalid "[]" in regex
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-syntax-highlighting
Classification: Frameworks and Libraries
Component: syntax (other bugs)
Version First Reported In: unspecified
Platform: Other Other
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-23 06:58 UTC by Gene Thomas
Modified: 2021-03-03 00:00 UTC (History)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Description Gene Thomas 2021-02-23 06:58:25 UTC
SUMMARY

[] appears in regexes. It denotes a single character drawn from an empty set, since there is nothing between the [ and the ], so it can never match anything. The ICU regex engine I am using rejects this.

STEPS TO REPRODUCE
1. Read doxygen.xml
2. It declares an entity wordsep as "(?:[][,?;()]|\.$|\.?\s)"
3. This entity is used in RegExpr rules

OBSERVED RESULT

The file is accepted without any error.

EXPECTED RESULT

This should be reported as an error, and the .xml corrected.

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

head of https://github.com/KDE/syntax-highlighting

ADDITIONAL INFORMATION
Comment 1 Jonathan Poelen 2021-02-27 23:57:39 UTC
[]] is valid in PCRE (the regex engine used here): a ] that appears as the first character of a character class is a literal and does not close the class (the same applies to [^]]).

ICU regex does not seem to support all PCRE syntax; for example, it appears to lack (?|...) and \R, which are also used.
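
The leading-] rule can be checked with a short sketch. It uses Python's re module rather than PCRE itself, on the assumption that the two agree on this point (Python also treats a ] in first position as a literal). WORDSEP is copied from the report. WORDSEP_ESCAPED is a hypothetical, more portable spelling with both brackets escaped; it has not been tested against ICU here.

```python
import re

# Pattern copied from the report (the wordsep entity in doxygen.xml).
WORDSEP = r"(?:[][,?;()]|\.$|\.?\s)"

# Hypothetical portable spelling: both brackets escaped inside the class.
WORDSEP_ESCAPED = r"(?:[\]\[,?;()]|\.$|\.?\s)"


def first_match(pattern, text):
    """Return the first substring matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


samples = ["]", "[", ",", "end.", "a b", "abc"]

# Map each sample to (match with original, match with escaped form).
results = {
    s: (first_match(WORDSEP, s), first_match(WORDSEP_ESCAPED, s))
    for s in samples
}

# [^]] follows the same rule: the first ] is a literal member of the
# negated class, so the class matches any character except ].
negated_matches_x = re.fullmatch(r"[^]]", "x") is not None
negated_matches_bracket = re.fullmatch(r"[^]]", "]") is not None

for sample, (original, escaped) in results.items():
    print(repr(sample), repr(original), repr(escaped))
```

In this sketch both spellings give the same result for every sample, so escaping the brackets would be one way to keep the pattern readable by engines that reject the unescaped form.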
Comment 2 Gene Thomas 2021-03-03 00:00:22 UTC
Thanks, I've switched from ICU to PCRE, which is much faster. Part of the problem is that ICU jumps through hoops to be correct. For example, in German the case-insensitive regex "^ẞ$" matches "SS" (2 code points); no other regex implementation that I have seen does this. ICU was getting into an internal infinite loop and throwing a "regex out of stack space" error after 0.5 sec, many times over, which made a .sh file take 30 seconds to syntax highlight!
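
The ẞ / "SS" case comes down to simple versus full Unicode case folding. As a rough illustration (again using Python as a stand-in, not ICU or PCRE), Python's re module applies one-to-one case mapping and does not match "SS", while str.casefold() applies full case folding and maps both strings to "ss":

```python
import re

CAPITAL_SHARP_S = "\u1E9E"  # LATIN CAPITAL LETTER SHARP S

# Python's re does one-to-one case mapping, so one code point
# in the pattern cannot match the two code points of "SS".
simple_match = re.fullmatch(CAPITAL_SHARP_S, "SS", re.IGNORECASE)

# Full case folding expands the sharp s to "ss", so both sides agree.
full_fold_equal = CAPITAL_SHARP_S.casefold() == "SS".casefold()

print(simple_match)      # no match object
print(full_fold_equal)   # folded forms are equal
```

An engine that uses full case folding during matching has to allow for such one-to-many expansions, which is one plausible source of the extra work described above.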