Bug 385544

Summary: Validate regular expressions passed to QRegExp and QRegularExpression
Product: [Developer tools] clazy
Reporter: Thomas Fischer <fischer>
Component: general
Assignee: Sergio Martins <smartins>
Status: CONFIRMED ---
Severity: wishlist
CC: smartins
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Other
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Thomas Fischer 2017-10-10 07:33:33 UTC
It would be nice if clazy could validate regular expressions passed as string literals to QRegExp or QRegularExpression.
This could be done either by trying to parse/compile the regular expression passed to either class, or, without compiling it, by checking it for common mistakes (see examples below) using context-sensitive text patterns.

Examples for which clazy should warn (passing raw char* to the constructors to keep the examples brief):
QRegularExpression("\bA"); // "\b" is a backspace character in C++; should be \\bA
QRegExp("v\\d([.]\\d)*)"); // mismatched parentheses: the final ')' has no matching '('
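To make the first example concrete, here is a minimal sketch in plain C++ (no Qt) of what the compiler actually stores for each spelling. The function names are made up for illustration only.

```cpp
#include <string>

// In an ordinary string literal, "\b" is the C++ escape sequence for
// backspace (0x08), so the regex engine never sees a backslash at all.
inline std::string wrongPattern()   { return "\bA"; }     // 2 chars: 0x08, 'A'

// Escaping the backslash yields the intended regex \bA (word boundary + 'A').
inline std::string escapedPattern() { return "\\bA"; }    // 3 chars: '\', 'b', 'A'

// A C++11 raw string literal produces the same three characters.
inline std::string rawPattern()     { return R"(\bA)"; }  // 3 chars: '\', 'b', 'A'
```

Since "\bA" is still a syntactically valid pattern (a literal backspace followed by 'A'), this kind of mistake would probably have to be caught by looking at the source literal rather than at the resulting string.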
Comment 1 Sergio Martins 2017-10-10 09:49:01 UTC
We could use a regular expression to validate regular expressions :)
Comment 2 Sergio Martins 2017-10-18 08:02:29 UTC
One option is to link to pcre or Qt and validate the string.
Additionally, I think it should warn if you're not using C++11 raw-string-literals, which makes code much less error prone.
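As a rough illustration of the "compile it and see" option, the sketch below uses std::regex from the C++ standard library purely as a stand-in. This is an assumption made to keep the example self-contained: std::regex (ECMAScript grammar) is neither pcre nor Qt, and its syntax differs from both in places. The helper name patternCompiles is hypothetical. With Qt, the equivalent check would presumably go through QRegularExpression::isValid().

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles with std::regex.
// std::regex is only a stand-in for pcre/Qt here; the accepted syntax is
// similar for simple patterns but not identical.
bool patternCompiles(const std::string &pattern)
{
    try {
        std::regex re(pattern, std::regex::ECMAScript);
        (void)re; // only the construction matters
        return true;
    } catch (const std::regex_error &) {
        return false; // e.g. mismatched parentheses
    }
}
```

With this stand-in, the second example from the description, "v\\d([.]\\d)*)", is rejected, while the first one, "\bA", compiles without complaint because a literal backspace is a valid pattern character. That suggests a compile-based check alone would not cover the escaping mistake.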
Comment 3 Thomas Fischer 2017-10-18 11:24:53 UTC
(In reply to Sergio Martins from comment #2)
> One option is to link to pcre or Qt and validate the string.
Using pcre would have the advantage of being a 'lighter' dependency than Qt, and maybe one that is more widely installed already. Using Qt would have the advantage of being more 'realistic'.

> Additionally, I think it should warn if you're not using C++11
> raw-string-literals, which makes code much less error prone.

If technically possible, make the dependency on pcre or Qt optional at compile time, i.e. controlled by a configuration/build flag.
clazy itself could implement some basic checks that run before any pcre/Qt tests and remain available even if pcre/Qt support is disabled. Those basic checks could be:
- Correct usage of raw string literals, as you mention, e.g. \b vs \\b
- Correct matching of parentheses and brackets: (..[..]{..}..), with some support for special cases such as \( or [^{]
... in general, some quick and dirty checks for common mistakes that are always cheaper than parsing the regexp in pcre or Qt. For more inspiration, check StackExchange for common problems programmers have with regexps ... ;-)
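The bracket-matching check from the list above could look roughly like the following sketch in plain C++. The function bracketsBalanced is hypothetical and not part of clazy. It is deliberately quick and dirty: it skips escaped characters such as \( and treats everything inside a character class [...] as literal, but it does not handle corner cases such as a leading ']' inside a class, or engines that accept a lone '{' as a literal, so it may produce false positives.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if (), [] and {} in `pattern` are balanced.
// Escaped characters are ignored and the contents of [...] count as literals.
bool bracketsBalanced(const std::string &pattern)
{
    std::vector<char> expected; // stack of closing characters still expected
    bool inClass = false;       // true while inside a character class [...]

    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') {        // skip the escaped character, e.g. \( or \]
            ++i;
            continue;
        }
        if (inClass) {          // inside [...], only ']' is significant
            if (c == ']')
                inClass = false;
            continue;
        }
        switch (c) {
        case '[': inClass = true; break;
        case '(': expected.push_back(')'); break;
        case '{': expected.push_back('}'); break;
        case ')':
        case '}':
            if (expected.empty() || expected.back() != c)
                return false;   // closing bracket without matching opener
            expected.pop_back();
            break;
        default:
            break;
        }
    }
    return expected.empty() && !inClass;
}
```

Applied to the example from the description, "v\\d([.]\\d)*)" is reported as unbalanced because of the trailing ')', while the special cases "\\(" and "[^{]" are accepted.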