| Summary: | request: spellchecking to be HTML aware | ||
|---|---|---|---|
| Product: | [Frameworks and Libraries] frameworks-syntax-highlighting | Reporter: | Richard Neill <kde> |
| Component: | syntax | Assignee: | KWrite Developers <kwrite-bugs-null> |
| Status: | CONFIRMED --- | ||
| Severity: | wishlist | CC: | christoph, jonathan.poelen, walter.von.entferndt |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Ubuntu | ||
| OS: | Linux | ||
| Latest Commit: | | Version Fixed/Implemented In: | |
| Sentry Crash Report: | |||

One needs to mark the parts that should not be spell checked in the syntax definition. This has been done since 2016, and on my side h2 and nbsp are ignored. I don't understand: which editor are you using? I tried with Kate.

Hi, and thanks for your reply. I've realised what's happening: there is an issue here, but it's not quite what I thought it was.

Test case 1. Create a file with a .php extension and, without entering PHP mode (i.e. without `<?php`), just have this line:

--- BEGIN ---
`<h5>This is a non&nbsp;breakzing title <a href='http://examplze.com/'>link</a> </h5>`
--- END ---

This is handled correctly: the misspelled 'breakzing' is identified, and nothing else is.

Test case 2. In the same file, enter PHP mode and echo the same HTML:

--- BEGIN ---
`<?php echo "<h5>This is a non&nbsp;breakzing title with a $var[keytzypo] embedded and a <a href='http://examplze.com/'>link</a> </h5>"; ?>`
--- END ---

In this case, the spellchecker triggers on:

* h5 - this is a legal tag.
* nbsp - this is a legal entity.
* breakzing - this IS a typo which we wanted to find; correct behaviour.
* keytzypo - this is an array key and shouldn't trigger (normal, non-array variables are OK).
* examplze - part of a URL; shouldn't trigger.
* h5 - again.

=> So this bug report should really be about spell checking HTML when it is inside a quoted string in PHP. This way of writing code (switching in and out of PHP mode by using echo, rather than with `?>...<?php`) is so common that I didn't notice the echo was a critical part of the bug report. In addition, KWrite's normally helpful behaviour of highlighting all instances of the same string meant that when it highlighted the h5 within the echo, it also highlighted the one in the plain HTML, which may be why I didn't notice. My error, sorry.

* The behaviour is the same in KWrite and in Kate (as we would expect).
* The same behaviour occurs for any way of quoting a string: single, double, or heredoc.
* A minor, unrelated point I spotted: in HTML mode, Kate/KWrite correctly ignores everything inside a tag, but perhaps it should check the title and alt attributes.

Thanks for your help.

I understand better now, but it's a rather complicated thing to do, and it has undesirable effects. For example, even the simple case of checking title and alt in HTML involves adding a "color" (highlighting style) whose only distinguishing property is that it is spell-checkable. In `<img class="..." alt="bla bla">`:

* `<img` is Element
* `class=` is Attribute
* `"..."` is Value
* `alt=` is Attribute
* `"bla bla"` is spellCheckableValue

Value and spellCheckableValue are in fact both values, but the syntax must expose two distinct styles. As a result, you have to modify two colors if you want to change the color of attribute values. This can be done, but I think the resulting behavior is strange from the user's point of view.

The same goes for PHP with String, Heredoc and Nowdoc. In addition to creating false positives, this is a real pain when you have to juggle many interleaved highlights.

A plugin could do it, but I think it would end up duplicating a lot of code, both from the part that handles spell checking and from the part that detects the useful parts of the syntax. I'm wondering whether it would be possible to add alternative syntaxes dedicated to spell checking, ideally with a way to select them when checking is activated. Since this kind of "syntax" has no need for highlighting, the detection of language elements could be simplified.
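The sketch below roughly illustrates what a spell-checking-only pass (a plugin or an alternative "syntax") might have to do for both cases discussed above. It uses Python's standard `html.parser` to collect only element text and alt/title values. It also pre-processes double-quoted PHP strings by dropping simple `$name` / `$name[key]` interpolation before treating their contents as HTML. This is not how Kate, KSyntaxHighlighting or Sonnet work. All class and function names are hypothetical. Single-quoted strings, heredoc/nowdoc, `{$...}` interpolation and nested quotes are deliberately not handled.

```python
import re
from html.parser import HTMLParser

# Attributes whose values are prose (assumption: alt and title, as discussed).
PROSE_ATTRIBUTES = {"alt", "title"}
# Elements whose content is code rather than prose.
SKIP_ELEMENTS = {"script", "style"}
# A "word" is a run of letters; digits and underscores split words.
WORD = re.compile(r"[^\W\d_]+")

# Body of a double-quoted PHP string, honouring backslash escapes.
PHP_DQ_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)
# Simple interpolation only: $name, optionally followed by one [key].
PHP_INTERPOLATION = re.compile(r"\$[A-Za-z_]\w*(?:\[[^\]]*\])?")
# One character after a backslash (\n, \t, \"); longer escapes such as
# \x41 are only partly removed.
PHP_ESCAPE = re.compile(r"\\.", re.DOTALL)


class SpellcheckTextExtractor(HTMLParser):
    """Collects element text plus alt/title values. Tag names, other
    attributes (class, href, ...) and entity names are never collected."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes &nbsp; etc. before handle_data sees it.
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_ELEMENTS:
            self._skip_depth += 1
        for name, value in attrs:
            if name in PROSE_ATTRIBUTES and value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in SKIP_ELEMENTS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_words(html: str) -> list[str]:
    """Words a spellchecker should see in an HTML fragment."""
    parser = SpellcheckTextExtractor()
    parser.feed(html)
    parser.close()
    return WORD.findall(" ".join(parser.chunks))


def php_string_words(php: str) -> list[str]:
    """Words inside double-quoted PHP strings, treating each string as HTML
    after removing interpolated variables and escape sequences."""
    words: list[str] = []
    for body in PHP_DQ_STRING.findall(php):
        body = PHP_INTERPOLATION.sub(" ", body)
        body = PHP_ESCAPE.sub(" ", body)
        words += html_words(body)
    return words
```

When run on test case 2, `php_string_words` should return only the prose words, including the real typo "breakzing". It should not return h5, nbsp, keytzypo or examplze. The trade-off described above still applies: this logic duplicates part of what the syntax definition already knows.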
When writing an HTML or PHP page, it would be great if the spellchecker checked the English-language text without complaining about tags and class names, and if it were entity-aware. For example, `<h2>grey&nbsp;elephant</h2>` will flag up "h2" and "nbsp" as wrong words, whereas I think it should only check the words "grey" and "elephant".

Incidentally, I saw that bug #321593 is considered too complex to implement. Perhaps we could at least handle the (relatively) common case of web programming and get the low-hanging fruit by teaching it to ignore `<[^>]+>` and `&[[:alnum:]]+;`.

Thanks :-)
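The "low-hanging fruit" suggestion amounts to blanking out everything that matches the two patterns before splitting the line into words. Below is a minimal Python sketch of that idea. Python's `re` module has no POSIX classes, so `[[:alnum:]]` is spelled out as `[A-Za-z0-9]`. Accepting numeric references such as `&#160;` is an addition that was not in the original request, and the function name is hypothetical.

```python
import re

# A tag is '<' followed by anything up to the next '>'.
TAG = re.compile(r"<[^>]+>")
# Named (&nbsp;) or numeric (&#160;, &#xA0;) character references.
# Python's re has no [[:alnum:]], so the class is spelled out in ASCII.
ENTITY = re.compile(r"&(?:[A-Za-z0-9]+|#[0-9]+|#[xX][0-9A-Fa-f]+);")
# A word is a run of letters (any script); digits and '_' split words.
WORD = re.compile(r"[^\W\d_]+")


def checkable_words(line: str) -> list[str]:
    """Blank out tags, then entities, and return the words that remain."""
    text = TAG.sub(" ", line)
    text = ENTITY.sub(" ", text)
    return WORD.findall(text)


if __name__ == "__main__":
    print(checkable_words("<h2>grey&nbsp;elephant</h2>"))  # ['grey', 'elephant']
```

This is deliberately crude. `<[^>]+>` also swallows text between a stray `<` and the next `>`, and the filter cannot check alt or title values. It would, however, already stop "h2" and "nbsp" from being reported in the example above.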