Bug 337145 - Spell checking should avoid numbers with decimal points or other separators
Summary: Spell checking should avoid numbers with decimal points or other separators
Status: RESOLVED WORKSFORME
Alias: None
Product: kdelibs
Classification: Unmaintained
Component: kspell
Version: 4.13.2
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdelibs bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-06 15:32 UTC by Karthik Periagaram
Modified: 2022-11-18 05:16 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch to avoid spell checking words with numerals (1.28 KB, patch)
2014-07-27 05:23 UTC, Karthik Periagaram

Description Karthik Periagaram 2014-07-06 15:32:59 UTC
Automatic spell checking in KatePart flags numbers with decimal points or separators as spelling errors. This is bad because a) it makes actual spell checking difficult, b) it wastes system resources, and c) it shows a lack of polish on our part.

a) Making actual spell checking difficult

When editing SRT files (subtitle files containing timestamps), spell check highlights every time code that contains the comma separator. For example:

16
00:52:59,468 --> 00:53:02,468
Han, my boy, you disapppoint me.

In the example above, I would only want to see "disapppoint" highlighted, but spell check also highlights "59,468" and "02,468", which makes the real spelling errors hard to find. My workaround has been to open the file in vim and use its spell checker.
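For SRT files specifically, one possible workaround is to skip the subtitle index and timing lines before spell checking. The Python sketch below assumes the standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing layout. `SRT_TIMING` and `is_srt_timing_line` are hypothetical names made up for this example and are not part of KatePart or Sonnet:

```python
import re

# Hypothetical pattern (not KatePart/Sonnet code): an SRT timing line such as
# "00:52:59,468 --> 00:53:02,468". SRT writes milliseconds after a comma.
SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def is_srt_timing_line(line: str) -> bool:
    """Return True if the whole line (ignoring surrounding spaces) is a timing line."""
    return SRT_TIMING.fullmatch(line.strip()) is not None


subtitle = """16
00:52:59,468 --> 00:53:02,468
Han, my boy, you disapppoint me."""

# Drop the numeric index line and the timing line; keep only the dialogue.
to_check = [
    line
    for line in subtitle.splitlines()
    if not line.strip().isdigit() and not is_srt_timing_line(line)
]
print(to_check)  # ['Han, my boy, you disapppoint me.']
```

This only helps with SRT files. The general problem of numbers inside ordinary text still needs a token-level rule.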

b) Wasting system resources

This gets particularly bad when a large data file is opened. By large data file, I mean millions of numbers in rows and columns, often in CSV or TSV format. Spell checking uses a lot of CPU when such a file is opened. This was a big reason why I kept automatic spell checking turned off in grad school, even though I would have preferred it on. This also brings up another corner case: exponential/scientific notation. 3.14e0 or 2.718E+0 should not be considered spelling errors.
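One way to cover the formats listed in this report is to run a single regular expression on each token before the dictionary lookup. The Python sketch below is only an illustration. `NUMERIC_LITERAL` and `looks_numeric` are hypothetical names. The pattern assumes a period as the decimal mark and a comma as the thousands separator, and it does not handle locale-specific forms such as `1.000,5`:

```python
import re

# Hypothetical pattern (not KatePart/Sonnet code) for common numeric literals.
NUMERIC_LITERAL = re.compile(
    r"[+-]?"                        # optional sign
    r"(?:\d{1,3}(?:,\d{3})+|\d+)"   # "1,000"-style grouping, or a plain digit run
    r"(?:\.\d+)?"                   # optional decimal part, e.g. ".1"
    r"(?:[eE][+-]?\d+)?"            # optional exponent, e.g. "e11" or "E+0"
)


def looks_numeric(token: str) -> bool:
    """Return True if the entire token is a numeric literal."""
    return NUMERIC_LITERAL.fullmatch(token) is not None


samples = ["1", "1.0", "100.1", "1,000", "1.1e11", "1.1E+1", "3.14e0", "2.718E+0"]
print(all(looks_numeric(s) for s in samples))  # True
print(looks_numeric("disapppoint"), looks_numeric("1ink"))  # False False
```

A regex check like this is cheap compared with a dictionary lookup. However, whether it measurably reduces CPU use on large data files is an assumption that would need profiling to confirm.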

c) Lack of polish

I think this point is self-evident. This is one of those papercut-class bugs (as Ubuntu was wont to call them) that should be fixed.


Reproducible: Always

Steps to Reproduce:
1. Open a new text file and enter the following:

1
1.0
100.1
1,000
1.1e11
1.1E+1

2. If automatic spell checking isn't already on, turn it on with Ctrl+Shift+O. If it is on, toggle it off and on again (I've noticed the errors don't appear immediately, which is another bug).
Actual Results:  
All numbers apart from the integer 1 get flagged as spelling errors, some partially.

Expected Results:  
None of the numbers should be flagged as a spelling error.
Comment 1 Karthik Periagaram 2014-07-06 20:49:03 UTC
Noting additional number formats that may need to be excluded from spell checking:

1.234(12)

This notation gives a value together with the uncertainty in its last significant digits. In this case the value is 1.234, but it could be anywhere between 1.222 and 1.246. It is commonly seen in scientific data.

1/2

Or any combination of digits and operators, including comparison operators.
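Here is the arithmetic behind the `1.234(12)` example. The digits in parentheses are an uncertainty in the last digits of the value, so 12 in the third decimal place means ±0.012, which gives the range 1.222 to 1.246. The Python sketch below does that conversion. `concise_interval` is a hypothetical helper, and using `decimal.Decimal` instead of `float` is a choice made here so that the bounds print without binary rounding:

```python
import re
from decimal import Decimal

# Hypothetical pattern: a value followed by its uncertainty digits in parentheses.
CONCISE = re.compile(r"([+-]?\d+(?:\.\d+)?)\((\d+)\)")


def concise_interval(text: str):
    """Return (low, high) as Decimals, or None if text is not in this notation."""
    m = CONCISE.fullmatch(text)
    if m is None:
        return None
    value = Decimal(m.group(1))
    # Scale the bracketed digits to the value's last decimal place:
    # "1.234" has exponent -3, so "12" becomes 0.012.
    uncertainty = Decimal(m.group(2)).scaleb(value.as_tuple().exponent)
    return value - uncertainty, value + uncertainty


print(concise_interval("1.234(12)"))  # (Decimal('1.222'), Decimal('1.246'))
```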
Comment 2 Karthik Periagaram 2014-07-27 05:23:05 UTC
Created attachment 87974
Patch to avoid spell checking words with numerals
Comment 3 Karthik Periagaram 2014-07-27 05:25:19 UTC
Adding an update on this bug, in case anyone else is following it.

The source of the bug is an earlier commit intended to flag typical OCR errors such as "1ink" (one-eye-en-kay, misread from "link"). As a side effect, it flags any numeric literal that contains a non-digit symbol. For example, 1.0 contains a period (decimal point), 1,000 contains a thousands separator, and so on.

The first part of the fix is to go back to skipping all strings that contain numerals. This is a temporary fix (it matches the behavior of most spell checkers); a longer-term solution that can intelligently pick out spelling errors involving numerals still needs to be found.

The initial fix patch is attached here and has been sent to the maintainer for review. I've also added the maintainer to this thread.
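The attached patch is not reproduced here. The Python sketch below only illustrates the general "skip anything containing a numeral" rule described in this comment, and `should_spellcheck` is a hypothetical name. It also shows the trade-off: OCR-style errors like "1ink" are no longer reported either, which is why the fix is described as temporary:

```python
def should_spellcheck(word: str) -> bool:
    """Hypothetical filter: skip any token that contains a numeral."""
    return not any(ch.isdigit() for ch in word)


tokens = ["1.0", "1,000", "1.1E+1", "59,468", "disapppoint", "1ink"]
print([t for t in tokens if should_spellcheck(t)])  # ['disapppoint']
```

Note that `str.isdigit` also returns True for some non-ASCII digit characters (for example superscript digits), so this rule may skip slightly more tokens than a plain ASCII check would.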
Comment 4 Justin Zobel 2022-10-19 22:11:08 UTC
Thank you for reporting this bug in KDE software. As it has been a while since this issue was reported, can we please ask you to see if you can reproduce the issue with a recent software version?

If you can reproduce the issue, please change the status to "CONFIRMED" when replying. Thank you!
Comment 5 Bug Janitor Service 2022-11-03 05:06:31 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 6 Bug Janitor Service 2022-11-18 05:16:47 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!