Bug 66516 - spell checker: automatic language detection
Summary: spell checker: automatic language detection
Status: CONFIRMED
Alias: None
Product: kdelibs
Classification: Frameworks and Libraries
Component: kspell
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Zack Rusin
URL:
Keywords:
Duplicates: 73216 112264 198645
Depends on:
Blocks:
 
Reported: 2003-10-24 20:53 UTC by Daniel Naber
Modified: 2019-11-25 11:08 UTC
CC List: 21 users

See Also:
Latest Commit:
Version Fixed In:


Description Daniel Naber 2003-10-24 20:53:20 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources

The (on-the-fly) spell checker could automatically detect the currently used language by selecting the language that leads to the least number of errors. As long as not enough (>10?) words have been typed, it could display "trying to auto-detect language". For the rare cases where the guess will be incorrect, it needs to be possible to manually change to a different language.
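A minimal sketch of the heuristic described above, assuming each dictionary is just a set of known words. The names `guess_language` and `MIN_WORDS` are hypothetical, and the threshold of 10 is only the tentative figure from the description, not a tested value.

```python
from typing import Dict, List, Optional, Set

MIN_WORDS = 10  # hypothetical threshold, taken from the ">10?" guess above


def guess_language(words: List[str], dictionaries: Dict[str, Set[str]]) -> Optional[str]:
    """Return the language whose word list flags the fewest words as errors.

    Returns None while too few words have been typed, which a UI could
    present as "trying to auto-detect language".
    """
    if len(words) <= MIN_WORDS or not dictionaries:
        return None
    lowered = [w.lower() for w in words]

    def error_count(lang: str) -> int:
        known = dictionaries[lang]
        return sum(1 for w in lowered if w not in known)

    # min() keeps the first language on ties, so dictionary order is the tie-break.
    return min(dictionaries, key=error_count)
```

A manual override would simply bypass this function and use the language the user picked.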

Also see the comments to #43349.
Comment 1 Dik Takken 2003-11-09 17:21:06 UTC
Before implementing this, it would be nice if the dictionary depended on the KDE language by default. Switching the KDE language should automatically switch the default KSpell dictionary to the same language, when available.

When no such dictionary is available, inform the user about that when the user changes the language of KDE.
Comment 2 Daan Goedkoop 2004-01-25 12:09:33 UTC
Looking up what language leads to the smallest number of errors, wouldn't that take a long time, especially when a large number of dictionaries is installed?

I would think that a language guesser algorithm, as can be found on some websites you find searching for "language guesser", would therefore be a better choice.

Also, when Aspell has a bug causing some language to generate no errors at all, this method is not affected by that.

Comment 3 Martin Küchler 2004-03-14 21:56:41 UTC
OpenOffice has this nice feature to check in all available dictionaries. Of course, this will slow down spellchecking. But I think it should still be considered as an alternative to a language guessing routine, because a) it would probably be easier to implement b) one can limit the spellchecking time by installing only the dictionaries one needs c) it might work better in a situation where more than one language is used in a single text (as is IMHO often the case).
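The "check in all available dictionaries" alternative can be sketched roughly as follows, again assuming dictionaries are plain word sets; `misspelled` is a hypothetical helper, not an OpenOffice or KSpell function.

```python
from typing import Dict, List, Set


def misspelled(words: List[str], dictionaries: Dict[str, Set[str]]) -> List[str]:
    """Return the words that none of the installed dictionaries recognises."""
    return [
        w for w in words
        if not any(w.lower() in known for known in dictionaries.values())
    ]
```

The cost grows with the number of installed dictionaries, which is why limiting the set of dictionaries limits the checking time.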
Comment 4 Daniel Naber 2004-05-10 20:59:18 UTC
*** Bug 73216 has been marked as a duplicate of this bug. ***
Comment 5 Jose Da Silva 2004-08-08 12:40:19 UTC
I just finished reporting to ASPELL maintainers their aspell-default had missing words in it, but turns out that the culprit is here in KDE with the "default".

Aspell-default in the KDE spell check configuration should be changed from default to "English-US" if no programming fix is going to be made. I assumed default followed the country language, and chances are a novice would think default follows language/region as well.

If you are going to implement a language routine, which would be my wishlist too, then default is my preferred correct choice, unless forced by the user to something else.
Comment 6 Gilles Schintgen 2004-09-02 10:01:39 UTC
I'd very much like to see some simple autodetection like the one proposed by the reporter. If the slowdown is too significant when a large number of dictionaries is installed, checkboxes could be added in the Control Center to let the user choose which languages should be considered for automatic detection.

The more I think about it, the more I like this idea. Having KDE-wide automatic spell checking with language detection would be quite a killer feature.
Comment 7 Tristan Miller 2004-10-26 14:28:52 UTC
I'd also like to see this bug implemented, especially in light of Bug 79655 and Bug 79653.  I regularly compose messages in KNode and KMail in a variety of languages (English, French, German, Hungarian) and it's annoying to have to manually reconfigure the spell checker for each message, especially when doing so often causes crashes or requires the edit window to be closed and then reopened.

If the developers are looking for a language detection algorithm, apparently simple bigram methods work well, especially when they're being tested against only those dictionaries which have been installed by the user.  Also, I should mention that it's probably best if quoted text is *not* included in the detection data, as sometimes people will send a response in a language other than the one in which the original message was written.
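A rough sketch of a character-bigram guesser along these lines, with quoted lines dropped before scoring. Everything here is an illustrative assumption: the function names are hypothetical, the profiles are built from whatever sample text is supplied for each installed language, and the scoring is a simple smoothed log-likelihood rather than any particular published method.

```python
import math
from collections import Counter


def bigrams(text):
    """Yield character bigrams of each word, padded with spaces as boundary marks."""
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 1):
            yield padded[i:i + 2]


def build_profile(sample_text):
    """Relative bigram frequencies for one language, from sample text."""
    counts = Counter(bigrams(sample_text))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()}


def strip_quoted(message):
    """Drop lines starting with '>' so quoted text does not skew the guess."""
    return "\n".join(
        line for line in message.splitlines() if not line.lstrip().startswith(">")
    )


def detect(message, profiles, floor=1e-6):
    """Return the language whose profile gives the message the highest log-likelihood."""
    grams = list(bigrams(strip_quoted(message)))
    if not grams:
        return None

    def score(lang):
        profile = profiles[lang]
        # Bigrams unseen in a profile get a small floor probability.
        return sum(math.log(profile.get(bg, floor)) for bg in grams)

    return max(profiles, key=score)
```

Only languages with an installed dictionary would get a profile, which keeps the candidate set small.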
Comment 8 Zack Rusin 2004-10-26 16:16:29 UTC
JFYI, I have this implemented locally for kspell2. There's no ETA for it 
as of now though since I have more important things on my plate. It 
should land late in November.

Comment 9 Ranma 2004-10-29 01:57:27 UTC
I'm so happy to hear that, Zack, thank you very much for this. This will be one of the most useful things I will use in KDE. Continue your great work!
Comment 10 Roger Larsson 2005-10-07 16:51:51 UTC
The browser forms could autodetect language by using the language
declaration in the html tag.

From this page:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

(One more reason to add it :-)
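As an illustration of reading that declaration, here is a small sketch using Python's standard `html.parser`; `LangSniffer` and `page_language` are hypothetical names, and a real browser would read the attribute from its own DOM instead.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Remember the language declared on the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            attr_map = dict(attrs)
            # Prefer lang, fall back to xml:lang as used in XHTML pages.
            self.lang = attr_map.get("lang") or attr_map.get("xml:lang")


def page_language(markup):
    """Return the declared page language, or None if the page declares none."""
    sniffer = LangSniffer()
    sniffer.feed(markup)
    return sniffer.lang
```

The declared page language is only a hint for form fields, since users may type in a different language than the page uses.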

BTW it is very dangerous to spellcheck each word against every installed language.
The language has to be selected per sentence at the finest. (Manual override on
a word by word basis should be possible in the "Check spelling..." dialogue)
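Per-sentence selection could look roughly like this sketch, which splits on sentence-ending punctuation and picks, for each sentence, the dictionary that flags the fewest words. The naive sentence splitter, the word-set dictionaries and the name `language_per_sentence` are all assumptions made for illustration.

```python
import re


def language_per_sentence(text, dictionaries):
    """Return (sentence, language) pairs, choosing one language per sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    result = []
    for sentence in sentences:
        # Letters only, so punctuation and digits never count as errors.
        words = re.findall(r"[^\W\d_]+", sentence.lower())

        def errors(lang):
            return sum(1 for w in words if w not in dictionaries[lang])

        result.append((sentence, min(dictionaries, key=errors)))
    return result
```

Each word is then checked only against the dictionary chosen for its sentence, so a typo that happens to be a valid word in another language is still flagged.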

Comment 11 Mathias Homann 2005-11-28 21:25:58 UTC
this language detection thing would also come in handy in kttsd to change speakers...
Comment 12 djib 2006-06-26 12:55:18 UTC
What about an applet that could be used to change the kspell language with a single click or keystroke, just like changing the keyboard mapping?
Comment 13 Peter Tselios 2006-06-28 08:08:48 UTC
Generally, this language detection could be easy to implement. And since users install only the dictionaries they use, it should not be very time consuming. As always, it could be implemented as an option and users could activate/deactivate it.
Comment 14 Alberto Gonzalez 2006-10-31 15:07:57 UTC
I vote for this too. Many users use more than one language (typically 2) in their daily work, so we have to choose between having all words marked as mistakes when writing in "the other" language, manually changing the dictionary, or disabling spell checking completely.

The idea of having an applet to change language manually in an easy and fast way could be good enough, though the ideal would be to check in both (or more) dictionaries (not many people write in more than 3 or 4 languages as to make it too slow, I think).
Comment 15 Pablo Diaz-Gutierrez 2007-12-27 04:16:10 UTC
This would be a great feature to showcase. As a bonus, it could also change the keyboard setup on the fly, as well as re-map all the mistyped letters to the corresponding ones in the new keyboard. Like ';' -> 'ñ', in an English/Spanish transition.
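The re-mapping idea can be sketched with a translation table. Only the ';' -> 'ñ' pair comes from the comment above; the shifted ':' -> 'Ñ' pair and the names `US_TO_ES` and `remap` are assumptions added for illustration.

```python
# Characters typed on a US layout, mapped to what the same keys give on a Spanish one.
# Only ';' -> 'ñ' is from the comment; ':' -> 'Ñ' is an assumed shifted counterpart.
US_TO_ES = str.maketrans({";": "ñ", ":": "Ñ"})


def remap(text):
    """Rewrite text typed on the wrong layout, character by character."""
    return text.translate(US_TO_ES)
```

A blanket translation like this also rewrites intentional semicolons, so it would probably need to be limited to words the detected language's dictionary rejects.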
Comment 16 Stefan Kombrink 2008-04-18 22:44:22 UTC
A neural net approach like the one described here would be much more performant and flexible than spell-checking:
http://www.codeproject.com/KB/library/Fann.aspx

I once tried it, and it already works for very short texts (10 words); below that number of words it does not spell check anyway, does it?

By using such an approach it could even spell check multiple languages within one email, and learn languages from formerly written emails by providing a simple interface.
Comment 17 Alberto Gonzalez 2008-04-18 23:55:49 UTC
Isn't this what Sonnet has implemented for KDE4? I have not seen it in action yet when testing KDE4, but I guess it will be integrated soon. If so, this request could probably be closed. Does anyone know more about it?
Comment 18 Erik Boritsch 2008-11-24 18:08:30 UTC
Language autodetection is a nice feature, but I think that it will be more than sufficient to add an option "Use all installed dictionaries" or similar. It won't slow KDE that much, plus it is easier to implement.
A temporary solution might be creating multi-language dictionaries. A good example would be the English-Russian dictionary the Russian Mozilla Community has created.

Zack, what approach did you use in kspell2?
Comment 19 Tristan Miller 2008-11-24 18:21:47 UTC
A language detection library, libtextcat, is available at <http://software.wise-guys.nl/libtextcat/>.  The license appears to be BSD-like (without the advertising clause) and therefore GPL-compatible.
Comment 20 Murz 2009-01-16 14:01:58 UTC
How can I convert an English-Russian dictionary from Russian Mozilla Community to the KDE dictionary?
Comment 21 Nick Shaforostoff 2009-01-16 21:35:08 UTC
You want the aspell dictionary format.
Comment 22 Christoph Feck 2009-06-26 07:01:27 UTC
*** Bug 112264 has been marked as a duplicate of this bug. ***
Comment 23 Christoph Feck 2009-09-02 02:34:37 UTC
*** Bug 198645 has been marked as a duplicate of this bug. ***
Comment 24 Stéphane Magnenat 2010-02-12 23:14:55 UTC
What is the status of this bug? Sonnet is supposed to support this feature, but I have not seen it deployed in any application to date (KDE 4.4).
Comment 25 m.wege 2011-06-25 08:42:33 UTC
Seems like there is no chance that automatic language detection is going to be implemented any time soon. Maybe instead of automatic language detection, it would be easier to implement selecting a second dictionary for spell check in KDE settings?
Comment 26 RussianNeuroMancer 2012-01-21 14:10:44 UTC
> May be instead of automatic language detection, it would be easier to implement selecting a second dictionary for spell check in KDE settings?
Two dictionaries are sometimes not enough. For example, a user in Ukraine may need three dictionaries: Ukrainian, Russian and English. So the list of dictionaries in use shouldn't be limited.
Comment 27 RussianNeuroMancer 2013-05-04 20:15:44 UTC
Since Calligra Suite and KDE-Telepathy are using KDE spellchecking, in my opinion this issue is major.
Comment 28 Peter Tselios 2013-08-06 09:54:10 UTC
I don't want to insult anyone, just to point out that 10 years after the bug/wish was opened is a long time for such important functionality.
Comment 29 mau 2014-08-22 14:45:58 UTC
Hey, first comment in 2014. Well, it would still be a nice feature :-)
Comment 30 Massimiliano 2015-08-17 16:19:13 UTC
And first comment of 2015, the issue is still wanted!
Comment 31 Christoph Feck 2015-12-03 20:54:05 UTC
Sonnet has this feature in its API, but I am not sure if/how it needs to be enabled from applications.

See http://api.kde.org/frameworks-api/frameworks5-apidocs/sonnet/html/classSonnet_1_1GuessLanguage.html
Comment 32 Waqar Ahmed 2019-11-24 06:19:17 UTC
Hi,
This feature is now available in Sonnet and is working. Quite a few applications have a Sonnet auto-detection option. We are using it in QOwnNotes.
This bug should now be closed.