Summary: | spell checker: automatic language detection | ||
---|---|---|---|
Product: | [Unmaintained] kdelibs | Reporter: | Daniel Naber <misc2006> |
Component: | kspell | Assignee: | Zack Rusin <zack> |
Status: | RESOLVED UNMAINTAINED | ||
Severity: | wishlist | CC: | arvidjaar, asn, aspotashev, b-misc, erik, goffi, kde.bugzilla.2012, leoni.massimiliano1, m.wege, marcus, MurzNN, peger, psychonaut, ptselios, shafff, spiros, stephane, sven.burmeister, tobias, waqar.17a, yehielb |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Description
Daniel Naber
2003-10-24 20:53:20 UTC
Before implementing this, it would be nice if the dictionary depended on the KDE language by default. Switching the KDE language should automatically switch the default KSpell dictionary to the same language, when available. When no such dictionary is available, inform the user about that when the user changes the language of KDE.

Looking up which language leads to the smallest number of errors, wouldn't that take a long time, especially when a large number of dictionaries is installed? I would think that a language guesser algorithm, as can be found on some websites you find searching for "language guesser", would therefore be a better choice. Also, when Aspell has a bug causing some language to generate no errors at all, this method is not affected by that.

OpenOffice has this nice feature to check in all available dictionaries. Of course, this will slow down spellchecking. But I think it should still be considered as an alternative to a language guessing routine, because a) it is probably easier to implement, b) one can limit the spellchecking time by installing only the dictionaries one needs, and c) it might work better in a situation where more than one language is used in a single text (as is IMHO often the case).

*** Bug 73216 has been marked as a duplicate of this bug. ***

I just finished reporting to the Aspell maintainers that their aspell-default had missing words in it, but it turns out that the culprit is here in KDE with the "default". Aspell-default in the KDE spell check configuration should be changed from default to "English-US" if no programming fix is going to be made. I assumed default followed the country language, and chances are a novice would think default follows language/region as well. If you are going to implement a language routine, which would be on my wishlist too, then default is my preferred correct choice, unless forced by the user to something else.

I'd very much like to see some simple autodetection like the one proposed by the reporter.
If the slowdown is too great when a large number of dictionaries is installed, checkboxes could be added in the Control Center to let the user choose which languages should be considered for automatic detection.

The more I think about it, the more I like this idea. Having KDE-wide automatic spell checking with language detection would be quite a killer feature.

I'd also like to see this bug implemented, especially in light of Bug 79655 and Bug 79653. I regularly compose messages in KNode and KMail in a variety of languages (English, French, German, Hungarian) and it's annoying to have to manually reconfigure the spell checker for each message, especially when doing so often causes crashes or requires the edit window to be closed and then reopened.

If the developers are looking for a language detection algorithm, apparently simple bigram methods work well, especially when they're being tested against only those dictionaries which have been installed by the user.

Also, I should mention that it's probably best if quoted text is *not* included in the detection data, as sometimes people will send a response in a language other than the one in which the original message was written.

JFYI, I have this implemented locally for kspell2. There's no ETA for it as of now though since I have more important things on my plate. It should land late in November.

I'm so happy to hear that, Zack, thank you very much for this. This will be one of the most useful things I will use on KDE. Continue your great work!

The browser forms could autodetect the language by using the language declaration in the html tag. From this page: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> (One more reason to add it :-)

BTW it is very dangerous to spellcheck each word against every installed language. The selection has to be made per sentence at most. (Manual override on a word by word basis should be possible in the "Check spelling..."
dialogue)

This language detection thing would also come in handy in kttsd to change speakers...

What about an applet that could be used to change the kspell language with a single click or keystroke, just like changing the keyboard mapping?

Generally, this language detection could be easy to implement. And since users install only the dictionaries they use, it should not be very time consuming. As always, it could be implemented as an option and users could activate/deactivate it.

I vote for this too. Many users use more than one language (typically 2) in their daily work, so we have to choose between having all words marked as mistakes when writing in "the other" language, changing the dictionary manually, or disabling spell checking completely. The idea of having an applet to change the language manually in an easy and fast way could be good enough, though the ideal would be to check in both (or more) dictionaries (not many people write in more than 3 or 4 languages, so it should not become too slow, I think).

This would be a great feature to showcase. As a bonus, it could also change the keyboard setup on the fly, as well as re-map all the mistyped letters to the corresponding ones in the new keyboard. Like ';' -> 'ñ', in an English/Spanish transition.

Much more performant and flexible than spell-checking would be a neural net approach like the one described here: http://www.codeproject.com/KB/library/Fann.aspx I once tried it and it already works for very short texts (10 words); below that number of words it does not spell check anyway, does it? With such an approach it could even spell check multiple languages within one email, and learn languages from previously written emails by providing a simple interface.

Isn't this what Sonnet has implemented for KDE4? I have not seen it in action yet when testing KDE4, but I guess it will be integrated soon. If so, this request could probably be closed. Does anyone know more about it?
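
For illustration, the bigram idea raised earlier in this thread (guess only among installed languages, and leave quoted text out of the detection data) could be sketched roughly as below. This is not what kspell2 or Sonnet actually does. The sample sentences, the `guess_language` name and the simple "share of known bigrams" score are all assumptions made for this sketch; a real guesser would build frequency profiles from much larger corpora.

```python
# Hypothetical training samples; real profiles would come from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
}


def bigrams(text):
    """Return the character bigrams of every word, using a space as boundary marker."""
    result = []
    for word in text.lower().split():
        padded = f" {word} "
        result.extend(padded[i:i + 2] for i in range(len(padded) - 1))
    return result


# One set of known bigrams per language (an assumed, deliberately simple profile).
PROFILES = {lang: set(bigrams(sample)) for lang, sample in SAMPLES.items()}


def strip_quotes(message):
    """Drop quoted lines (those starting with '>') before detection."""
    kept = [line for line in message.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept)


def guess_language(message, candidates=None):
    """Return the candidate language whose profile covers most of the text's bigrams."""
    grams = bigrams(strip_quotes(message))
    if not grams:
        return None  # nothing left to guess from
    langs = candidates if candidates is not None else list(PROFILES)

    def coverage(lang):
        known = PROFILES[lang]
        return sum(1 for gram in grams if gram in known) / len(grams)

    return max(langs, key=coverage)
```

Passing `candidates` restricts the guess to the languages the user has enabled, which is what the checkbox proposal above amounts to.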
Language autodetection is a nice feature, but I think that it will be more than sufficient to add an option "Use all installed dictionaries" or similar. It won't slow KDE that much, plus it is easier to implement.

A temporary solution might be creating multi-language dictionaries. A good example would be the English-Russian dictionary the Russian Mozilla Community has created.

Zack, what approach did you use in kspell2? There is a language detection library, libtextcat, which is available at <http://software.wise-guys.nl/libtextcat/>. The license appears to be BSD-like (without the advertising clause) and therefore GPL-compatible.

How can I convert an English-Russian dictionary from the Russian Mozilla Community to a KDE dictionary?

You want the Aspell dictionary format.

*** Bug 112264 has been marked as a duplicate of this bug. ***

*** Bug 198645 has been marked as a duplicate of this bug. ***

What is the status of this bug? Sonnet is supposed to support this feature, but I have not seen it deployed in any application to date (KDE 4.4).

Seems like there is no chance that automatic language detection is going to be implemented any time soon. Maybe instead of automatic language detection, it would be easier to implement selecting a second dictionary for spell check in KDE settings?

> Maybe instead of automatic language detection, it would be easier to implement selecting a second dictionary for spell check in KDE settings?
Two dictionaries are sometimes not enough. For example, in Ukraine a user may need three dictionaries: Ukrainian, Russian and English. So the list of used dictionaries shouldn't be limited.
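
To make the "unlimited list of dictionaries" idea concrete, here is a minimal sketch, assuming each installed dictionary can be treated as a plain set of words. The tiny word lists and the function names are hypothetical and only stand in for real Aspell dictionaries. `misspelled` accepts a word as soon as any enabled dictionary knows it, and `fewest_errors` shows the per-sentence "smallest number of errors" selection discussed earlier in the thread.

```python
import re

# Hypothetical word lists standing in for installed Aspell dictionaries.
DICTIONARIES = {
    "en": {"the", "weather", "is", "nice", "today"},
    "uk": {"погода", "сьогодні", "гарна"},
    "ru": {"погода", "сегодня", "хорошая"},
}

WORD = re.compile(r"\w+")


def words(text):
    """Split text into lowercase words."""
    return [word.lower() for word in WORD.findall(text)]


def misspelled(text, enabled):
    """Return the words that none of the enabled dictionaries accepts."""
    return [
        word
        for word in words(text)
        if not any(word in DICTIONARIES[lang] for lang in enabled)
    ]


def fewest_errors(sentence, enabled):
    """Pick the enabled dictionary that rejects the fewest words of one sentence."""
    def errors(lang):
        return sum(1 for word in words(sentence) if word not in DICTIONARIES[lang])

    return min(enabled, key=errors)
```

Checking against all enabled dictionaries at once never flags a word that is correct in any of them, at the cost of missing typos that happen to be valid words in another language; the per-sentence selection avoids that but needs enough words to decide.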
Since Calligra Suite and KDE-Telepathy use KDE spellchecking, in my opinion this issue is major. I don't want to insult anyone, just to point out that 10 years after the bug/wish was opened is a lot of time for such important functionality.

Hey, first comment in 2014. Well, it would still be a nice feature :-)

And first comment of 2015, the issue is still wanted!

Sonnet has this feature in its API, but I am not sure if/how it needs to be enabled from applications. See http://api.kde.org/frameworks-api/frameworks5-apidocs/sonnet/html/classSonnet_1_1GuessLanguage.html

Hi, this feature is now available in Sonnet and is working. Quite a few applications have a Sonnet auto-detection option. We are using it in QOwnNotes. This bug should now be closed.

Hi, kdelibs (version 4 and earlier) has not been maintained for a few years. KDE Frameworks 5 or 6 might already have implemented this wish. If not, please re-open against the matching framework if feasible or against the application that shows the issue. We can then still dispatch it to the right Bugzilla product or component. Greetings, Christoph Cullmann