Version: 0.3-0ubuntu1 (using KDE 4.2.3)
OS: Linux
Installed from: Ubuntu Packages

Make Ocropus and tesseract part of KDE for use in a "paperless office".

KDE is a well-known and widespread desktop environment. What Linux/KDE has lacked so far is a good OCR engine. With Ocropus, this seems to be improving.

Wish: Make Ocropus, together with tesseract, part of KDE. Integrate it into skanlite, or build another application that uses its power.

http://code.google.com/p/ocropus/
OCRopus(tm) is a state-of-the-art document analysis and Optical Character Recognition (OCR) system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. (License: Apache License 2.0)

Ocropus is now in Debian and will be in Ubuntu too, starting with Karmic.

Ocropus makes use of tesseract:
http://code.google.com/p/tesseract-ocr/
The Tesseract OCR engine was originally developed at HP between 1985 and 1995. It was open-sourced by HP and UNLV in 2005, and Google has led further development. (License: Apache License 2.0)

Personal statement: Ocropus produces by far the best OCR results I have ever seen on Linux! This is really worth using for a KDE office. The next step would be a GUI that offers spellchecking and correction by the user. But the first step is simply to make its command-line power available in KDE.
I totally agree that KDE needs OCR, but skanlite is not the right application for that. I would be more than happy to help somebody who wants to write an OCR application that uses libksane. Somebody (I don't remember now who it was) was working on an OCR app, but it would use Akonadi to store the documents. There was a short discussion on kde-imaging or kde-devel...
*** Bug 426829 has been marked as a duplicate of this bug. ***
Too bad to see that nothing has changed since 2009 :( I guess I'll have to keep using my crappy self-made script that combines scanimage + imagemagick's img2pdf + ocrmypdf
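For reference, a scan-then-OCR workaround like the one described here can be sketched in Python with `subprocess`. This is a minimal, hypothetical sketch, not the commenter's actual script: the helper names, file names, TIFF output, the `--resolution` value (a backend-dependent SANE option) and the `eng` OCR language are all assumptions.

```python
import subprocess
from pathlib import Path


def build_pipeline(stem: str, resolution: int = 300, lang: str = "eng") -> list[list[str]]:
    """Hypothetical helper: return the argv lists for scan -> PDF -> OCR."""
    tiff = f"{stem}.tiff"
    raw_pdf = f"{stem}-raw.pdf"
    ocr_pdf = f"{stem}.pdf"
    return [
        # scanimage writes the image to stdout; run_pipeline redirects it into `tiff`.
        # --resolution is a backend-specific option (assumed supported here).
        ["scanimage", "--format=tiff", f"--resolution={resolution}"],
        # Wrap the scanned image losslessly into a PDF.
        ["img2pdf", tiff, "-o", raw_pdf],
        # Add a searchable text layer; -l selects the Tesseract language.
        ["ocrmypdf", "-l", lang, raw_pdf, ocr_pdf],
    ]


def run_pipeline(stem: str, resolution: int = 300, lang: str = "eng") -> Path:
    """Hypothetical helper: run the three steps, stopping on the first failure."""
    scan, to_pdf, ocr = build_pipeline(stem, resolution, lang)
    with open(f"{stem}.tiff", "wb") as out:
        subprocess.run(scan, stdout=out, check=True)
    subprocess.run(to_pdf, check=True)
    subprocess.run(ocr, check=True)
    return Path(f"{stem}.pdf")
```

Calling `run_pipeline("scan-001")` on a machine with SANE, img2pdf and ocrmypdf installed would produce `scan-001.pdf`. Keeping command construction separate from execution makes the argument lists easy to inspect before anything touches the scanner.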
You could always try https://invent.kde.org/utilities/skanpage

There has been quite a lot of development there in the last half year.

And why not scratch your itch and help with the OCR part ;)
(In reply to Kåre Särs from comment #4)
> You could always try https://invent.kde.org/utilities/skanpage
>
> There has been quite some development there the last half year.
>
> And why not scratch your itch and help with the OCR part ;)

Thanks, that's excellent news, I'm definitely gonna keep an eye on it :)
Now that Skanpage features OCR capabilities, is this (~15 years old!) bug report still relevant?
Yep, I think we can point to Skanpage for the OCR parts