Bug 90060 - save scanned images with OCR result as pdf or png
Summary: save scanned images with OCR result as pdf or png
Status: CONFIRMED
Alias: None
Product: kooka
Classification: Applications
Component: general
Version: unspecified
Platform: Debian testing Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Klaas Freitag
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-23 00:23 UTC by Fred Schättgen
Modified: 2007-07-05 19:17 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Description Fred Schättgen 2004-09-23 00:23:13 UTC
Version:           0.44 (using KDE 3.3.0)
Installed from:    Debian testing/unstable Packages

Someone has already suggested saving directly to PDF (bug 85000).

I understand that PDF is not the best format for saving pictures. But how about embedding the results of an OCR run in the PDF file as invisible text? Then the PDF could be searched by other tools.

AFAIK Adobe Acrobat can do something similar (I think I remember that this feature was removed from the standard version at some point, but anyway...).
This could make Kooka a great tool for archiving documents. There could then be a shortcut to scan, OCR, and export a document as PDF in one run. It could also ask the user whether they want to append more pages.

I'm not sure whether there is already a nicely integrated tool for searching PDF files, but there has been a lot of talk about better search tools for KDE in recent weeks. These will (or should) probably include searching PDF files as well, so if it's not there already, it will be coming.
Then we could scan letters, bills, whatever, save them as PDF and easily search for them. OK, the search program should have a "fuzzy" option *g*

It wouldn't matter if the extracted text is not at the same position as the corresponding text in the bitmap, as long as it is on the same page.
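A rough sketch of what the invisible-text idea could look like at the PDF level, in Python. It only builds a page content stream. It assumes the page's resources already map the scanned bitmap to an image XObject named `/Im1` and a standard font to `/F1` (both names are hypothetical), and that the OCR text is plain ASCII. Writing the rest of the PDF file is not shown. PDF text rendering mode 3 (`3 Tr`) neither fills nor strokes the glyphs, so the text stays invisible but can still be extracted and searched. In line with the point above, the OCR lines are simply stacked from the top-left corner rather than aligned with the bitmap.

```python
def pdf_escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content_stream(page_w: float, page_h: float, ocr_lines: list[str],
                        font_size: float = 10.0, margin: float = 36.0) -> str:
    """Return a content stream: the scan as a full-page image plus invisible OCR text.

    Assumes /Im1 (scanned image) and /F1 (font) exist in the page resources
    (hypothetical names) and that the OCR text is ASCII.
    """
    ops = [
        "q",                                   # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",       # stretch the unit-square image to the page
        "/Im1 Do",                             # paint the scanned bitmap
        "Q",                                   # restore graphics state
        "BT",                                  # begin text object
        f"/F1 {font_size} Tf",                 # select font and size
        "3 Tr",                                # rendering mode 3: no fill, no stroke = invisible
        f"{font_size * 1.2} TL",               # line spacing used by T*
        f"{margin} {page_h - margin - font_size} Td",  # start near the top-left corner
    ]
    for line in ocr_lines:
        ops.append(f"({pdf_escape(line)}) Tj")  # show one OCR line
        ops.append("T*")                        # move down to the next line
    ops.append("ET")                            # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    # A4 is roughly 595 x 842 points.
    print(page_content_stream(595, 842, ["Invoice no. 4711", "Total (EUR): 12.50"]))
```

Because PDF viewers and indexers extract text regardless of rendering mode, the OCR result stays searchable even if the recognition is imperfect.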

Alternative:
If saving as PDF seems too complicated, how about saving the OCR results in the comment header of a PNG image? This could be indexed and searched as well. I guess it would even work with kfind's existing meta-info search function.
That would be a good start... what do you think?
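For the PNG alternative, a minimal sketch in Python using only the standard library. It inserts a `tEXt` chunk with the PNG-defined keyword `Comment` just before the `IEND` chunk and reads it back. The function names are hypothetical. `tEXt` only holds Latin-1 text, so OCR output containing other characters would need an `iTXt` chunk instead. Whether kfind's meta-info search actually picks up this field is not checked here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data, raw chunk bytes) for every chunk after the signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length], png[pos:end]
        pos = end


def add_ocr_comment(png: bytes, ocr_text: str) -> bytes:
    """Return a copy of the PNG with a tEXt 'Comment' chunk inserted before IEND."""
    # tEXt is Latin-1 only; encode() raises for characters outside it.
    text_chunk = make_chunk(b"tEXt", b"Comment\x00" + ocr_text.encode("latin-1"))
    out = bytearray(PNG_SIGNATURE)
    for ctype, _data, raw in iter_chunks(png):
        if ctype == b"IEND":
            out += text_chunk
        out += raw
    return bytes(out)


def read_comments(png: bytes) -> list[str]:
    """Return the text of every tEXt chunk whose keyword is 'Comment'."""
    found = []
    for ctype, data, _raw in iter_chunks(png):
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword == b"Comment":
                found.append(text.decode("latin-1"))
    return found


def tiny_png() -> bytes:
    """Build a valid 1x1 8-bit grayscale PNG for trying the functions out."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    png = add_ocr_comment(tiny_png(), "Telephone bill, September 2004")
    print(read_comments(png))  # prints ['Telephone bill, September 2004']
```

Rebuilding the file chunk by chunk leaves the image data untouched, so only the metadata changes.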