Summary: | New tool to perform OCR of an image and store result in metadata | ||
---|---|---|---|
Product: | [Applications] digikam | Reporter: | arthur.lutz |
Component: | Plugin-Generic-OcrTextConverter | Assignee: | Digikam Developers <digikam-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | wishlist | CC: | as9902613, caulier.gilles, harshilpatel1973, jose_oliver, Paul.Daily.001 |
Priority: | NOR | ||
Version: | 6.4.0 | ||
Target Milestone: | --- | ||
Platform: | Manjaro | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | 8.0.0 | |
Sentry Crash Report: |
Description
arthur.lutz
2005-12-03 23:58:37 UTC
What is this file about? I think it's off-topic for digiKam.

Gilles

I can't think of any practical use. digiKam is mainly a photo management program; at least my photos do not contain text that needs to be extracted :-)

+1 WONTFIX

Andi

+1 for WONTFIX

Please reopen this bug / WishForANewTool. Many people take pictures of products, signs, notes, screenshots, etc.: basically, images with text in them. With OCR we can generate tags, or add a description or a caption automatically. This would make searching much, much better, just like what Google does with PDF documents. Even better, it would be nice if OCR could tell a document apart from a photograph. I would like to put documents in a different album.

I plan to write an external dplugin later to perform OCR using the neural-network-based Tesseract engine: https://github.com/tesseract-ocr/tesseract

Gilles Caulier

I have a significant number of "Snapchat" screenshots that it would be great to filter by text. I also screenshot many important things, and it would be good to have searchable OCR. Is there a bounty for this? I recognize that you don't feel the need for it, but as people migrate from Google Photos to local solutions now that Google Photos is a paid service, I think you'll find more people looking for this kind of feature.

The old KDE scan application Kooka has a Tesseract plugin: https://invent.kde.org/graphics/kooka/-/tree/master/plugins/ocr

Gilles Caulier

I noticed that this bug was included in the Summer of Code event, and I believe this would be an extremely helpful tool. Services such as Google Photos and OneDrive Photos leverage such OCR functionality. One useful detail is that OneDrive displays the recognized text in the info pane. Here are some questions meant to aid the development of such functionality for digiKam:

- How would the text be stored: using existing metadata fields, or would new fields be created under the digiKam XMP schema namespace?
- Would the location of the words in the image also be stored so that they can be highlighted like face regions?
- Would the user be able to correct any OCR errors?
- Would it be capable of recognizing a type of document and adding a keyword tag (Example: receipt, screenshot, business card, invoice, bank check, blueprint, etc.)?
- Beyond printed material (letters, receipts), would it be able to read text in photos in which a sign appears (Example: a street sign)?
- I am aware this may be beyond the scope of OCR, but it would be interesting if barcodes/QR codes could be read and such information could also be stored within the file. The Metadata Working Group Spec provides for storing barcode regions with type=BarCode (ref page 54, MWG Working group spec 2010).

Implemented during Google Summer of Code 2022: https://community.kde.org/GSoC/2022/StatusReports/QuocHungTran

Gilles Caulier