Bug 469270

Summary: Skanpage crashes when trying to do OCR if tesseract is not installed
Product: [Applications] Skanpage
Reporter: Silviu C. <silviucc>
Component: general
Assignee: Alexander Stippich <a.stippich>
Status: RESOLVED FIXED
Severity: crash
CC: jeremycook.co, nate
Priority: NOR
Version First Reported In: 23.04.0
Target Milestone: ---
Platform: openSUSE
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Silviu C. 2023-05-02 15:24:31 UTC
SUMMARY
Skanpage crashes if OCR is selected when saving the file but tesseract-ocr is not actually installed.

STEPS TO REPRODUCE
1.  start Skanpage
2.  scan something with text
3.  when saving the PDF, either leave the OCR option checked or check it

OBSERVED RESULT

Skanpage crashes.

EXPECTED RESULT

Skanpage should test whether tesseract is actually usable. If it is not, it should grey out the OCR option and inform the user that tesseract is not installed on the system.
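
The behaviour requested here could be sketched roughly as follows. This is a minimal standard C++ sketch, not Skanpage's actual code: `OcrOptionState` and `ocrOptionState` are hypothetical names, and it assumes the application can obtain a list of the OCR languages that are usable at runtime, with an empty list meaning that Tesseract is not usable.

```cpp
#include <string>
#include <vector>

// Hypothetical UI state for the "Enable OCR" checkbox; not Skanpage's real type.
struct OcrOptionState {
    bool enabled;      // false -> the option is greyed out
    bool checked;      // never pre-checked when OCR is unusable
    std::string hint;  // message shown to the user; empty when OCR is usable
};

// Derive the checkbox state from the OCR languages found at runtime.
// Assumption: an empty list means Tesseract (or its language data) is missing.
OcrOptionState ocrOptionState(const std::vector<std::string>& languages, bool userWantsOcr)
{
    if (languages.empty()) {
        return {false, false,
                "OCR is unavailable: Tesseract or its language data is not installed."};
    }
    return {true, userWantsOcr, ""};
}
```

With an empty language list the option comes back disabled and unchecked with a hint for the user, so the export dialog would never start OCR in that state.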

SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20230501
KDE Plasma Version: 5.27.4
KDE Frameworks Version: 5.105.0
Qt Version: 5.15.9
Kernel Version: 6.2.12-1-default (64-bit)
Graphics Platform: Wayland
Processors: 6 × Intel® Core™ i5-9600K CPU @ 3.70GHz
Memory: 31,1 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z390 GAMING X

ADDITIONAL INFORMATION
Comment 1 Nate Graham 2023-05-02 20:00:07 UTC
Tesseract is required for OCR; if it's not installed, it won't work. But if the OCR controls are appearing anyway despite Tesseract not being installed, that sounds like a bug in the app, as it shouldn't be happening. I just tested this by removing the `tesseract-devel` package and rebuilding the app, and I correctly don't see the OCR controls.

Maybe the reverse is not working, and it fails to hide them at runtime if compiled with Tesseract support included but the package isn't actually installed on the user's machine. Unfortunately I can't easily test this as my distro (Fedora KDE) makes Tesseract a mandatory package and it can't be removed at runtime.
Comment 2 Silviu C. 2023-05-03 02:33:13 UTC
(In reply to Nate Graham from comment #1)
> Tesseract is required for OCR; if it's not installed, it won't work. But if
> the OCR controls are appearing anyway despite Tesseract not being installed,
> that sounds like a bug in the app, as it shouldn't be happening. I just
> tested this by removing the `tesseract-devel` package and rebuilding the
> app, and I correctly don't see the OCR controls.
> 
> Maybe the reverse is not working, and it fails to hide them at runtime if
> compiled with Tesseract support included but the package isn't actually
> installed on the user's machine. Unfortunately I can't easily test this as
> my distro (Fedora KDE) makes Tesseract a mandatory package and it can't be
> removed at runtime.

I believe this is what is happening. Skanpage is compiled with tesseract support but tesseract is not installed.
Comment 3 Jeremy Cook 2023-10-22 22:55:34 UTC
This also crashes Skanpage on Debian 12 Bookworm when using "Export PDF." The "Enable optical character recognition (OCR)" option was checked by default (I did not check it) and no languages were listed (not even English). I had installed Skanpage via Discover from the Debian repo. I had not installed tesseract-ocr.

Next I apt installed tesseract-ocr, exited and restarted Skanpage. When I clicked Export PDF the Enable OCR option was checked like before but "American English [eng]" also appeared with an unchecked checkbox beside it. I left "American English [eng]" unchecked and "Enable optical character recognition (OCR)" checked, and was able to generate a PDF without the crash.

I also tried "Save All" instead of "Export PDF" before installing tesseract-ocr, and that worked without crashing. It presumably makes no attempt at OCR.

Operating System: Debian GNU/Linux 12
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.1.0-13-amd64 (64-bit)
Graphics Platform: Wayland
Comment 4 Alexander Stippich 2024-03-03 11:09:40 UTC
Sorry for the late response. You are right, the availability of Tesseract is currently only checked at compile time. When Skanpage is compiled with Tesseract present but the package does not declare Tesseract as a dependency, this results in the observed behavior.
Comment 5 Alexander Stippich 2025-01-29 17:16:52 UTC
Tesseract is now mandatory for Skanpage. While Skanpage could still be packaged incorrectly, this should no longer happen.