SUMMARY
The "Export PDF" function lets me create a PDF with text recognition (OCR) in different languages. Clicking the "Export PDF" button opens a window that should list the languages the tesseract module can use for text recognition. However, since the latest update of skanpage (deb package from the repository), the list of languages is no longer visible and OCR no longer works. With the previous version of skanpage, OCR worked perfectly. Could the latest deb package be missing the "tesseract" dependency?

STEPS TO REPRODUCE
1. Scan a page with Skanpage
2. Click the "Export PDF" button
3. The list of languages for OCR is missing

OBSERVED RESULT
The list of languages for the OCR function is missing, so the exported PDF contains no recognized text.

EXPECTED RESULT
It should be possible to select the languages I want to recognize. The exported PDF should contain the recognized text, and I should be able to copy it.

SOFTWARE/OS VERSIONS
Windows: n/a
macOS: n/a
Linux/KDE Plasma: KDE neon 6.0 (based on Ubuntu 22.04)
KDE Plasma Version: 6.0.5
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.0

ADDITIONAL INFORMATION
Have you checked that you still have the corresponding tesseract language files installed?
(In reply to Alexander Stippich from comment #1)
> Have you checked that you still have the corresponding tesseract language
> files installed?

Hi, I have made no changes to my system. The following tesseract packages have been installed on my system from the start:
=====================================================================================================
nicola@nicola-XPS-13-9360:~ ➤ apt list --installed | grep tesseract

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libtesseract4/jammy,now 4.1.1-2.1build1 amd64 [installato, automatico]
tesseract-ocr-eng/jammy,jammy,now 1:4.00~git30-7274cfa-1.1 all [installato, automatico]
tesseract-ocr-ita/jammy,jammy,now 1:4.00~git30-7274cfa-1.1 all [installato]
tesseract-ocr-osd/jammy,jammy,now 1:4.00~git30-7274cfa-1.1 all [installato, automatico]
tesseract-ocr/jammy,now 4.1.1-2.1build1 amd64 [installato]
nicola@nicola-XPS-13-9360:~
=====================================================================================================
The tesseract dependency was bumped to version 5 for 24.05. Is tesseract 5 available in Ubuntu 22.04?
(In reply to Alexander Stippich from comment #3)
> The tesseract dependency was bumped to 5 fpr 24.05. Is tesseract5 available
> in Ubuntu 22.04?

Unfortunately not. Ubuntu 22.04 only has the package "libtesseract4". On the "Ubuntu Packages" website I see that the package "libtesseract5" is included starting with Ubuntu 23.10 (mantic). I'm using KDE neon, which only moves between LTS releases, so I suppose tesseract 5 will become available once KDE neon is rebased on Ubuntu 24.04 (noble). Or do you know of other viable options?
There is a ppa that should work, but I have not tested it: https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr5?field.series_filter=jammy
Created attachment 171010
The window of the "Export PDF" function

The screenshot shows that the window has no language selection for OCR text recognition.
(In reply to Alexander Stippich from comment #5)
> There is a ppa that should work, but I have not tested it:
> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr5?field.series_filter=jammy

The PPA does indeed install version 5 of tesseract and the languages I need, as you can see here:
=====================================================================================================
nicola@nicola-XPS-13-9360:~ ➤ apt list --installed | grep tesseract
libtesseract5/jammy,now 5.4.1-1ppa1~jammy1 amd64 [installato, automatico]
tesseract-ocr-eng/jammy,jammy,now 1:5.0.0~git39-6572757-2ppa1~jammy1 all [installato, automatico]
tesseract-ocr-ita/jammy,jammy,now 1:5.0.0~git39-6572757-2ppa1~jammy1 all [installato]
tesseract-ocr-osd/jammy,jammy,now 1:5.0.0~git39-6572757-2ppa1~jammy1 all [installato, automatico]
tesseract-ocr/jammy,now 5.4.1-1ppa1~jammy1 amd64 [installato]
nicola@nicola-XPS-13-9360:~ ➤ tesseract --list-langs
List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (3):
eng
ita
osd
nicola@nicola-XPS-13-9360:~
=====================================================================================================

Unfortunately, the issue is still present: there is no language selection for OCR text recognition. The screenshot I uploaded ("The window of the functionality Export PDF") shows that the "Export PDF" window is missing the language selection list.
I'm afraid you will have to wait for KDE neon to be rebased on 24.04.
(In reply to Alexander Stippich from comment #8)
> I'm afraid that you have to wait for KDE Neon being rebased to 24.04

OK, I understand and I can live with it. The wait shouldn't be too long. Thank you anyway for your support.
Is this still an issue with KDE neon based on 24.04?
(In reply to Alexander Stippich from comment #10)
> Is this still an issue with KDE neon based on 24.04?

I'm sorry, I completely forgot to give you feedback after my upgrade to 24.04. Anyway, I have good news: after the upgrade, this issue is gone 👍. As far as I'm concerned, this bug report can now be closed. Thank you.
Thanks for the feedback!