Bug 506673 - If a flipped document is scanned and then flipped manually, the OCR produces garbled text
Status: REPORTED
Alias: None
Product: Skanpage
Classification: Applications
Component: general (other bugs)
Version First Reported In: 25.04.1
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Alexander Stippich
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-07-06 14:53 UTC by Mohammed Khoory
Modified: 2025-07-06 14:53 UTC (History)
CC List: 0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
A sample document scanned upside down at the scanner, then flipped in Skanpage and exported with OCR enabled (2.28 MB, application/pdf)
2025-07-06 14:53 UTC, Mohammed Khoory

Description Mohammed Khoory 2025-07-06 14:53:05 UTC
Created attachment 183010 [details]
A sample document scanned upside down at the scanner, then flipped in Skanpage and exported with OCR enabled

SUMMARY
If I scan a document upside down at the scanner and then flip the pages in Skanpage before exporting to PDF with OCR enabled, the OCR seems to produce garbled text.

Based on the output, my speculation is that the OCR runs on the original, unflipped page images, so the upside-down text is misread and comes out garbled. Supporting this, the issue does not occur when I scan the same document right side up and export it with OCR.
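To illustrate the suspected ordering problem, here is a minimal toy sketch in Python. It is not Skanpage or Tesseract code: `rotate_180` and `toy_ocr` are hypothetical stand-ins, and a grid of characters plays the role of the scanned image. It only shows why recognition has to run on the page after the user's rotation has been applied.

```python
def rotate_180(grid):
    """Rotate a row-major grid by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in reversed(grid)]


def toy_ocr(grid):
    """Hypothetical stand-in for an OCR engine: reads rows top to bottom, left to right."""
    return "\n".join("".join(row) for row in grid)


# The page as it should be read.
upright = [list("HELLO"), list("WORLD")]

# What the scanner delivers when the sheet is fed upside down.
scanned = rotate_180(upright)

# Suspected behaviour: OCR sees the raw scan; the flip only affects what is displayed.
text_suspected = toy_ocr(scanned)

# Expected behaviour: apply the user's flip first, then run OCR on the result.
text_expected = toy_ocr(rotate_180(scanned))

print(repr(text_suspected))
print(repr(text_expected))
```

In this toy model the suspected order yields reversed text while the expected order recovers the original. A real OCR engine fed an upside-down image would more likely produce arbitrary misreadings, which is consistent with the garbled output in the attachment.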

I have attached a sample PDF, scanned upside down and exported with OCR, to show what I mean.

STEPS TO REPRODUCE
1. Scan a document page upside down
2. Flip the page in Skanpage
3. Export the page with OCR enabled
4. Open the page in a PDF reader (Okular is what I used), select some text, copy and paste into a text editor

OBSERVED RESULT
The text is completely garbled and does not match what the exported PDF displays.

EXPECTED RESULT
The text should correspond to what the exported PDF displays.
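For anyone trying to reproduce this, the comparison between observed and expected text can be made less subjective with a small Python check. This is only a sketch using the standard library's `difflib`; the helper names and the 0.8 threshold are arbitrary assumptions, and the two strings are meant to be pasted in by hand (the text copied from the PDF and the text visible on the page).

```python
from difflib import SequenceMatcher


def text_similarity(expected: str, extracted: str) -> float:
    """Return a similarity ratio in [0, 1] after normalising case and whitespace."""
    def normalise(s: str) -> str:
        return " ".join(s.lower().split())

    return SequenceMatcher(None, normalise(expected), normalise(extracted)).ratio()


def looks_garbled(expected: str, extracted: str, threshold: float = 0.8) -> bool:
    """Flag the extracted text as garbled when similarity falls below the threshold.

    The default threshold is an assumption chosen for illustration, not a tuned value.
    """
    return text_similarity(expected, extracted) < threshold


if __name__ == "__main__":
    visible = "The quick brown fox"   # what the exported PDF displays
    copied = "xq7# !!zz @@"           # placeholder for text copied from the PDF
    print(looks_garbled(visible, copied))
```

A correct export should give a ratio close to 1.0 for each page, while the behaviour described in this report would be expected to score far lower.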


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
Linux Kernel 6.6.90-1-MANJARO (64-bit)
KDE Plasma Version: 6.3.5
KDE Frameworks Version: 6.14.0
Qt Version: 6.9.0

ADDITIONAL INFORMATION
I am using Tesseract 5.5.0 and Wayland. The scanner I am using for testing is an HP Officejet Pro 8610/8620.