Bug 466521

Summary: Characters such as "⑥" cannot be copied as they can in Chrome; they can only be pasted as "6"
Product: [Applications] okular
Reporter: llzspaul <llzspaul>
Component: PDF backend
Assignee: Okular developers <okular-devel>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version First Reported In: 22.12.1   
Target Milestone: ---   
Platform: Other   
OS: Other   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description llzspaul@sharklasers.com 2023-02-27 11:54:07 UTC
SUMMARY


STEPS TO REPRODUCE
1. 
2. 
3. 

OBSERVED RESULT
"⑥" CANNOT be copied as-is; it can only be pasted as "6"
EXPECTED RESULT
"⑥" CAN be copied

SOFTWARE/OS VERSIONS
Windows: Windows 10.0.19045



ADDITIONAL INFORMATION
Comment 1 Bug Janitor Service 2024-03-05 09:58:13 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/941
Comment 2 Sune Vuorela 2024-05-24 10:52:26 UTC
Git commit 322fd2d54e4226f6dbb4fb357a86931a5c790340 by Sune Vuorela, on behalf of Wendi Gan.
Committed on 24/05/2024 at 10:02.
Pushed by sune into branch 'master'.

fix Unicode Normalization: replace NFKC to NFC

Use NFC in copy, makeWord, and export functions, and NFKC for search operations.
NFKC may alter characters when copied or exported. For example, ⑥ in a PDF will be pasted as 6, so most instances are replaced with NFC.
To simplify matching during search operation, NFKC is used.
Related: bug 473495

M  +12   -9    core/textpage.cpp
M  +1    -1    generators/poppler/generator_pdf.cpp

https://invent.kde.org/graphics/okular/-/commit/322fd2d54e4226f6dbb4fb357a86931a5c790340
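
The difference between the two normalization forms named in the commit message can be illustrated with a small standalone sketch. This is not Okular's code (Okular is written in C++); it uses Python's standard `unicodedata` module, and the helper names `copy_text`, `search_key`, and `matches` are hypothetical, chosen only to mirror the split described above: NFC for copied or exported text, NFKC for search matching.

```python
import unicodedata


def copy_text(s: str) -> str:
    """Normalize text for copy/export (hypothetical helper).

    NFC only composes canonically equivalent sequences, so
    compatibility characters such as U+2465 (circled digit six)
    are preserved.
    """
    return unicodedata.normalize("NFC", s)


def search_key(s: str) -> str:
    """Normalize text for search matching (hypothetical helper).

    NFKC also folds compatibility characters, so U+2465 becomes
    the plain digit "6" and either spelling finds the other.
    """
    return unicodedata.normalize("NFKC", s)


def matches(needle: str, haystack: str) -> bool:
    """Return True if needle occurs in haystack after NFKC folding."""
    return search_key(needle) in search_key(haystack)


if __name__ == "__main__":
    circled_six = "\u2465"  # the character from this report

    # Copy path: the character survives NFC unchanged.
    print(copy_text(circled_six) == circled_six)

    # Search path: NFKC folds it to the ASCII digit.
    print(search_key(circled_six) == "6")

    # Searching for "6" still finds text containing the circled digit.
    print(matches("6", "item " + circled_six))
```

Note that NFKC can change the length of a string, so a real implementation that needs match positions in the original text would have to map offsets back; this sketch only answers whether a match exists.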