SUMMARY
When copying text from Okular, all characters are fully decomposed, i.e. both canonical decomposition and compatibility decomposition are applied (see the Unicode Standard). While canonical decomposition is fine most of the time, compatibility decomposition is not always desired, since some formatting information is lost. This is especially a problem for Chinese punctuation: we almost always use fullwidth characters, but these are defined as compatibility-decomposable to their ASCII counterparts, which are almost never used in Chinese text (except in text mixed with Latin script, where the ASCII forms are used only for the Latin portions).

STEPS TO REPRODUCE
1. Create any text file with the content "你好,世界!"
2. Open it with Okular
3. Copy the content

OBSERVED RESULT
The copied result is "你好,世界!", where "," (U+FF0C) is turned into "," (U+002C) and "!" (U+FF01) is turned into "!" (U+0021).

EXPECTED RESULT
There should be an option to turn off compatibility decomposition (or canonical decomposition, just in case), so that the content is copied as-is.

SOFTWARE/OS VERSIONS
Linux: Ubuntu 23.04
KDE Plasma Version: 5.27.4
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8

ADDITIONAL INFORMATION
The version provided by Ubuntu may be a little old, but that shouldn't matter.
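
To make the difference between the two kinds of decomposition concrete, here is a small sketch using Python's standard unicodedata module. It is not Okular's code and does not claim to reproduce how Okular normalizes text; it only shows, on the sample string from the steps above, that canonical decomposition (NFD) leaves the fullwidth punctuation alone while compatibility decomposition (NFKD) replaces it with the ASCII counterparts. The helper name codepoints is made up for this illustration.

```python
import unicodedata

# Sample text from the report, with fullwidth comma (U+FF0C)
# and fullwidth exclamation mark (U+FF01) written as escapes.
text = "你好\uFF0C世界\uFF01"


def codepoints(s):
    """Hypothetical helper: render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Canonical decomposition only: fullwidth punctuation is preserved.
nfd = unicodedata.normalize("NFD", text)

# Canonical + compatibility decomposition: fullwidth forms become ASCII.
nfkd = unicodedata.normalize("NFKD", text)

print("original:", codepoints(text))
print("NFD:     ", codepoints(nfd))
print("NFKD:    ", codepoints(nfkd))
# original: U+4F60 U+597D U+FF0C U+4E16 U+754C U+FF01
# NFD:      U+4F60 U+597D U+FF0C U+4E16 U+754C U+FF01
# NFKD:     U+4F60 U+597D U+002C U+4E16 U+754C U+0021
```

The NFKD line matches the observed result above, which is why an option limited to canonical normalization (or no normalization at all) would keep the copied text as-is.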
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/941