Bug 449460

Summary: Under certain locales, attempt to paste Unicode text ends up as mojibake
Product: [I don't know] kde
Reporter: spamless.9v5xj
Component: general
Assignee: Unassigned bugs mailing-list <unassigned-bugs>
Status: RESOLVED UPSTREAM
Severity: normal
CC: nate
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Manjaro
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: Demo video

Description spamless.9v5xj 2022-02-01 16:01:53 UTC
SUMMARY
***
When the region under Formats in System Settings is set to certain values, copying formatted Unicode text (e.g. bold, italics, a hyperlink) and pasting it into LibreOffice as unformatted text turns the text into garbled gibberish.

***


STEPS TO REPRODUCE
1. Under Formats in System Settings, set the region to "Belgium - English (en_BE)". (Other regions also cause this issue, but I am using en_BE for demonstration purposes.)
2. Log out and log back in so the change takes effect.
3. Copy some formatted Unicode text (I went to ja.wikipedia.org and copied the header).
4. Open LibreOffice Writer and open the context menu. Choose Paste Special -> Unformatted Text.

OBSERVED RESULT
Text ends up as a nonsensical string of letters

EXPECTED RESULT
Text is rendered properly

ADDITIONAL INFORMATION
It appears this is because the text is interpreted as ISO-8859 rather than UTF-8: the identical garbled string can be produced by pasting the same text into Kate and changing the encoding to ISO-8859.
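
The Kate experiment above can be reproduced in a few lines. This is only a sketch of the suspected mechanism, not a confirmed diagnosis. It assumes that "ISO-8859" means ISO-8859-1 (Latin-1), which the report does not specify, and the function names are made up for illustration.

```python
# Sketch: how UTF-8 bytes read with a single-byte codec turn into mojibake.
# Assumption: "ISO-8859" in the report means ISO-8859-1 (Latin-1).

def misdecode(text: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Undo misdecode(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    sample = "日本語"
    garbled = misdecode(sample)
    # Each Japanese character is 3 bytes in UTF-8, so 3 characters become 9.
    print(len(sample), len(garbled))
    print(repair(garbled) == sample)
```

If the garbled text in LibreOffice survives a round trip through `repair`, that would support the wrong-codec explanation. If it does not, a different codec or a lossy conversion is involved.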

This does not happen with locales such as en_US and en_GB, which suggests the behavior is unintended or at least fixable, hence this bug report.

The issue seems to affect only LibreOffice; however, as the bug is triggered by changing the region in Plasma's settings, the evidence points to this being on KDE's end.
Comment 1 spamless.9v5xj 2022-02-01 16:06:02 UTC
Created attachment 146128
Demo video
Comment 2 spamless.9v5xj 2022-02-01 16:19:33 UTC
Quick clarification: This actually happens when attempting to paste any string of Unicode text.
Comment 3 spamless.9v5xj 2022-02-01 16:21:24 UTC
Quick clarification: This happens when attempting to copy any string of Unicode text; it's simply that one would only wish to "Paste unformatted" when the text contains formatting. Alas, it seems the encoding info gets discarded along with the formatting.
Comment 4 Nate Graham 2022-02-01 23:17:24 UTC
What languages do you have set in the Languages page, and what order are they in?
Comment 5 spamless.9v5xj 2022-02-01 23:21:25 UTC
(In reply to Nate Graham from comment #4)
> What languages do you have set in the Languages page, and what order are
> they in?

Only one - American English.
Comment 6 Nate Graham 2022-02-02 19:11:55 UTC
Can you paste the contents of ~/.config/plasma-localerc?
Comment 7 spamless.9v5xj 2022-02-02 19:15:10 UTC
(In reply to Nate Graham from comment #6)
> Can you paste the contents of ~/.config/plasma-localerc?

[Formats]
LANG=en_BE.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
useDetailed=true
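
For reference, the codeset each variable asks for can be read from the part after the dot in its value. The sketch below parses the file contents shown above with Python's standard `configparser`. The `codesets` helper and the embedded `SAMPLE` string are illustrative only and are not part of any KDE tooling.

```python
import configparser

# Contents of ~/.config/plasma-localerc as pasted in this comment.
SAMPLE = """[Formats]
LANG=en_BE.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
useDetailed=true
"""


def codesets(config_text: str) -> dict:
    """Map each LANG/LC_* key in [Formats] to the codeset named in its value."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(config_text)
    result = {}
    for key, value in parser["Formats"].items():
        if key == "LANG" or key.startswith("LC_"):
            # A locale name looks like language_TERRITORY.codeset@modifier.
            codeset = value.partition(".")[2].partition("@")[0]
            result[key] = codeset or None
    return result


if __name__ == "__main__":
    for name, codeset in codesets(SAMPLE).items():
        print(f"{name}: {codeset}")
```

Every variable in this file requests UTF-8, so the file itself does not ask for an ISO-8859 encoding.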
Comment 8 Nate Graham 2022-02-02 19:37:16 UTC
This has to be a problem in LibreOffice or deeper in the stack, then. All we do is set those variables; we don't play with encodings of anything.
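
Since only the variable values are set, one thing that could be checked further down the stack is whether the C library can actually activate a given locale name. The following is a minimal sketch using Python's standard `locale` module; the `locale_available` helper is hypothetical. Whether this has any bearing on the reported behavior is not established in this thread.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "en_BE.UTF-8", "en_GB.UTF-8"):
        print(candidate, locale_available(candidate))
```

The result depends on which locales are generated on the machine, so the output will differ from system to system.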

Can you report this to the LibreOffice folks at https://bugs.documentfoundation.org? Thanks!
Comment 9 spamless.9v5xj 2022-02-02 19:42:31 UTC
(In reply to Nate Graham from comment #8)
> This has to be a problem in LibreOffice or deeper in the stack, then. All we
> do is set those variables; we don't play with encodings of anything.
> 
> Can you report this to the LibreOffice folks at
> https://bugs.documentfoundation.org? Thanks!

Alright, will do. Thanks for the help!