Bug 406014

Summary: Document metadata in Russian is unreadable (bad encoding?)
Product: [Applications] calligrawords Reporter: Alexander Potashev <aspotashev>
Component: general Assignee: Calligra Words Bugs <calligra-words-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: pinaraf
Priority: NOR    
Version: 3.1.0   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: document to reproduce the bug
screenshot
title of the same document as seen in LibreOffice

Description Alexander Potashev 2019-03-29 19:58:09 UTC
Created attachment 119136 [details]
document to reproduce the bug

SUMMARY
Document metadata in Russian is unreadable (bad encoding?)

STEPS TO REPRODUCE
1. Open the attached file in calligrawords.
2. Check the window title (see screenshot).
3. Check the document properties (see screenshot).

OBSERVED RESULT
Unreadable text in place of document title and author.

EXPECTED RESULT
Normal text for document title and author.

SOFTWARE/OS VERSIONS
Operating System: Fedora 29
KDE Plasma Version: 5.14.5
Qt Version: 5.11.3
KDE Frameworks Version: 5.55.0
Kernel Version: 4.20.4-200.fc29.x86_64
OS Type: 64-bit
Processors: 8 × Intel® Core™ i7-6700HQ CPU @ 2.60GHz
Memory: 15.4 GiB of RAM
Comment 1 Alexander Potashev 2019-03-29 19:59:08 UTC
Created attachment 119137 [details]
screenshot
Comment 2 Alexander Potashev 2019-03-29 20:03:19 UTC
Created attachment 119138 [details]
title of the same document as seen in LibreOffice
Comment 3 Bug Janitor Service 2021-02-13 09:17:30 UTC
A possibly relevant merge request was started @ https://invent.kde.org/office/calligra/-/merge_requests/15
Comment 4 Pierre Ducroquet 2021-02-13 13:56:58 UTC
Git commit be82faae699790e8b5d4f68a2e9e2663ff40477e by Pierre Ducroquet.
Committed on 13/02/2021 at 13:56.
Pushed by ducroquet into branch 'master'.

Support more than UTF-8/16 in word metadata import

The meta-data import code was only considering two possible codepages:
UTF-8 and UTF-16.
Word documents tend to use local encoding, so while this behaviour seemed
flawless with US/UK documents, completely different encodings were broken.
Instead of considering only UTF-8 and UTF-16, use QTextCodec and try to
handle as many encodings as possible that way, warning if they are not
found.

See https://bugs.kde.org/show_bug.cgi?id=406014 for example document

M  +13   -8    filters/words/msword-odf/document.cpp

https://invent.kde.org/office/calligra/commit/be82faae699790e8b5d4f68a2e9e2663ff40477e
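
For readers who want to see the idea behind the fix, here is a minimal sketch in Python. It is not the code from the commit, which is C++ and uses QTextCodec. It assumes the metadata string arrives as raw bytes together with a Windows code page number, and that the string may carry a trailing NUL terminator. The helper names codec_name_for_codepage and decode_metadata are hypothetical, as is the choice of UTF-8 as the fallback when the code page is not recognised.

```python
import codecs
import warnings


def codec_name_for_codepage(codepage: int) -> str:
    """Map a Windows code page number to a Python codec name.

    Hypothetical helper: only the Unicode code pages need special
    names; most others are available in Python as "cp<number>".
    """
    unicode_pages = {65001: "utf-8", 1200: "utf-16-le", 1201: "utf-16-be"}
    return unicode_pages.get(codepage, f"cp{codepage}")


def decode_metadata(raw: bytes, codepage: int) -> str:
    """Decode a metadata string using the code page declared for it.

    If no codec is known for the code page, emit a warning and fall
    back to UTF-8 (an assumption made for this sketch).
    """
    name = codec_name_for_codepage(codepage)
    try:
        codecs.lookup(name)
    except LookupError:
        warnings.warn(f"Unknown code page {codepage}; falling back to UTF-8")
        name = "utf-8"
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw.decode(name, errors="replace")
    # Assumption: stored strings may end with NUL terminators.
    return text.rstrip("\x00")


if __name__ == "__main__":
    title = "Заголовок".encode("cp1251")  # bytes as a Russian-locale Word might store them
    print(decode_metadata(title, 1251))
```

Decoding the same bytes with a hard-coded UTF-8 or UTF-16 codec instead of the declared code page is what produces the unreadable title described in this report.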
Comment 5 Pierre Ducroquet 2021-02-13 13:58:43 UTC
Thank you very much for your report Alexander!
Comment 6 Alexander Potashev 2021-02-21 00:46:45 UTC
Thanks for fixing!