Bug 406014 - Document metadata in Russian is unreadable (bad encoding?)
Summary: Document metadata in Russian is unreadable (bad encoding?)
Status: RESOLVED FIXED
Alias: None
Product: calligrawords
Classification: Applications
Component: general
Version: 3.1.0
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Calligra Words Bugs
Reported: 2019-03-29 19:58 UTC by Alexander Potashev
Modified: 2021-02-21 00:46 UTC
CC List: 1 user

Attachments
document to reproduce the bug (268.00 KB, application/msword)
2019-03-29 19:58 UTC, Alexander Potashev
screenshot (90.85 KB, image/png)
2019-03-29 19:59 UTC, Alexander Potashev
title of the same document as seen in LibreOffice (31.68 KB, image/png)
2019-03-29 20:03 UTC, Alexander Potashev

Description Alexander Potashev 2019-03-29 19:58:09 UTC
Created attachment 119136
document to reproduce the bug

SUMMARY
Document metadata in Russian is unreadable (bad encoding?)

STEPS TO REPRODUCE
1. Open the attached file in calligrawords.
2. Check the window title (see the screenshot).
3. Check the document properties (see the screenshot).

OBSERVED RESULT
Unreadable text in place of the document title and author.

EXPECTED RESULT
Normal, readable text for the document title and author.

SOFTWARE/OS VERSIONS
Operating System: Fedora 29
KDE Plasma Version: 5.14.5
Qt Version: 5.11.3
KDE Frameworks Version: 5.55.0
Kernel Version: 4.20.4-200.fc29.x86_64
Architecture: 64-bit
Processors: 8 × Intel® Core™ i7-6700HQ CPU @ 2.60GHz
Memory: 15.4 GiB of RAM
Comment 1 Alexander Potashev 2019-03-29 19:59:08 UTC
Created attachment 119137
screenshot
Comment 2 Alexander Potashev 2019-03-29 20:03:19 UTC
Created attachment 119138
title of the same document as seen in LibreOffice
Comment 3 Bug Janitor Service 2021-02-13 09:17:30 UTC
A possibly relevant merge request was started @ https://invent.kde.org/office/calligra/-/merge_requests/15
Comment 4 Pierre Ducroquet 2021-02-13 13:56:58 UTC
Git commit be82faae699790e8b5d4f68a2e9e2663ff40477e by Pierre Ducroquet.
Committed on 13/02/2021 at 13:56.
Pushed by ducroquet into branch 'master'.

Support more than UTF-8/16 in word metadata import

The meta-data import code was only considering two possible codepages:
UTF-8 and UTF-16.
Word documents tend to use local encoding, so while this behaviour seemed
flawless with US/UK documents, completely different encodings were broken.
Instead of considering only UTF-8 and UTF-16, use QTextCodec and try to
handle as many encodings as possible that way, warning if they are not
found.

See https://bugs.kde.org/show_bug.cgi?id=406014 for example document

M  +13   -8    filters/words/msword-odf/document.cpp

https://invent.kde.org/office/calligra/commit/be82faae699790e8b5d4f68a2e9e2663ff40477e
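
For readers who want to see the idea behind the commit message in isolation, here is a rough sketch in Python. It is not the Calligra code (which is C++ and uses QTextCodec). The function names, the mapping table, and the Latin-1 fallback are all hypothetical, and codepage 1251 (Windows Cyrillic) is only assumed for illustration; the report does not say which codepage the attached document declares. The sketch looks up a codec from the declared codepage instead of handling only UTF-8 and UTF-16, and warns when no codec is found.

```python
import codecs
import warnings

# Hypothetical table: Windows codepage identifiers that do not follow
# Python's "cpNNNN" codec naming scheme.
SPECIAL_CODEPAGES = {
    65001: "utf-8",      # Windows identifier for UTF-8
    1200: "utf-16-le",   # Windows identifier for UTF-16 little endian
    1201: "utf-16-be",   # Windows identifier for UTF-16 big endian
}


def codec_for_codepage(codepage):
    """Return a Python codec name for a codepage ID, or None if unknown."""
    name = SPECIAL_CODEPAGES.get(codepage, "cp%d" % codepage)
    try:
        codecs.lookup(name)  # raises LookupError if no such codec exists
    except LookupError:
        return None
    return name


def decode_metadata(raw, codepage):
    """Decode a metadata byte string using the codepage the file declares."""
    codec = codec_for_codepage(codepage)
    if codec is None:
        # Assumed fallback for this sketch: warn and decode as Latin-1,
        # which never fails but may produce unreadable text.
        warnings.warn("no codec found for codepage %d" % codepage)
        return raw.decode("latin-1")
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    # A Russian title stored in the assumed legacy Cyrillic codepage.
    raw_title = "Заголовок".encode("cp1251")
    print(decode_metadata(raw_title, 1251))
```

Decoding the same bytes while ignoring the declared codepage (for example as Latin-1) yields the kind of unreadable title shown in the screenshot, which is why the lookup is driven by the codepage value rather than by a fixed pair of encodings.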
Comment 5 Pierre Ducroquet 2021-02-13 13:58:43 UTC
Thank you very much for your report Alexander!
Comment 6 Alexander Potashev 2021-02-21 00:46:45 UTC
Thanks for fixing!