Bug 406014

Summary:	Document metadata in Russian is unreadable (bad encoding?)
Product:	[Applications] calligrawords	Reporter:	Alexander Potashev <aspotashev>
Component:	general	Assignee:	Calligra Words Bugs <calligra-words-bugs-null>
Status:	RESOLVED FIXED
Severity:	normal	CC:	pinaraf
Priority:	NOR
Version First Reported In:	3.1.0
Target Milestone:	---
Platform:	Fedora RPMs
OS:	Linux
Latest Commit:	https://invent.kde.org/office/calligra/commit/be82faae699790e8b5d4f68a2e9e2663ff40477e	Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	document to reproduce the bug; screenshot; title of the same document as seen in LibreOffice

Description Alexander Potashev 2019-03-29 19:58:09 UTC

Created attachment 119136 [details]
document to reproduce the bug

SUMMARY
Document metadata in Russian is unreadable (bad encoding?)

STEPS TO REPRODUCE
1. Open the attached file in calligrawords.
2. Check the window title (see the screenshot).
3. Check the document properties (see the screenshot).

OBSERVED RESULT
Unreadable text in place of document title and author.

EXPECTED RESULT
Normal text for document title and author.

SOFTWARE/OS VERSIONS
Operating System: Fedora 29
KDE Plasma Version: 5.14.5
Qt Version: 5.11.3
KDE Frameworks Version: 5.55.0
Kernel Version: 4.20.4-200.fc29.x86_64
Architecture: 64-bit
Processors: 8 × Intel® Core™ i7-6700HQ CPU @ 2.60GHz
Memory: 15.4 GiB of RAM

Comment 1 Alexander Potashev 2019-03-29 19:59:08 UTC

Created attachment 119137 [details]
screenshot

Comment 2 Alexander Potashev 2019-03-29 20:03:19 UTC

Created attachment 119138 [details]
title of the same document as seen in LibreOffice

Comment 3 Bug Janitor Service 2021-02-13 09:17:30 UTC

A possibly relevant merge request was started @ https://invent.kde.org/office/calligra/-/merge_requests/15

Comment 4 Pierre Ducroquet 2021-02-13 13:56:58 UTC

Git commit be82faae699790e8b5d4f68a2e9e2663ff40477e by Pierre Ducroquet.
Committed on 13/02/2021 at 13:56.
Pushed by ducroquet into branch 'master'.

Support more than UTF-8/16 in word metadata import

The meta-data import code was only considering two possible codepages:
UTF-8 and UTF-16.
Word documents tend to use local encoding, so while this behaviour seemed
flawless with US/UK documents, completely different encodings were broken.
Instead of considering only UTF-8 and UTF-16, use QTextCodec and try to
handle as many encodings as possible that way, warning if a codec is not
found.

See https://bugs.kde.org/show_bug.cgi?id=406014 for example document

M  +13   -8    filters/words/msword-odf/document.cpp

https://invent.kde.org/office/calligra/commit/be82faae699790e8b5d4f68a2e9e2663ff40477e
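
For readers unfamiliar with the failure mode, the idea in the commit message can be illustrated with a minimal sketch. This is not the Calligra code (the actual fix is C++ using QTextCodec in filters/words/msword-odf/document.cpp). It is written in Python, and it assumes the metadata carries a numeric Windows code page identifier, with 65001 meaning UTF-8 and 1200 meaning UTF-16LE. The function names `codec_for_codepage` and `decode_metadata` are hypothetical.

```python
import codecs
import warnings
from typing import Optional

# Windows code page identifiers for the two encodings the old code handled.
# (Assumption for this sketch: the metadata stores such a numeric identifier.)
CP_UTF8 = 65001
CP_UTF16LE = 1200


def codec_for_codepage(codepage: int) -> Optional[str]:
    """Map a numeric code page to a Python codec name, or None if unknown."""
    if codepage == CP_UTF8:
        return "utf-8"
    if codepage == CP_UTF16LE:
        return "utf-16-le"
    # Python names most Windows code pages "cp<number>", e.g. cp1251.
    name = f"cp{codepage}"
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def decode_metadata(raw: bytes, codepage: int) -> str:
    """Decode a metadata string, warning when the code page is not supported."""
    codec = codec_for_codepage(codepage)
    if codec is None:
        warnings.warn(f"Unknown code page {codepage}, falling back to UTF-8")
        codec = "utf-8"
    # Undecodable bytes become U+FFFD instead of raising an exception.
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    # A Russian title stored in Windows-1251, as a local-encoding document might.
    raw = "Заголовок".encode("cp1251")
    print(decode_metadata(raw, 1251))
    # Treating the same bytes as UTF-8 reproduces the kind of garbage reported here.
    print(raw.decode("utf-8", errors="replace"))
```

Looking the codec up by code page number, rather than hard-coding two cases, is what lets a Windows-1251 title decode correctly. The warning path keeps unknown code pages visible instead of silently producing unreadable text.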

Comment 5 Pierre Ducroquet 2021-02-13 13:58:43 UTC

Thank you very much for your report Alexander!

Comment 6 Alexander Potashev 2021-02-21 00:46:45 UTC

Thanks for fixing!