Summary: | Recode man output into UTF-8 encoding in kio_man | ||
---|---|---|---|
Product: | [Unmaintained] kio | Reporter: | Sergey A. Sukiyazov <sukiyazov> |
Component: | man | Assignee: | David Faure <faure> |
Status: | RESOLVED FIXED | ||
Severity: | wishlist | CC: | eshkrig, estellnb, faure, kollix, lueck, thiago, torre_cremata |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Gentoo Packages | ||
OS: | Linux | ||
Latest Commit: | http://commits.kde.org/kde-runtime/0e245f152f276ed8e254769e327b00c3b4c534fd | Version Fixed In: | 4.9.1 |
Sentry Crash Report: | |||
Attachments: |
Recode man contents in kio_man
Forces kioproc to use codec for current locale instead of ISO-8859-1
Corrects UTF-8 charset name to charset name current locale in Content-Type builded HTML document
Patch to document changes in kio_man.cpp |
Description
Sergey A. Sukiyazov
2007-02-07 17:15:13 UTC
Created attachment 19580 [details]
Recode man contents in kio_man
Recode manpage contents in kio_man. The manpage source encoding is determined by
the .charset file in the man directory (e.g. /usr/share/man/ru/.charset) or by the
value of the MAN_ICONV_INPUT_CHARSET environment variable.
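For readers who do not want to open the attachment, the lookup described above can be sketched roughly as follows. This is Python for illustration only, not the C++/Qt patch. The function name is hypothetical, and both the file format (a single charset name) and the precedence (file first, then environment variable) are assumptions; the report only says "or".

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def manpage_source_charset(man_dir: str,
                           env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the charset configured for man_dir, or None if nothing is set."""
    env = os.environ if env is None else env
    charset_file = Path(man_dir) / ".charset"
    if charset_file.is_file():
        # Assumption: the file contains just a charset name such as "KOI8-R".
        name = charset_file.read_text(encoding="ascii", errors="replace").strip()
        if name:
            return name
    # Assumption: the environment variable is only a fallback.
    return env.get("MAN_ICONV_INPUT_CHARSET") or None
```

When neither source is present the function returns None, which matches the statement later in this report that kio_man's behavior is unchanged in that case.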
Created attachment 19581 [details]
Forces kioproc to use codec for current locale instead of ISO-8859-1
Forces KProcIO to use the codec for the current locale instead of ISO-8859-1.
kdecore/kprocio.cpp (near line 30) hard-codes a QTextCodec for ISO-8859-1;
this patch changes it to the locale-specific charset and uses that codec (near
line 235) to convert input data to Unicode.
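To illustrate why the hard-coded codec matters, here is a small Python sketch (not the KProcIO code; the function name and the `encoding` override parameter are hypothetical). It decodes bytes read from a child process with the locale's charset instead of always assuming ISO-8859-1.

```python
import locale


def decode_process_output(raw: bytes, encoding: str = "") -> str:
    """Decode bytes read from a child process using the locale's charset."""
    # An empty `encoding` means "ask the current locale".
    codec = encoding or locale.getpreferredencoding(False)
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(codec, errors="replace")
```

For example, text that a tool emitted in KOI8-R decodes correctly with `decode_process_output(raw, "koi8_r")`, while decoding the same bytes as ISO-8859-1 never fails but yields unrelated Latin characters.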
Created attachment 19582 [details]
Corrects UTF-8 charset name to charset name current locale in Content-Type builded HTML document
In kdelibs/kdoctools/kio_help.cpp (near lines 134 and 348), replace
QTextCodec::name() with QTextCodec::mimeName() so that the Content-Type is
correct. For example, in the ru_RU.CP1251 locale, QTextCodec::name() returns
"CP 1251", but the Content-Type must be "...;charset=windows-1251".
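The difference between a display-style codec name and a MIME charset name can be sketched like this in Python. The lookup table is hand-written and deliberately tiny, and `mime_charset` and `content_type` are hypothetical helpers, not Qt API; in the patch itself, QTextCodec::mimeName() does this job.

```python
import re

# Illustrative mapping from normalized codec names to IANA/MIME charset names.
_MIME_NAMES = {
    "cp1251": "windows-1251",
    "windows1251": "windows-1251",
    "koi8r": "KOI8-R",
    "iso88591": "ISO-8859-1",
    "utf8": "UTF-8",
}


def mime_charset(codec_name: str) -> str:
    """Map a codec name such as "CP 1251" to its MIME charset name."""
    # Drop spaces, hyphens and underscores so "CP 1251" and "cp-1251" match.
    key = re.sub(r"[^a-z0-9]", "", codec_name.lower())
    # Unknown names are passed through unchanged.
    return _MIME_NAMES.get(key, codec_name)


def content_type(codec_name: str) -> str:
    """Build the Content-Type value for an HTML document in that charset."""
    return f"text/html; charset={mime_charset(codec_name)}"
```

With this table, `content_type("CP 1251")` produces a value ending in `charset=windows-1251`, which is the form a browser engine expects.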
Duplicate of Gentoo bug http://bugs.gentoo.org/show_bug.cgi?id=152546
Patch names, in order:
* kdebase-3.5.0-man_recode.patch
* kdelibs-3.1.2-fix-kprocio-def-codec.patch
* kdelibs-3.4.0-ALT-fix-kdoctools-mime-charset.patch

Very important feature! We need this!

All 3 patches work fine in ALT Linux. It's a very useful thing for many Linux users. Please fix it!

Created attachment 19638 [details]
Patch to document changes in kio_man.cpp
Small documentation update to reflect the changes made by these patches.

BTW, these patches are applied in Altlinux Sisyphus (the unstable branch of Altlinux):
http://sisyphus.ru/srpm/kdelibs/patches
* kdelibs-3.4.0-ALT-fix-kdoctools-mime-charset.patch
* kdelibs-3.1.2-fix-kprocio-def-codec.patch
and http://sisyphus.ru/srpm/kdebase/patches
* kdebase-3.5.0-man_recode.patch

I've tested them and these patches work here. Really useful :) Thank you, Sergey.

Is anybody from the KDE team reading this page? This bug was created a month ago, but there has been no reaction to this improvement... All users who read and write in alphabets other than Latin really need this. Please do something with this improvement.

I have read this. There are bugs that have been open for far longer than a month.

OK, that's good news :-) I was sure that the bug had been abandoned... But I have been compelled to apply these patches since KDE 3.1.2. In ALTLinux (a distribution adapted for Russian users) this improvement has been in use for several years. This bug report does not only describe an existing problem; I also propose a ready solution. I have attached three patches for this improvement, and each patch is small. If the MAN_ICONV_INPUT_CHARSET variable is not set and the .charset file does not exist, the behavior of kio_man does not change. Simply look at it and try it. It will not take a lot of your time... If you have other thoughts about this problem (and the proposed solution), write to me. I can rework this solution so that it is better adapted to KDE....

I just fixed an encoding bug in kio_man for kde-4.3, which is how I found this report. Why recode the data when you can just export the right charset in the meta http-equiv tag? That's what my fix does. I never heard of $MAN_ICONV_INPUT_CHARSET though; I just made it use the system locale. Please test trunk or 4.3 when it's out, and report what still doesn't work.

With recent trunk compiled from sources, I still need to manually select ISO 8859-1 to see the German "Umlaute" properly in the German translations of the KDE man pages.

Checked again with master and 4.7 compiled from sources: the German "Umlaute" are properly displayed in the German translations of the KDE man pages.

Here on openSuse 12.1, having installed man-pages-de-0.5-1.1.noarch, I can still reproduce the wrong encoding problem. It is clear why this does not work: my locale is de_AT.utf8, but the man pages are encoded in ISO-8859-1. kio_man cannot know this and assumes the man page file content is UTF-8. Your patch uses a .charset file which defines the encoding. openSuse has no such files installed, so I wonder if this is a distribution-specific file. Also, the mentioned env var MAN_ICONV_INPUT_CHARSET: is this distribution-specific, is it some standard that is also used for something else, or did you invent it especially for this patch?

*** Bug 140495 has been marked as a duplicate of this bug. ***

*** Bug 277466 has been marked as a duplicate of this bug. ***

Git commit cafaf92ff1c57c1b3d8bf2a8f371099d054caa96 by Martin Koller.
Committed on 14/08/2012 at 23:55.
Pushed by mkoller into branch 'KDE/4.9'.
auto-detect encoding of man page source
As man page files do not define in which encoding they are written, the man-db source of the man commandline tools has a hardcoded list of languages and their encodings (see its encodings.c file).
As this might change in the future (e.g. distributions move to UTF-8), the hardcoded values might be wrong.
Therefore I now use KDE's encoding auto-detection mechanism and always convert the page source to UTF-8 before processing.
FIXED-IN: 4.9.1
M +52 -45 kioslave/man/kio_man.cpp
M +1 -8 kioslave/man/man2html.cpp
http://commits.kde.org/kde-runtime/cafaf92ff1c57c1b3d8bf2a8f371099d054caa96

Git commit 0e245f152f276ed8e254769e327b00c3b4c534fd by Martin Koller.
Committed on 14/08/2012 at 23:55.
Pushed by mkoller into branch 'master'.
auto-detect encoding of man page source
As man page files do not define in which encoding they are written, the man-db source of the man commandline tools has a hardcoded list of languages and their encodings (see its encodings.c file).
As this might change in the future (e.g. distributions move to UTF-8), the hardcoded values might be wrong.
Therefore I now use KDE's encoding auto-detection mechanism and always convert the page source to UTF-8 before processing.
FIXED-IN: 4.9.1
M +52 -45 kioslave/man/kio_man.cpp
M +1 -8 kioslave/man/man2html.cpp
http://commits.kde.org/kde-runtime/0e245f152f276ed8e254769e327b00c3b4c534fd |
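
The idea behind the final fix (guess the source encoding, then always hand UTF-8 to the converter) can be sketched in Python as below. This is a much simpler stand-in for KDE's auto-detection, not a port of it: it only checks whether the bytes are valid UTF-8 and otherwise falls back to one legacy charset. The function name and the `fallback` parameter are hypothetical.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the man page source re-encoded as UTF-8."""
    try:
        # Strict decoding fails on most non-UTF-8 input that has high bytes.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 uses the fallback charset.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

This covers the openSuse case reported above, where the locale is UTF-8 but the German pages are ISO-8859-1: the umlaut bytes are not valid UTF-8, so the fallback is used. It cannot tell two legacy 8-bit encodings apart, which is where a statistical detector is needed.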