Bug 141340

Summary: Recode man output into UTF-8 encoding in kio_man
Product: [Unmaintained] kio
Reporter: Sergey A. Sukiyazov <sukiyazov>
Component: man
Assignee: David Faure <faure>
Status: RESOLVED FIXED
Severity: wishlist
CC: eshkrig, estellnb, faure, kollix, lueck, thiago, torre_cremata
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Version Fixed In: 4.9.1
Attachments: Recode man contents in kio_man
Forces kioproc to use codec for current locale instead of ISO-8859-1
Corrects  UTF-8 charset name to  charset name current locale in Content-Type builded HTML document
Patch to document changes in kio_man.cpp

Description Sergey A. Sukiyazov 2007-02-07 17:15:13 UTC
Version:            (using KDE 3.5.6)
Installed from:    Gentoo Packages

Recode man output from the current locale encoding (or from the encoding
named in a .charset file in the manpages directory, or in the
MAN_ICONV_INPUT_CHARSET environment variable) into UTF-8 so that Konqueror
displays it correctly.

(See also http://bugs.gentoo.org/show_bug.cgi?id=152546)
Comment 1 Sergey A. Sukiyazov 2007-02-07 17:19:18 UTC
Created attachment 19580 [details]
Recode man contents in kio_man

Recode manpage contents in kio_man. The manpage source encoding is determined
by the .charset file in the man directory (e.g. /usr/share/man/ru/.charset) or
by the value of the MAN_ICONV_INPUT_CHARSET environment variable.
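
The lookup described above can be sketched roughly as follows. This is a minimal illustration in plain C++, not the code from the attachment: the function names are hypothetical, and the precedence (environment variable first, then the .charset file, then the locale charset) is an assumption, since the report does not state an order.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the first whitespace-delimited token of <manDir>/.charset,
// or an empty string if the file is missing or empty.
std::string readCharsetFile(const std::string &manDir)
{
    std::ifstream in(manDir + "/.charset");
    std::string name;
    if (in && (in >> name))
        return name;
    return std::string();
}

// Hypothetical precedence (an assumption, not taken from the patch):
// 1. MAN_ICONV_INPUT_CHARSET, 2. the .charset file, 3. the locale charset.
std::string resolveManCharset(const std::string &manDir,
                              const char *envValue,
                              const std::string &localeCharset)
{
    assert(!localeCharset.empty()); // the caller must supply a fallback
    if (envValue && *envValue)
        return envValue;
    const std::string fromFile = readCharsetFile(manDir);
    if (!fromFile.empty())
        return fromFile;
    return localeCharset;
}
```

A caller would pass something like std::getenv("MAN_ICONV_INPUT_CHARSET") (from <cstdlib>) as the second argument. When neither source is present, the locale charset is returned unchanged.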
Comment 2 Sergey A. Sukiyazov 2007-02-07 17:24:56 UTC
Created attachment 19581 [details]
Forces kioproc to use codec  for current locale instead of ISO-8859-1

Forces kioproc to use the codec for the current locale instead of ISO-8859-1.
kdecore/kprocio.cpp (near line 30) hard-codes a QTextCodec for ISO-8859-1;
change it to the locale-specific charset and use that codec (near line 235)
to convert the input data to Unicode.
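
To illustrate why a hard-coded ISO-8859-1 codec is a problem, here is a small standalone sketch (plain C++, not Qt and not the patch itself; all function names are hypothetical). It decodes the same bytes once as ISO-8859-1 and once as windows-1251. The windows-1251 decoder is deliberately partial and covers only ASCII and the contiguous block 0xC0..0xFF, which maps to U+0410..U+044F.

```cpp
#include <cassert>
#include <string>

// Append one Unicode code point below U+0800 to a UTF-8 string.
void appendUtf8(std::string &out, unsigned cp)
{
    assert(cp < 0x800); // enough for Latin-1 and basic Cyrillic
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// ISO-8859-1: every byte value is the code point with the same number.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char b : bytes)
        appendUtf8(out, b);
    return out;
}

// Partial windows-1251 decoder: ASCII plus 0xC0..0xFF (U+0410..U+044F).
// Any other byte is replaced by '?' in this sketch.
std::string cp1251ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)
            appendUtf8(out, b);
        else if (b >= 0xC0)
            appendUtf8(out, 0x0410 + (b - 0xC0));
        else
            out += '?';
    }
    return out;
}
```

With the input bytes EC E0 ED, cp1251ToUtf8 yields Cyrillic letters (U+043C, U+0430, U+043D), while latin1ToUtf8 yields accented Latin letters (U+00EC, U+00E0, U+00ED). That is the kind of garbage a reader in a CP1251 locale sees when the wrong codec is used.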
Comment 3 Sergey A. Sukiyazov 2007-02-07 17:30:03 UTC
Created attachment 19582 [details]
Corrects  UTF-8  charset name to  charset name current locale in Content-Type builded HTML document

In kdelibs/kdoctools/kio_help.cpp (near lines 134 and 348), replace
QTextCodec::name() with QTextCodec::mimeName() to get a correct Content-Type.
For example, in the ru_RU.CP1251 locale, QTextCodec::name() returns "CP 1251",
but the Content-Type must be "...;charset=windows-1251".
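
The difference between the two names can be shown with a small standalone sketch. It does not use Qt: the lookup table below is a hypothetical stand-in for what QTextCodec::mimeName() provides, and only the "CP 1251" row comes from this report.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a codec's MIME name lookup.
// Only the "CP 1251" entry is taken from the report; names that are
// not in the table are passed through unchanged.
std::string mimeCharsetName(const std::string &codecName)
{
    static const std::map<std::string, std::string> known = {
        {"CP 1251", "windows-1251"},
    };
    const auto it = known.find(codecName);
    return it != known.end() ? it->second : codecName;
}

// Builds the Content-Type value for the generated HTML document.
std::string htmlContentType(const std::string &codecName)
{
    assert(!codecName.empty()); // a codec name is required
    return "text/html; charset=" + mimeCharsetName(codecName);
}
```

Here htmlContentType("CP 1251") gives "text/html; charset=windows-1251", which is a charset label a browser recognizes, whereas the display name "CP 1251" is not.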
Comment 4 Sergey A. Sukiyazov 2007-02-07 17:33:37 UTC
Duplicate of Gentoo bug http://bugs.gentoo.org/show_bug.cgi?id=152546

Patch names, in order:
kdebase-3.5.0-man_recode.patch
kdelibs-3.1.2-fix-kprocio-def-codec.patch
kdelibs-3.4.0-ALT-fix-kdoctools-mime-charset.patch

Comment 5 negorui igor 2007-02-08 16:41:51 UTC
Very important feature! We need this!
Comment 6 Sergey V Turchin 2007-02-08 16:44:53 UTC
All 3 patches work fine in ALT Linux.
Comment 7 Mozhaev Grigorij 2007-02-08 18:41:35 UTC
It's a very useful thing for many Linux users. Please fix it!
Comment 8 Peter Volkov 2007-02-12 10:44:04 UTC
Created attachment 19638 [details]
Patch to document changes in kio_man.cpp

Small documentation update to reflect the changes made by these patches.

BTW, these patches are applied in Altlinux Sisyphus (the unstable branch of
Altlinux):
http://sisyphus.ru/srpm/kdelibs/patches

* kdelibs-3.4.0-ALT-fix-kdoctools-mime-charset.patch
* kdelibs-3.1.2-fix-kprocio-def-codec.patch

and
http://sisyphus.ru/srpm/kdebase/patches

* kdebase-3.5.0-man_recode.patch

I've tested them and these patches work here. Really useful :) Thank you,
Sergey.
Comment 9 Sergey A. Sukiyazov 2007-03-06 10:17:19 UTC
Does anybody from the KDE team read this page? This bug was created a month ago, but there has been no reaction to this improvement...

All users who read and write in alphabets other than Latin really need this. Please do something about this improvement.
Comment 10 Thiago Macieira 2007-03-06 10:35:08 UTC
I have read this. There are bugs that have been open for far longer than a month.
Comment 11 Sergey A. Sukiyazov 2007-03-06 15:10:58 UTC
OK. That's good news :-) I was sure that the bug had been abandoned...

But I have been compelled to apply these patches since KDE 3.1.2.
In ALT Linux (a distribution adapted for Russian users) this improvement
has been in use for several years.

This bug is not only a report of an existing problem. I am also proposing
a ready solution to it.

For this improvement I attached three patches. Each patch is a small
amount of code. If the MAN_ICONV_INPUT_CHARSET variable is not set and the
.charset file does not exist, the behavior of kio_man does not change.

Just look at it and try it. It will not take much of your time...

If you have other thoughts about this problem (and the proposed solution),
write to me. I can rework this solution to better fit KDE...
Comment 12 David Faure 2009-04-06 19:19:32 UTC
I just fixed an encoding bug in kio_man for kde-4.3 so I found this report.
Why recode the data, when you can just export the right charset in the meta http-equiv tag? That's what my fix does. I never heard of $MAN_ICONV_INPUT_CHARSET though, I just made it use the system locale.

Please test trunk or 4.3 when it's out, and report what still doesn't work.
Comment 13 Burkhard Lück 2009-06-04 09:54:49 UTC
With recent trunk compiled from sources, I still need to manually select ISO 8859-1 to see the German "Umlaute" properly in the German translations of the KDE man pages.
Comment 14 Burkhard Lück 2011-06-29 06:57:52 UTC
Checked again with master and 4.7 compiled from sources: the German "Umlaute" are properly displayed in the German translations of the KDE man pages.
Comment 15 Martin Koller 2012-08-13 18:02:34 UTC
Here on openSUSE 12.1, with man-pages-de-0.5-1.1.noarch installed, I can still reproduce the wrong encoding problem.
It is clear why this does not work: my locale is de_AT.utf8 and the man pages are encoded in ISO-8859-1, but kio_man cannot know this and assumes the man page file content is UTF-8.

Your patch uses a .charset file which defines the encoding.
openSUSE has no such files installed, so I wonder if this is a distribution-specific file.
Also, the mentioned env var MAN_ICONV_INPUT_CHARSET: is it distribution-specific, is it some standard also used for something else, or did you invent it especially for this patch?
Comment 16 Martin Koller 2012-08-13 18:02:47 UTC
*** Bug 140495 has been marked as a duplicate of this bug. ***
Comment 17 Martin Koller 2012-08-13 18:13:44 UTC
*** Bug 277466 has been marked as a duplicate of this bug. ***
Comment 18 Martin Koller 2012-08-14 22:07:24 UTC
Git commit cafaf92ff1c57c1b3d8bf2a8f371099d054caa96 by Martin Koller.
Committed on 14/08/2012 at 23:55.
Pushed by mkoller into branch 'KDE/4.9'.

auto-detect encoding of man page source

As man page files do not declare the encoding they are written in,
the man-db source of the man command-line tools has a hardcoded list
of languages and their encodings (see its encodings.c file).
As this might change in the future (e.g. distributions move to UTF-8),
the hardcoded values might be wrong.
Therefore I now use KDE's encoding auto-detection mechanism and always
convert the page source to UTF-8 before processing.
FIXED-IN: 4.9.1

M  +52   -45   kioslave/man/kio_man.cpp
M  +1    -8    kioslave/man/man2html.cpp

http://commits.kde.org/kde-runtime/cafaf92ff1c57c1b3d8bf2a8f371099d054caa96
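
For readers unfamiliar with the idea, the "detect, then convert to UTF-8" step can be approximated with a much simpler heuristic than KDE's auto-detection. The sketch below is not the code from the commit and does not use KDE's prober. It assumes only two candidates: if the bytes are structurally valid UTF-8 they are kept, otherwise they are treated as ISO-8859-1 (the case from Comment 15). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `s` is structurally valid UTF-8: each lead byte is followed by
// the right number of continuation bytes. Overlong forms and surrogates
// are not rejected, so this is a heuristic, not a strict validator.
bool looksLikeUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false; // stray continuation or invalid lead byte
        if (extra > s.size() - i - 1)
            return false;  // sequence is cut off at the end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}

// Keeps valid UTF-8 as is; otherwise transcodes from ISO-8859-1 (assumed).
std::string manSourceToUtf8(const std::string &raw)
{
    if (looksLikeUtf8(raw))
        return raw;
    std::string out;
    for (unsigned char b : raw) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    assert(looksLikeUtf8(out)); // the result is always valid UTF-8
    return out;
}
```

A real detector has to distinguish many more encodings (for example KOI8-R from windows-1251), which a validity check alone cannot do. That is why the commit relies on an existing detection mechanism rather than a check like this one.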
Comment 19 Martin Koller 2012-08-14 22:07:27 UTC
Git commit 0e245f152f276ed8e254769e327b00c3b4c534fd by Martin Koller.
Committed on 14/08/2012 at 23:55.
Pushed by mkoller into branch 'master'.

auto-detect encoding of man page source

As man page files do not declare the encoding they are written in,
the man-db source of the man command-line tools has a hardcoded list
of languages and their encodings (see its encodings.c file).
As this might change in the future (e.g. distributions move to UTF-8),
the hardcoded values might be wrong.
Therefore I now use KDE's encoding auto-detection mechanism and always
convert the page source to UTF-8 before processing.
FIXED-IN: 4.9.1

M  +52   -45   kioslave/man/kio_man.cpp
M  +1    -8    kioslave/man/man2html.cpp

http://commits.kde.org/kde-runtime/0e245f152f276ed8e254769e327b00c3b4c534fd