Bug 141059 - Manpages with unicode letters are not converted properly
Summary: Manpages with unicode letters are not converted properly
Status: RESOLVED FIXED
Alias: None
Product: docs.kde.org
Classification: Websites
Component: ksgmltools
Version: 3.5
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Docbook Team
Reported: 2007-02-02 11:53 UTC by Krzysztof Lichota
Modified: 2017-04-02 13:38 UTC
CC List: 3 users

Description Krzysztof Lichota 2007-02-02 11:53:09 UTC
Version:            (using KDE 3.5.5)
Installed from:    Ubuntu Packages

When I generate manpages from .po files, the resulting manpages contain Polish letters as numeric character references (&#number;) instead of the proper Unicode characters.

Example (generated from stable/l10n/pl/docs/kdeedu/keduca/man-keduca.1.docbook):
KEDUCA(1)                                  Podręcznik użytkownika KDE                                 KEDUCA(1)

NAME
       keduca - Program do przeprowadzanie testów i egzaminów

SYNOPSIS
       keduca [plik] [Standardowe opcje KDE] [Standardowe opcje Qt]

OPIS
       KEduca  jest  przydatnym programem dzięki któremu można przeprowadzać testy w formie interaktywnych
       formularzy.

OPCJE
       plik   Nazwa pliku do załadowania

ZOBACZ TAKżE
       Szczegółowa dokumentacja jest dostępna pod adresem help:/keduca (należy go wprowadzić jako URL
       w programie Konqueror, albo wykonać polecenie: khelpcenter help:/keduca).

       Więcej    informacji    na   temat   programu   dostępne   jest   na   stronach   środowiska   KDE:
       http://edu.kde.org/keduca/

AUTORZY
       Autorem programu KEduca jest Javier Campos

K Desktop Environment                                 16 marzec 2005                                            KEDUCA(1)
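
Editorial sketch: a post-processing stopgap, not a fix proposed anywhere in this report, would be to decode the numeric references after the manpage has been generated. The function name `decode_numeric_refs` and the sample string are hypothetical; the sample is reconstructed from the manpage title above purely for illustration. Named entities such as `&amp;` are deliberately left alone.

```python
import re

# Matches decimal (&#347;) and hexadecimal (&#x15B;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid code point: keep the original text untouched.
            return match.group(0)

    return _NUMERIC_REF.sub(_replace, text)


if __name__ == "__main__":
    # Hypothetical sample line in the broken form described in the report.
    broken = "Podr&#281;cznik u&#380;ytkownika KDE"
    print(decode_numeric_refs(broken))
```

This only hides the symptom; the output encoding of the stylesheet is still the place where the problem originates.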
Comment 1 Philip Rodrigues 2007-02-03 00:20:24 UTC
I think this is a fairly common problem with XSLT and non-XML target formats - there's a similar problem in the DocBook->PDF stuff I've been working on. I don't know of a proper fix, although perhaps there's an xsltproc option somewhere.
Comment 2 Shaun McCance 2007-02-03 01:14:20 UTC
If it's an XSLT issue, make sure you have the following in your XSLT:

<xsl:output method="text" encoding="UTF-8"/>

If you get numeric entities with that xsl:output declaration, there's a bug in your XSLT processor.
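
Editorial sketch: the role of the output encoding can be illustrated with an analogy in Python. This is not how xsltproc is implemented; it only assumes that a serializer which cannot represent a character in its target encoding falls back to a `&#number;` reference, which is what Python's `xmlcharrefreplace` error handler does. The helper name `serialize` is hypothetical.

```python
def serialize(text, encoding):
    """Encode text, writing &#number; for characters the encoding lacks."""
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    title = "Podręcznik użytkownika KDE"
    # ASCII cannot represent the Polish letters, so references appear.
    print(serialize(title, "ascii"))
    # UTF-8 can represent every character, so no references are needed.
    print(serialize(title, "utf-8").decode("utf-8"))
```

With a target encoding that covers the characters, such as UTF-8, the fallback never triggers, which matches the expectation stated above.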
Comment 3 Philip Rodrigues 2007-02-03 19:01:22 UTC
Ah, thanks Shaun. I'll look into that.
Comment 4 Marek Laane 2007-08-01 23:33:11 UTC
And what was the result of looking into it?
Comment 5 Michael(tm) Smith 2007-08-02 02:50:54 UTC
Note that the way we deal with this in the DocBook Project XSL stylesheets is to use a character map to convert the Unicode characters to whatever equivalent roff escapes they might have.

https://docbook.svn.sourceforge.net/svnroot/docbook/trunk/xsl/manpages/charmap.groff.xsl

That file contains mappings for more than 800 Unicode characters.
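
Editorial sketch: the character-map idea can be shown in a few lines of Python. The table below is a tiny illustrative subset using glyph names documented in groff_char(7); it is not the contents of charmap.groff.xsl, and the names `ROFF_ESCAPES` and `to_roff` are hypothetical.

```python
# Illustrative subset only; these glyph names are listed in groff_char(7).
ROFF_ESCAPES = {
    "\u00a9": r"\(co",  # copyright sign
    "\u00ae": r"\(rg",  # registered sign
    "\u2013": r"\(en",  # en dash
    "\u2014": r"\(em",  # em dash
    "\u2022": r"\(bu",  # bullet
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(ROFF_ESCAPES)


def to_roff(text):
    """Replace mapped characters with roff escapes; pass the rest through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_roff("KDE \u00a9 2007 \u2014 docs"))
```

Characters without an entry in the table, such as the Polish letters in this report, pass through unchanged, so a map like this does not help unless it has an entry for them.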
Comment 6 Krzysztof Lichota 2007-08-02 15:43:23 UTC
Yeah, but as far as I can see it doesn't cover ISO-8859-2 and, in turn, Polish letters (for example s-acute).
Can you provide some instructions on how to create such a mapping for ISO-8859-2? I would like to add this mapping and commit it, so that the problem is solved in KDE 3.5.8.
Comment 7 Michael(tm) Smith 2007-08-30 13:23:38 UTC
> Yeah, but it doesn't contain, as far as I can see, ISO-8859-2 and,
> in turn, Polish letters (for example s-acute).

I don't think there are any groff escapes for those characters.
I'm not sure but I think the way to get those characters is to
author/encode your source in ISO-8859-2, and set the stylesheet
output encoding to ISO-8859-2 also, which will generate a man
page that is viewable by users who have their locale set to
ISO-8859-2.
 
> Can you provide some instructions how to create such mapping for
> ISO-8859-2? I would like to add this mapping and commit it, so
> that in KDE 3.5.8 the problem is solved.

There is no way to create such a mapping if groff does not
already provide groff escapes for those characters.

Information about the available groff escapes is in the
groff_char(7) man page, so if you do "man 7 groff_char" you can
look through the list there. I don't know if that list is
complete, but if it's not, I'm not sure where else to look.
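
Editorial sketch: before switching the source and the stylesheet output to ISO-8859-2 as suggested above, it may be worth checking that every character in the text exists in that encoding. The helper name `unencodable` is hypothetical; the check relies only on Python's built-in codecs.

```python
def unencodable(text, encoding):
    """Return the distinct characters of text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "Szczegółowa dokumentacja jest dostępna"
    print(unencodable(sample, "iso-8859-2"))
    print(unencodable(sample, "iso-8859-1"))
```

An empty list for ISO-8859-2 means the text can be written in that encoding without any fallback; the resulting page would still only display correctly for users whose locale uses the same encoding.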
Comment 8 Luigi Toscano 2011-03-02 21:06:32 UTC
Can you please check if this bug is still valid for KDE SC >=4.6.x? Manpage generation now relies on upstream stylesheets, so the bug should be (hopefully) fixed.
Comment 9 Burkhard Lück 2011-06-29 06:53:37 UTC
Checked with master/4.7 compiled from sources; the bug is no longer valid.
Comment 10 Luigi Toscano 2017-04-02 13:38:56 UTC
NEEDINFO for 7 years; I suspect it has really been fixed for a while.