Bug 407127 - po2xml produces broken XML in case: <para> text <itemizedlist/> text <xref/> </para>
Status: REPORTED
Alias: None
Product: docs.kde.org
Classification: Websites
Component: ksgmltools
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Documentation Editorial Team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-01 14:38 UTC by Eric Bischoff
Modified: 2019-05-01 15:01 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Partially working patch (2.49 KB, patch)
2019-05-01 14:38 UTC, Eric Bischoff
reproducer: full XML file (435 bytes, text/xml)
2019-05-01 14:45 UTC, Eric Bischoff
reproducer: translations (1006 bytes, text/x-gettext-translation)
2019-05-01 14:45 UTC, Eric Bischoff
broken result (697 bytes, text/xml)
2019-05-01 14:46 UTC, Eric Bischoff

Description Eric Bischoff 2019-05-01 14:38:28 UTC
Created attachment 119767 [details]
Partially working patch

SUMMARY


STEPS TO REPRODUCE

1. Add the following text to some DocBook file:

   <para>
     This is a nice list:<itemizedlist>
      <listitem><para>One</para></listitem>
      <listitem><para>Two</para></listitem>
      <listitem><para>Three</para></listitem>
    </itemizedlist>that ends up here: <xref linkend="somewhere"/></para>

2. Extract the messages with xml2pot
3. Translate the English messages into your language; the result is a .po file
4. Regenerate the XML file with po2xml

OBSERVED RESULT

Invalid XML file, with the end remaining in English.

EXPECTED RESULT

Properly translated XML file.

SOFTWARE/OS VERSIONS

Linux/KDE Plasma: Kubuntu 19.04 disco dingo
Qt: 5.12.2
KDE Frameworks: 5.56.0
kf5-config: 1.0


ADDITIONAL INFORMATION

The problem lies in poxml's parser.cpp lines 507 and 598. With that XML text, start_line and end_line are identical for variables msg1 and msg2. The blocks: 

   "This is a nice list:"

and

   "<listitem><para>One</para></listitem>
      <listitem><para>Two</para></listitem>
      <listitem><para>Three</para></listitem>
    </itemizedlist>that ends up here: <xref linkend="somewhere"/>"

(variables msg1 and msg2) are at the same level inside the containing <para>. It seems that this case has been overlooked.

I have a solution, but it only works if the boundary between msg1 and msg2 is on the first line (see attachment). The problem I am unable to resolve is how to compute the line and column of the boundary between msg1 and msg2 (i.e. strindex characters after the beginning of msg1).
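
One possible way to get the line and column of that boundary is to walk the text from the start of msg1 and count newlines. The following is only a sketch under assumptions, not poxml code: TextPos and advancePos are hypothetical names, the text is assumed to be the raw source text starting at msg1's known start position, line endings are assumed to be '\n', and columns are assumed to restart at firstCol (0 here) after each newline. If strindex refers to whitespace-normalized message text rather than the raw source, the offset would first have to be mapped back to the raw text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of poxml: a line/column pair.
struct TextPos {
    int line;
    int col;
};

// Return the position reached after consuming `offset` characters of `text`,
// where `start` is the position of text[0].
// Assumptions: '\n' line endings; columns restart at `firstCol` after a
// newline; one std::string element counts as one column (bytes, so this is
// only exact for single-byte text).
TextPos advancePos(const std::string &text, TextPos start,
                   std::size_t offset, int firstCol = 0)
{
    assert(offset <= text.size());
    TextPos pos = start;
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++pos.line;          // boundary moves to the next line
            pos.col = firstCol;  // and the column count starts over
        } else {
            ++pos.col;
        }
    }
    return pos;
}
```

With such a helper, the end of msg1 and the start of msg2 could both be set to advancePos(rawText, msg1Start, strindex), which would also cover the case where the boundary is not on the first line. Whether the parser's columns are 0-based or 1-based would need to be checked against parser.cpp before adopting the firstCol default.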
Comment 1 Eric Bischoff 2019-05-01 14:43:56 UTC
poxml version: 4:18.12.3-0ubuntu1, preferably with my latest patches applied (see Phabricator).

Adding full files to reproduce the problem.
Comment 2 Eric Bischoff 2019-05-01 14:45:15 UTC
Created attachment 119768 [details]
reproducer: full XML file
Comment 3 Eric Bischoff 2019-05-01 14:45:49 UTC
Created attachment 119769 [details]
reproducer: translations
Comment 4 Eric Bischoff 2019-05-01 14:46:15 UTC
Created attachment 119770 [details]
broken result