Bug 208121 - incorrect space characters in pdf form
Summary: incorrect space characters in pdf form
Status: CONFIRMED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 0.13.2
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Duplicates: 271775 285438
Depends on:
Blocks:
 
Reported: 2009-09-21 21:52 UTC by net_life
Modified: 2021-03-10 22:09 UTC
CC List: 11 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
adobe sample form (89.00 KB, application/pdf)
2010-02-06 12:47 UTC, rpansky
adobe sample form filled in (89.74 KB, application/pdf)
2010-02-06 12:59 UTC, rpansky
weired squares in form (155.56 KB, image/png)
2010-02-14 18:28 UTC, net_life
Visa application for China (217.28 KB, application/pdf)
2014-10-30 15:14 UTC, eric

Description net_life 2009-09-21 21:52:16 UTC
Version:            (using KDE 4.3.1)
Compiler:          gcc i686-pc-linux-gnu-4.3.2 
OS:                Linux
Installed from:    Gentoo Packages

incorrect space characters in pdf form.
Example document: http://www.fms.gov.ru/documents/passport/zp25.pdf 
This document must be filled and submitted to local authorities in Russia, in order to be able to travel abroad. 

1) Open this document and fill any line in the form with "kde user" (the space is required).

2) Click print preview for this document; the line will look like "kde square user".

3) The desired behavior is no square, just a space.
Comment 1 net_life 2009-09-22 12:34:19 UTC
If one opens the filled PDF file from the report in another PDF viewer (Acrobat Reader, for example) or prints it, one will see that the spaces are replaced by squares. This behavior is not desired.
Comment 2 Albert Astals Cid 2009-11-05 00:29:39 UTC
Does filling that form in Acrobat Reader work? If I open it with Acrobat Reader and type "kde user", I end up with a lot of squares on screen.
Comment 3 net_life 2009-11-06 20:37:50 UTC
I was able to fill this document correctly with acrobat reader 9 on windows.

(In reply to comment #2)
> does filling that form in Acrobat Reader work? If i open it with Acrobat Reader
> and type kde user i end up with a lot of squares on screen
Comment 4 rpansky 2010-02-06 12:47:35 UTC
Created attachment 40563
adobe sample form
Comment 5 rpansky 2010-02-06 12:48:28 UTC
I confirm that Adobe Acrobat 9.0.0 for Windows fills the form in properly. The resulting file is rendered by the print preview (okular v. 0.9.3 using kde 4.3.3) with spaces not squares.

However, there is some evidence that the document in question is invalid. Consider an official Adobe sample http://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
Okular works fine for it (see the attachment above).
Comment 6 rpansky 2010-02-06 12:59:41 UTC
Created attachment 40566
adobe sample form filled in

sorry for posting a wrong file above
Comment 7 rpansky 2010-02-06 13:56:45 UTC
There is quite a weird thing with the attachment from #6. I removed Okular's configuration files (from ~/.kde4/share/apps/okular) and downloaded the attachment.
1. The file is rendered, previewed and printed by Okular with all the fields being empty. However the file does contain the string "kde user".
2. Acrobat 9.0.0 for Windows and Adobe Reader 9.2 for Linux do not show the file with the message: "There was a problem reading this document (14)".
3. If I convert the file into PS, the result has the field filled in.

So I understand nothing here... Is it worth opening a new bug?
Comment 8 net_life 2010-02-14 18:28:45 UTC
Created attachment 40771
weired squares in form
Comment 9 net_life 2010-02-14 18:36:03 UTC
I was able to open document 
>Example document: http://www.fms.gov.ru/documents/passport/zp25.pdf 
fill the form, and okular shows squares on print preview (see attached image). I was able to fill this form with adobe reader on windows machine.

The example document from Adobe http://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
works fine. 

This bug is important because Linux users cannot submit an official document electronically, as required by the Russian government. The same may apply to citizens of other countries who try to communicate with their governments electronically.

(In reply to comment #7)
> There is quite a weird thing with the attachment from #6. I removed
> Okular's configuration files (from ~/.kde4/share/apps/okular) and downloaded
> the attachment.
> 1. The file is rendered, previewed and printed by Okular with all the fields
> being empty. However the file does contain the string "kde user".
> 2. Acrobat 9.0.0 for Windows and Adobe Reader 9.2 for Linux do not show the
> file with the message: "There was a problem reading this document (14)".
> 3. If I convert the file into PS, the result has the field filled in.
> 
> So I understand nothing here... Is it worth opening a new bug?
Comment 10 Dmitry 2011-04-26 16:37:32 UTC
*** Bug 271775 has been marked as a duplicate of this bug. ***
Comment 11 Vladimir Mityukov 2011-08-18 15:30:37 UTC
I'd like to ask the reporter concerning cyrillic characters.. When I fill them into the form, and then click "Done" or "Print preview" -- I get spaces instead of non-latin letters.

Did you have the same problem?
Comment 12 pioner14 2011-08-18 17:51:53 UTC
Yes, I have the same problem.
On 18.08.2011 at 19:30, user "Vladimir Mityukov" <mityukov@gmail.com> wrote:
> I'd like to ask the reporter concerning cyrillic characters.. When I fill them
> into the form, and then click "Done" or "Print preview" -- I get spaces instead
> of non-latin letters.
>
> Did you have the same problem?
Comment 13 net_life 2011-08-31 14:25:12 UTC
(In reply to comment #11)
> I'd like to ask the reporter concerning cyrillic characters.. When I fill them
> into the form, and then click "Done" or "Print preview" -- I get spaces instead
> of non-latin letters.
> 
> Did you have the same problem?

Cyrillic characters are rendered properly, but white space is not.
Comment 14 Davor Cubranic 2011-11-01 21:27:08 UTC
This could be related to bug 285438: cyrillic letters are rendered incorrectly in filled-in forms after "hide forms" is selected.
Comment 15 jordonwii 2011-12-23 16:30:27 UTC
I can reproduce this on Okular 0.13.3, KDE 4.7.3. I'm curious as well whether this is related to bug 285438. The OP in that one does explicitly say whitespace is rendered correctly, however.
Comment 16 Myriam Schweingruber 2011-12-27 17:31:21 UTC
Changing status.
Comment 17 Myriam Schweingruber 2011-12-27 17:32:43 UTC
*** Bug 285438 has been marked as a duplicate of this bug. ***
Comment 18 vasiliy.korchagin 2012-02-06 08:17:26 UTC
In KDE 4.8, Okular 0.14.0 this problem remains.
Comment 19 atayoohoo 2012-04-08 10:58:16 UTC
The space rendering problem has nothing to do with cyrillic characters.
The document has a ToUnicode mapping that converts a 0x00 to a 0x20 space. Dump of the first beginbfrange section in zp25.pdf (CharCodeToUnicode.cc:353), font is TimesNewRomanCyr:
\d<00> <00> <0020> \d<41> <41> <0041> \d<42> <42> <0042> …………

When filling the form, the conversion is applied the other way round, 0x20 → 0x00, from Unicode to CID.
SplashOutputDev and CairoOutputDev do not show 0x00, but PSOutputDev maps 0x00 to a /.notdef character, which GhostScript draws as a rectangle.
So Poppler's behavior is actually correct.
However, Adobe Reader 9.4.2 also has problems with the map: On Linux/Gtk, every second character in the form is drawn as a rectangle, because the first unicode byte of a character is 0x00 in the lower ranges.
In the PDF specification examples, the conversion from 0x20 to 0x00 is used explicitly. This bug seems to be a WONTFIX.
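
To make the mapping argument concrete, here is a minimal Python sketch of what happens when a bfrange table is read forward (char code to Unicode) and then applied in reverse when form text is encoded. The CMap excerpt is a hypothetical one modeled on the dump above, and `parse_bfrange` and `invert` are illustrative helper names, not Poppler functions.

```python
import re

# Hypothetical excerpt modeled on the dump above; a real CMap has more entries.
BFRANGE = """
3 beginbfrange
<00> <00> <0020>
<41> <41> <0041>
<42> <42> <0042>
endbfrange
"""

def parse_bfrange(cmap_text):
    """Return {char_code: unicode_code_point} for simple bfrange triples."""
    forward = {}
    triple = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
    for lo, hi, dst in triple.findall(cmap_text):
        start, end, base = int(lo, 16), int(hi, 16), int(dst, 16)
        # Each code in [start, end] maps to consecutive code points from base.
        for offset in range(end - start + 1):
            forward[start + offset] = base + offset
    return forward

def invert(forward):
    """Unicode -> char code; the lowest char code wins on collisions."""
    inverse = {}
    for code, uni in sorted(forward.items()):
        inverse.setdefault(uni, code)
    return inverse

forward = parse_bfrange(BFRANGE)
inverse = invert(forward)

# Encoding typed text: the space (U+0020) ends up as char code 0x00.
encoded = [inverse[ord(ch)] for ch in "A B"]
print([hex(c) for c in encoded])
```

With this table the space in "A B" is written as code 0x00, which is the code that the PostScript path then resolves to /.notdef.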
Comment 20 net_life 2012-04-09 22:59:55 UTC
Dear atayoohoo@googlemail.com,

From a user's point of view, inconsistent behavior (no squares in the PDF view, but squares on print) is definitely a bug, not a feature. Is it possible to make Okular's rendering on screen and rendering for printing consistent (both with squares or both without)?

I am not a developer, but from my point of view printing a hi-res raster image of the PDF could solve this bug and all others like it. Moreover, Okular's print dialog has a flag to force rasterisation (print -> options -> PDF options -> Force rasterisation). This flag can be very useful for cases like this one.
Comment 21 atayoohoo 2012-04-10 17:13:52 UTC
>> From user point of view inconsistent behavior (no squares in pdf view, but squares on print)
>> is definitely a bug, not a feature. Is it possible to make okular rendering on screen
>> and rendering for printing consistent (both with squares or both without)?
How to render the notdef glyph is not that clear.

To get a space, replace "/.notdef 0 def" by "/.notdef 3 def" in line 686 of the PostScript output.
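
As a rough illustration of that manual edit, the following Python sketch applies the same replacement to PostScript text. It assumes the entry is spelled `/.notdef 0 def` (the usual PostScript name syntax) and that glyph index 3 is the space glyph, as it is for this font; other fonts may differ. `remap_notdef` and the sample text are hypothetical.

```python
def remap_notdef(ps_text, glyph_index=3):
    """Point /.notdef at another glyph index; return (new_text, hit_count)."""
    old = "/.notdef 0 def"
    new = "/.notdef {} def".format(glyph_index)
    return ps_text.replace(old, new), ps_text.count(old)

# Hypothetical fragment of a font dictionary in the PostScript output.
sample = (
    "/CharStrings 2 dict dup begin\n"
    "/.notdef 0 def\n"
    "/space 3 def\n"
    "end readonly def\n"
)

patched, hits = remap_notdef(sample)
print(hits)
print(patched)
```

Note that this rewrites every matching entry in the text, so with several embedded fonts it would be safer to check each hit by hand.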

Quotations from http://bugs.ghostscript.com/show_bug.cgi?id=690935 (the same problem as here).
[Comment 9]
The default behaviour for Ghostscript is to render TrueType /.notdef glyphs when
the input is PostScript, and *not* to render TrueType /.notdef glyphs when the
input is PDF. Hence why this works when you run the original PostScript, but
doesn't work when you run the PDF file.

We know from the original work on this issue (see bug #689757) that the rules
Acrobat uses on whether to render a /.notdef or not are incomprehensible. In
particular we know that making a font symbolic does not force display. I mention
this because I had thought that the fact that the font was symbolic was why
Acrobat displayed this one.

[Comment 10]
In PostScript we always render the /.notdef glyph, because that's the way the
specification is written and mostly everyone sticks to the spec. In PDF,
however, although the spec is written so that the /.notdef glyph should be
rendered, Adobe Acrobat 'sometimes' (and I haven't been able to work out a rule
for this) doesn't render the /.notdef but instead leaves a gap equivalent to its
width.
This leads to complaints about 'hollow squares' or 'boxes'. Of course these are
technically correctly rendered, but Acrobat doesn't display them so we are seen
as incorrect.

[PDF Reference, Third Edition]
All Type 1 font programs contain an actual glyph for the character named
.notdef. The effect produced by showing the .notdef character is at the discretion
of the font designer; in Type 1 font programs produced by Adobe, it is the same
as the space character. If an encoding maps to a character name that does not exist
in the Type 1 font program, the .notdef character is substituted.

[PDF/A-2 http://www.pdfa.org/2011/08/pdfa-%E2%80%93-a-look-at-the-technical-side/]
PDF/A-2 will also apply stricter rules for glyphs in embedded fonts. PDF/A-2 will not allow the
use of .notdef glyphs and will only permit so-called ‘empty’ glyphs for white space.

GhostScript seems to use the Type 1 .notdef glyph. When you extract TimesNewRomanCyr,
open it with fontforge and convert it to CID using Adobe-Identity, you see that the .notdef character
is a rectangle.

I see these possibilities:
1) Render replacement glyphs in both PDF and PS. This looks bad, but is fairly standards-compliant. However, many users might complain and
   the rasterization workaround will not work.
2) Add an option to replace .notdef by a space in both PS/PDF rendering. The rendering could match every PDF spec (normal PDF and PDF/A),
   which is important, but it would additionally be possible to get acceptable output for printing.
3) Maybe even check the pdf version for normal PDF or PDF/A.
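
Possibility 2 could look roughly like the Python sketch below. This is only an illustration of the idea, not Poppler code; `pick_glyph`, the `notdef_as_space` switch and the glyph names are all hypothetical.

```python
def pick_glyph(name, font_glyphs, notdef_as_space=False):
    """Choose the glyph to draw for a character name (hypothetical helper)."""
    if name in font_glyphs:
        return name
    # The name is missing from the font, so the renderer has to fall back.
    if notdef_as_space and "space" in font_glyphs:
        return "space"   # optional behavior: leave a gap instead of a box
    return ".notdef"     # strict behavior: draw the font's .notdef glyph

glyphs = {".notdef", "space", "A", "B"}
print(pick_glyph("uni0000", glyphs))
print(pick_glyph("uni0000", glyphs, notdef_as_space=True))
```

Keeping the switch off by default would leave the strict rendering untouched, and the same switch could be used by both the screen and the PS output paths so that they stay consistent.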
Comment 22 net_life 2012-04-12 20:01:29 UTC
(In reply to comment #21)
> 2) Add an option to replace .notdef by a space in both PS/PDF rendering. The
> rendering could match every PDF spec (normal PDF and PDF/A),
>    which is important, but it is additionally possible to get an acceptable
> output for printing.
This is the best solution, since most important PDF documents (e.g. from governments or companies) are provided in only one version.
Comment 23 Andre 2013-02-25 19:50:05 UTC
I confirm this bug with KDE 4.10, Okular 0.16.0.
Comment 24 Egor 2014-01-13 09:27:42 UTC
Hits me too.
KDE SC 4.11.2
Okular 0.17.2

Forced rasterization is a viable workaround but produces low-quality image.
Comment 25 Egor 2014-01-13 09:31:51 UTC
(In reply to comment #24)
> Hits me too.
> KDE SC 4.11.2
> Okular 0.17.2
> 
> Forced rasterization is a viable workaround but produces low-quality image.

Ah, no, the quality is also fine. The glitches were just on the screen, where the page was scaled to fit width, 1.5x of its real size. Printed page is great, thanks for the option «Force rasterization»
Comment 26 Alexander Potashev 2014-04-28 20:29:01 UTC
I confirm this in KDE SC 4.12.4.
Comment 27 eric 2014-10-30 15:14:42 UTC
Created attachment 89383
Visa application for China

The pdf (visa application for China) contains fillable forms.
When the forms are filled, the input is converted to weird characters.
I have this problem with okular-0.20.0 (kde-4.14.1).
The Linux-version of Acroread-9.5.5 is working as expected.
Comment 28 Justin Zobel 2021-03-09 23:59:20 UTC
Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.
Comment 29 eric 2021-03-10 22:09:22 UTC
(In reply to Justin Zobel from comment #28)
> If this bug is no longer persisting or relevant please change the status to
> resolved.

The problem mentioned in comment #27 has not disappeared.
Currently I'm using: okular 1.11.1, kde frameworks 5.74.0 and Qt 5.15.1.