Bug 331697 - can't fill out pdf form
Summary: can't fill out pdf form
Status: RESOLVED UPSTREAM
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 0.19.60
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL: https://www.bahn.de/p/view/mdb/bahnin...
Keywords: usability
Depends on:
Blocks:
 
Reported: 2014-03-03 10:52 UTC by Steffen Sledz
Modified: 2017-11-30 03:24 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
The file as I had it saved on my HD (80.15 KB, application/pdf)
2017-07-28 20:31 UTC, Albert Astals Cid
given sample file, "cleaned" using "mutool" (241.46 KB, application/pdf)
2017-10-11 12:23 UTC, Michael Weghorn
Details
given sample file with removed "DA" entry for field "Startbahnhof" (241.43 KB, application/pdf)
2017-10-11 12:24 UTC, Michael Weghorn
new version of the form from website of "Deutsche Bahn", works fine with Okular (135.85 KB, application/pdf)
2017-10-11 12:25 UTC, Michael Weghorn
another sample PDF which shows the same problem, cause a bit different (252.84 KB, application/pdf)
2017-10-11 12:25 UTC, Michael Weghorn

Description Steffen Sledz 2014-03-03 10:52:53 UTC
I tried to fill out pdf form https://www.bahn.de/p/view/mdb/bahnintern/agb/befoerderungsbedingungen/MDB85421-fgr_barrierefrei12.pdf without success.

Only some of the check boxes (e.g. "Ich habe den Anschlusszug verpasst") could be set correctly.

Reproducible: Always

Steps to Reproduce:
1. open the document
2. try to fill out the form
3. try to print the document
Actual Results:  
Only a few inputs are printed.

Expected Results:  
All inputs are printed.
Comment 1 Albert Astals Cid 2014-03-03 18:06:52 UTC
I need to know one that you can't set. The ones that you can set are not a bug, so I'm not interested in them :-)
Comment 2 Steffen Sledz 2014-03-04 07:28:25 UTC
Only the
* checkboxes in part 2, 3, and 5
* the 2nd and 3rd fields in "Angekommen bin ich am" in part 2
* the "BLZ/SWIFT-BIC" and "IBAN" fields in part 5
are usable.

*All other* fields (in the red highlighted areas) are empty.
Comment 3 Albert Astals Cid 2014-03-04 19:44:39 UTC
To make it easier to get to what is wrong, without having to subtract what works from what doesn't, I'll name a field that doesn't work:

Startbahnhof in page 1

This probably is a bug in poppler, needs investigation though
Comment 4 Michael Weghorn 2017-07-20 11:34:36 UTC
The link in the bug report no longer points to a valid PDF file.
Can the respective PDF form possibly be attached to the bug report?
Without the file, there is no way to reproduce the bug and thus to try fixing it...
Comment 5 Albert Astals Cid 2017-07-28 20:31:11 UTC
Created attachment 106929
The file as I had it saved on my HD
Comment 6 Michael Weghorn 2017-10-11 12:23:39 UTC
Created attachment 108278
given sample file, "cleaned" using "mutool"
Comment 7 Michael Weghorn 2017-10-11 12:24:24 UTC
Created attachment 108279
given sample file with removed "DA" entry for field "Startbahnhof"
Comment 8 Michael Weghorn 2017-10-11 12:25:11 UTC
Created attachment 108281
new version of the form from website of "Deutsche Bahn", works fine with Okular
Comment 9 Michael Weghorn 2017-10-11 12:25:52 UTC
Created attachment 108282
another sample PDF which shows the same problem, cause a bit different
Comment 10 Michael Weghorn 2017-10-11 12:29:55 UTC
I analysed the problem with the attached PDF form. As far as I understand it so far, the root cause is basically an "incorrect" PDF file, not Poppler.

For easier analysis in a text editor, I created a "cleaned" version of the document using the command "mutool clean -d -a bug331697.pdf", which decompresses streams and ASCII-encodes binary streams. The resulting PDF document is attached as file "bug331697_MUTOOL_CLEANED.pdf".


The PDF specification (https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf) describes how appearance streams for variable text must be created, see p. 677ff; extract:

[Start quote]

"For non-rich text fields, the appearance stream—which, like all appearance
streams, is a form XObject—has the contents of its form dictionary initialized as
follows:
• The resource dictionary (Resources) is created using resources from the interactive form dictionary’s DR entry (see Table 8.67); see also implementation note 118 in Appendix H.
• The lower-left corner of the bounding box (BBox) is set to coordinates (0, 0) in the form coordinate system. The box’s top and right coordinates are taken from the dimensions of the annotation rectangle (the Rect entry in the widget annotation dictionary).
• All other entries in the appearance stream’s form dictionary are set to their default values (see Section 4.9, “Form XObjects”).

[...]

The default appearance string (DA) contains any graphics state or text state operators needed to establish the graphics state parameters, such as text size and color, for displaying the field’s variable text. Only operators that are allowed within text objects may occur in this string (see Figure 4.1 on page 197). At a minimum, the string must include a Tf (text font) operator along with its two operands, font and size. The specified font value must match a resource name in the Font entry of the default resource dictionary (referenced from the DR entry of the interactive form dictionary; see Table 8.67).

[End quote]
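
As an illustration of the bounding box rule quoted above, here is a minimal Python sketch. It is not Poppler code; the function name `appearance_bbox` is made up for this example, and the Rect values are those of the "Startbahnhof" widget annotation in the cleaned file.

```python
def appearance_bbox(rect):
    """Derive the appearance stream BBox from a widget annotation's Rect.

    The lower-left corner is (0, 0); the top and right coordinates are the
    height and width of the annotation rectangle.
    """
    x1, y1, x2, y2 = rect
    # abs() keeps the result valid even if the corners are listed in reverse order.
    return [0, 0, abs(x2 - x1), abs(y2 - y1)]


# Rect of the "Startbahnhof" field (object 417 in the cleaned file).
startbahnhof_rect = [98.7881, 466.621, 430.018, 482.513]
print(appearance_bbox(startbahnhof_rect))
```

The result is a box roughly 331 units wide and 16 units high, anchored at the origin of the form coordinate system.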


The corresponding object for the first form field in the ("cleaned") PDF file, "Startbahnhof", is the following widget annotation:

~~~
417 0 obj
<<
  /DA (/Helvetica 10 Tf 0 g)
  /F 4
  /FT /Tx
  /Ff 12582912
  /MK 473 0 R
  /P 370 0 R
  /Rect [ 98.7881 466.621 430.018 482.513 ]
  /StructParent 5
  /Subtype /Widget
  /T (S1F4)
  /TU (Startbahnhof)
  /Type /Annot
  /V <>
>>
endobj
~~~

It contains a "DA" (default appearance) entry of "/Helvetica 10 Tf 0 g".

As described in the quote above, the "DR" entry from the interactive form dictionary is used to initialize the resources in the appearance stream to be constructed. The interactive form dictionary is the following object:

~~~
411 0 obj
<<
  /DA (/Helv 0 Tf 0 g )
  /DR <<
    /Encoding <<
      /PDFDocEncoding 91 0 R
    >>
    /Font <<
      /Helv 90 0 R
      /ZaDb 435 0 R
    >>
  >>
  /Fields [ 89 0 R 50 0 R 87 0 R 88 0 R 62 0 R 63 0 R 81 0 R 82 0 R
      66 0 R 67 0 R 40 0 R 41 0 R 68 0 R 414 0 R 415 0 R 416 0 R
      418 0 R 417 0 R 419 0 R 420 0 R 421 0 R 69 0 R 422 0 R 423 0 R
      424 0 R 425 0 R 426 0 R 51 0 R 52 0 R 70 0 R 64 0 R 43 0 R
      65 0 R 427 0 R 447 0 R 446 0 R 445 0 R 444 0 R 443 0 R 442 0 R
      441 0 R 440 0 R 439 0 R 438 0 R 437 0 R 436 0 R 434 0 R
      433 0 R 432 0 R 431 0 R 430 0 R 429 0 R 428 0 R 71 0 R 38 0 R ]
  /SigFlags 2
>>
endobj
~~~

The contained default resources ("DR" entry) do not contain a font called "Helvetica", which is the name used in the "DA" entry of the form field (only one called "Helv").
For that reason, the appearance stream is not "properly" created, which leads to the text not being shown in the filled in form.
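
The mismatch can be reproduced outside of any PDF library with a small Python sketch. This is only an illustration under simplifying assumptions: the helper names `da_font` and `font_is_defined` are invented here, the "DA" string is parsed with a simple regular expression rather than a real content stream parser, and the "DR" fonts are represented as a plain set of resource names.

```python
import re

# Matches "/FontName size Tf" inside a default appearance (DA) string.
TF_PATTERN = re.compile(r"/(\S+)\s+(-?\d*\.?\d+)\s+Tf")


def da_font(da):
    """Return (font_name, size) from a DA string, or None without a Tf operator."""
    match = TF_PATTERN.search(da)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def font_is_defined(da, dr_fonts):
    """Check whether the font named in the DA string exists in the DR fonts."""
    parsed = da_font(da)
    return parsed is not None and parsed[0] in dr_fonts


# Font resource names from the DR entry of the interactive form dictionary.
dr_fonts = {"Helv", "ZaDb"}

print(font_is_defined("/Helvetica 10 Tf 0 g", dr_fonts))  # field-level DA
print(font_is_defined("/Helv 0 Tf 0 g ", dr_fonts))       # document-level DA
```

The field-level "DA" fails the check because "Helvetica" is not a key in the "Font" dictionary, while the document-level "DA" passes.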

The interactive form dictionary also contains a default appearance ("DA") entry. That one uses the font "Helv", which is specified in the resources. As explained on p. 673 in the PDF spec, that (optional) "DA" serves as a document-wide default value for the DA attribute of variable text fields.
However, since the text field has its own value, the default value is not used for the form element in the given PDF file and the mismatch as described above occurs.


For testing purposes, I removed the "DA" entry in the form field element for "Startbahnhof" (line 11622) (and had "mutool" fix the xref afterwards). The resulting PDF file is attached as "bug331697_removedDA.pdf".
With that modified file, the default DA entry specified in the interactive form dictionary is used, the font name "Helv" used there does match and the appearance stream is constructed as desired. The text is shown in Okular in the filled in form as expected.
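
The effect of removing the field-level "DA" can be sketched in Python as follows. This is a simplified model, not how Poppler stores these objects: dictionaries stand in for the PDF objects, the key `"fonts"` and the function names are assumptions made for this example, and inheritance through parent fields is left out.

```python
def effective_da(field, acroform):
    """Return the field's own DA if present, else the document-wide default."""
    return field.get("DA", acroform.get("DA"))


def font_name(da):
    """Return the font operand of the Tf operator in a DA string."""
    tokens = da.split()
    # Tf takes two operands: the font name, then the size.
    return tokens[tokens.index("Tf") - 2].lstrip("/")


acroform = {"DA": "/Helv 0 Tf 0 g ", "fonts": {"Helv", "ZaDb"}}
original_field = {"T": "S1F4", "DA": "/Helvetica 10 Tf 0 g"}
modified_field = {"T": "S1F4"}  # "DA" entry removed, as in bug331697_removedDA.pdf

for field in (original_field, modified_field):
    name = font_name(effective_da(field, acroform))
    print(name, name in acroform["fonts"])
```

With the original field the lookup yields "Helvetica", which is undefined. With the modified field it yields "Helv", which is present in the default resources.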


As far as I understand it so far, Poppler basically behaves in the way the PDF specification tells it to. In order to still avoid problems with "broken" files like the one given here, Poppler would probably have to implement some kind of a workaround/fallback for cases where an undefined font name is being used.
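
To make the idea of a fallback more concrete, here is one possible policy as a Python sketch. It is purely hypothetical and does not describe what Poppler does or should do: the function `resolve_font` and its lookup order (requested font, then the font of the document-wide "DA", then any defined font) are assumptions for discussion.

```python
def resolve_font(requested, dr_fonts, default_font=None):
    """Pick a font resource name for a field (hypothetical fallback policy).

    dr_fonts maps resource names from the DR entry to their object references.
    """
    if requested in dr_fonts:
        return requested
    # Fallback 1: the font named in the document-wide DA, if it is defined.
    if default_font in dr_fonts:
        return default_font
    # Fallback 2: any defined font, chosen in sorted order to stay deterministic.
    return min(dr_fonts) if dr_fonts else None


dr_fonts = {"Helv": "90 0 R", "ZaDb": "435 0 R"}
print(resolve_font("Helvetica", dr_fonts, default_font="Helv"))
```

For the file in this report, such a policy would substitute "Helv" for the undefined "Helvetica". Whether silently substituting a font is acceptable is exactly the open question.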


I'd like to hear other opinions on what the best way to deal with such situations is. Should Poppler implement some mechanism to deal with files as the one given or not ("works as designed")? What could be a good approach?


I was notified of another PDF form where the user-visible result of filling in the form is the same (inserted text not shown/printed), but the underlying cause is a little different. In that document, the field element has its own "DR" entry, which is ignored by Poppler (as suggested in implementation note 118 on p. 1118 of the PDF specification). The "DR" entry from the interactive form dictionary is used instead (as defined in the PDF spec) which again leads to the used font name not being defined.
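
The difference between the two lookup strategies for this second case can be sketched in Python. Again this is only an illustration: the function `find_font`, its `lenient` switch and the font name "F1" are hypothetical, and the actual resource names in that document are not reproduced here.

```python
def find_font(name, acroform_dr, field_dr=None, lenient=False):
    """Look up a font resource by name.

    By default only the DR entry of the interactive form dictionary is
    consulted. With lenient=True, a field-level DR is tried as a fallback.
    """
    fonts = acroform_dr.get("Font", {})
    if name in fonts:
        return fonts[name]
    if lenient and field_dr is not None:
        return field_dr.get("Font", {}).get(name)
    return None


# Hypothetical resources: the field defines "F1", the form dictionary does not.
acroform_dr = {"Font": {"Helv": "90 0 R"}}
field_dr = {"Font": {"F1": "12 0 R"}}

print(find_font("F1", acroform_dr, field_dr))                # strict lookup
print(find_font("F1", acroform_dr, field_dr, lenient=True))  # with fallback
```

The strict lookup finds nothing, which corresponds to the text not being shown. The lenient lookup would find the font in the field's own "DR" entry.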

That file is available at http://www.muenchen.de/rathaus/dms/Home/Stadtverwaltung/Kreisverwaltungsreferat/fachspezifisch/HA-III/Dokumente/Kfz-Zulassung/SEPA_Mandat_V_1_2_weiden.pdf and referenced from https://www.muenchen.de/dienstleistungsfinder/muenchen/1064314/n0/. I attached it to this bug report as well.


I would be very glad to get some guidance on what the best way to deal with such cases would be (e.g. implement some special handling in Poppler, try to make the authors provide fixed PDF files and close this bug as "invalid"/"wontfix",...).


PS: For the specific file given in this bug report, there is a new version available on the website of "Deutsche Bahn", which no longer shows the problem: https://www.bahn.de/p/view/mdb/bahnintern/agb/befoerderungsbedingungen/fahrgastrechteformulare/2016/mdb_220024_160401_16-fahrgastrechte-formular_de.pdf, referenced from https://www.bahn.de/p/view/service/auskunft/fahrgastrechte/fahrgastrechte-formular.shtml, attached to this bug report as well
Comment 11 Nate Graham 2017-10-11 15:45:52 UTC
My strong preference would be to make Poppler more flexibly able to handle malformed files like these. It isn't reasonable to expect all PDF documents to be well-formed, given the apparent state of PDF generation software out there. We shouldn't make our users suffer for the sins of the generator software used to produce PDFs that they may need to fill out. Users shouldn't be made to understand any of this; it should Just Work.™
Comment 12 Nate Graham 2017-10-11 15:48:13 UTC
So that would mean, for bugs like these, we would close the bug against Okular and re-open it against Poppler saying "Poppler should better handle the following case of the PDF file being malformed."
Comment 13 Michael Weghorn 2017-10-12 17:05:42 UTC
(In reply to Nate Graham from comment #12)
> So that would mean, for bugs like these, we would close the bug against
> Okular and re-open it against Poppler saying "Poppler should better handle
> the following case of the PDF file being malformed."

Thank you for your quick reply. I have now opened a bug report against Poppler: https://bugs.freedesktop.org/show_bug.cgi?id=103245

I do not see how I could close this bug report here (against Okular), so I am possibly missing the respective permissions.
Could somebody else possibly close it instead (or tell me how I can close it)?
Comment 14 Nate Graham 2017-10-12 17:07:58 UTC
Ah, you don't have close permission because you're not the originator of the ticket. I'll close it for you, and let's continue to track the upstream bug: https://bugs.freedesktop.org/show_bug.cgi?id=103245
Comment 15 Michael Weghorn 2017-10-12 17:09:01 UTC
(In reply to Nate Graham from comment #14)
> Ah, you don't have close permission because you're not the originator of the
> ticket. I'll close it for you, and let's continue to track the upstream bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=103245

Thanks!