SUMMARY
Box characters appear after some chapter headings in the contents panel when this PDF is rendered. pdftk does not extract any characters at the positions where the boxes are displayed.

STEPS TO REPRODUCE
1. Open FoundationsOfMachineLearning_Mohri_Rostamizadeh_Talwalkar.pdf
2. Examine the contents panel.

OBSERVED RESULT
Some chapter headings with suffixed boxes.

EXPECTED RESULT
Chapter headings without suffixed boxes.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Ubuntu 19.10 5.3.0-64-generic
KDE Frameworks Version: 5.62.0
Qt Version: 5.12.4

ADDITIONAL INFORMATION
`pdftk FoundationsOfMachineLearning_Mohri_Rostamizadeh_Talwalkar.pdf dump_data | grep BookmarkTitle | cat -vTE` does not show any characters after the section headings. The file renders correctly in the Firefox PDF renderer and in Evince.
Created attachment 131179: Rendering with Okular 1.11 (poppler 0.89)
No problems here. Can you give us a link to your file? Thanks in advance for your answer.
Change status.
I can't upload the file, as it's too big. While attempting to create a smaller file with pdftk, I discovered that extracting the info and writing it back fixes the issue:
    pdftk $DOCUMENT.pdf dump_data > in.info
    pdftk $DOCUMENT.pdf update_info in.info output out.pdf
The two documents differ, but I don't know why. The diff is also too large to upload.
Here is a link to the pdf. https://www.dropbox.com/s/7voitv0vt24c88s/10290.pdf?dl=1
The boxes are visible in Okular but not in Evince.
I would argue the PDF is actually broken: it has NULL characters in the text. However, since poppler was already clearing one NULL character from the end of a string if it was there, I've changed it to clear as many NULL characters as there are at the end of strings: https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/619
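To illustrate the difference between the two behaviours described in this comment, here is a minimal Python sketch. It is not poppler's code (poppler is written in C++, and the merge request is the authoritative source). The function names and the sample title are hypothetical, and it assumes the bookmark title has already been decoded into a string.

```python
# Illustrative sketch only: NOT poppler's implementation.
# Function names and the sample title are hypothetical.

def strip_one_trailing_nul(title: str) -> str:
    """Earlier behaviour as described above: drop at most one trailing NUL."""
    return title[:-1] if title.endswith("\x00") else title


def strip_trailing_nuls(title: str) -> str:
    """Behaviour after the change as described: drop every trailing NUL."""
    return title.rstrip("\x00")


if __name__ == "__main__":
    # Assumed example of a bookmark title with two trailing NUL characters.
    raw = "1 Introduction\x00\x00"
    print(repr(strip_one_trailing_nul(raw)))  # one NUL is still left at the end
    print(repr(strip_trailing_nuls(raw)))     # no NUL left at the end
```

Only NUL characters at the end of the string are removed; a NUL in the middle of a title is left alone in this sketch. A viewer that draws each leftover NUL as a box would show boxes in the first case and none in the second.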