Bug 425791 - Box characters in contents panel
Status: RESOLVED UPSTREAM
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 1.7.3
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
 
Reported: 2020-08-25 18:48 UTC by itisme1997
Modified: 2020-08-25 21:08 UTC

Attachments
Rendering with Okular 1.11 (poppler 0.89) (227.05 KB, image/png)
2020-08-25 19:04 UTC, Yuri Chornoivan

Description itisme1997 2020-08-25 18:48:56 UTC
SUMMARY

Okular shows box characters after some chapter headings in the contents panel of a PDF. pdftk does not extract any characters at the positions where the boxes are displayed.

STEPS TO REPRODUCE
1. Open FoundationsOfMachineLearning_Mohri_Rostamizadeh_Talwalkar.pdf
2. Examine the contents panel.

OBSERVED RESULT
Some chapter headings with suffixed boxes.

EXPECTED RESULT
Chapter headings without suffixed boxes.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Ubuntu 19.10 5.3.0-64-generic
KDE Frameworks Version: 5.62.0
Qt Version: 5.12.4

ADDITIONAL INFORMATION
`pdftk FoundationsOfMachineLearning_Mohri_Rostamizadeh_Talwalkar.pdf dump_data | grep BookmarkTitle | cat -vTE` does not show any characters after the section headers.
The file renders correctly in the Firefox PDF renderer and in Evince.
Comment 1 Yuri Chornoivan 2020-08-25 19:04:40 UTC
Created attachment 131179
Rendering with Okular 1.11 (poppler 0.89)

No problems here. Can you give us an address for your file?

Thanks in advance for your answer.
Comment 2 Yuri Chornoivan 2020-08-25 19:05:23 UTC
Change status.
Comment 3 itisme1997 2020-08-25 19:11:54 UTC
I can't upload the file, as it's too big. In attempting to create a smaller file with pdftk, I discovered that extracting and updating the info fixes the issue.

pdftk $DOCUMENT.pdf dump_data > in.info
pdftk $DOCUMENT.pdf update_info in.info output out.pdf

There is a diff between these documents, but I don't know why. The diff is also too large to upload.
Comment 4 itisme1997 2020-08-25 19:12:41 UTC
Here is a link to the pdf.

https://www.dropbox.com/s/7voitv0vt24c88s/10290.pdf?dl=1
Comment 5 Yuri Chornoivan 2020-08-25 19:22:40 UTC
The boxes are visible in Okular but not in Evince.
Comment 6 Albert Astals Cid 2020-08-25 21:08:02 UTC
I would argue the PDF is actually broken: it has NULL characters in the text. But since poppler was already clearing one NULL character from the end of a string if it was there, I've changed it to clear all NULL characters from the end of strings.

https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/619
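
For readers following along, the difference between the two behaviours described in comment 6 can be sketched in a few lines of Python. This is only an illustration, not the poppler code from the merge request: the function names and the sample bookmark title are hypothetical, and it assumes the title has already been decoded to a string.

```python
def strip_one_trailing_nul(title: str) -> str:
    """Drop at most one NUL from the end (the earlier behaviour, as described)."""
    return title[:-1] if title.endswith("\x00") else title


def strip_all_trailing_nuls(title: str) -> str:
    """Drop every NUL from the end; characters inside the title are untouched."""
    return title.rstrip("\x00")


if __name__ == "__main__":
    # Hypothetical bookmark title ending in two NUL characters.
    sample = "1 Introduction\x00\x00"
    print(repr(strip_one_trailing_nul(sample)))   # one NUL is left behind
    print(repr(strip_all_trailing_nuls(sample)))  # no trailing NULs remain
```

With only one NUL removed, the leftover character is what a viewer could draw as a box after the heading; removing all trailing NULs leaves just the visible text.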