Bug 458289 - Table of Contents pages always point to page 1
Summary: Table of Contents pages always point to page 1
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: EPub backend
Version: 22.08.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-25 10:04 UTC by Duane
Modified: 2022-08-31 16:54 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
simple demonstration epub (2.67 KB, application/epub+zip)
2022-08-25 10:04 UTC, Duane

Description Duane 2022-08-25 10:04:36 UTC
Created attachment 151575
simple demonstration epub

SUMMARY
The <content> entries in toc.ncx reference file names as URIs (percent-encoded), and this breaks the table of contents in Okular.
The solution is to change Okular to decode the percent-encoded file name so it references the right file in the epub zip.


STEPS TO REPRODUCE
1. Create "c,.html" with the content:
   <html><body><h1>Chapter 1</h1><h1>Chapter 2</h1></body></html>
2. Convert it to a new epub with:
   ebook-convert "c,.html" "c,.epub" --no-default-epub-cover
3. Open "c,.epub" in Okular.

OBSERVED RESULT
All entries in the TOC point to page 1.

EXPECTED RESULT
Different entries should point to different pages (1, 2, etc.).

SOFTWARE/OS VERSIONS
Linux: Arch Linux 5.19.3-arch1-1

ADDITIONAL INFORMATION
In the example above, ebook-convert creates c,_split_000.html and c,_split_001.html inside the epub zip.
toc.ncx:
The spec for the table of contents file, toc.ncx, requires URIs for the references to the files[1], and URIs use percent-encoding for characters outside the unreserved set [a-zA-Z0-9-._~][2][3].
The generated toc.ncx file contains the line:
      <content src="c%2c_split_000.html"/>
I suspect Okular looks for a file named "c%2c_split_000.html" instead of "c,_split_000.html", fails to find it, and falls back to page 1.
Hand-editing the toc.ncx file so it has the line:
      <content src="c,_split_000.html"/>
makes Okular work correctly, but the epub is then no longer standard.
The solution is to change Okular to decode the percent-encoded file name so it references the right file.

References:
[1] DAISY Z39.86-2005 (R2012) specification, <content> element: https://daisy.org/activities/standards/daisy/daisy-3/z39-86-2005-r2012-specifications-for-the-digital-talking-book/#NCXElem
[2] "Uniform Resource Identifier", Wikipedia, paragraphs on percent-encoding: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
[3] RFC 3986, page 12: https://www.rfc-editor.org/rfc/rfc3986#page-12
Comment 1 Bug Janitor Service 2022-08-25 21:40:13 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/647
Comment 2 Albert Astals Cid 2022-08-31 16:54:11 UTC
Git commit 656587ca6393663c8d652a63df7d8393b4adaac7 by Albert Astals Cid.
Committed on 25/08/2022 at 21:38.
Pushed by aacid into branch 'release/22.08'.

epub: Improve TableOfContents for some files

The link can be percent encoded so try it like that if not found in the
normal way, also if the text overflows the page, it's in the next page

M  +6    -2    core/textdocumentgenerator_p.h
M  +33   -28   generators/epub/converter.cpp

https://invent.kde.org/graphics/okular/commit/656587ca6393663c8d652a63df7d8393b4adaac7