Bug 458289

Summary: Table of Contents pages always point to page 1
Product: [Applications] okular
Reporter: Duane <duane-tech>
Component: EPub backend
Assignee: Okular developers <okular-devel>
Status: RESOLVED FIXED    
Severity: normal
CC: duane-tech
Priority: NOR    
Version: 22.08.0   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: simple demonstration epub

Description Duane 2022-08-25 10:04:36 UTC
Created attachment 151575
simple demonstration epub

SUMMARY
The content line in toc.ncx references the file name as a percent-encoded URI, and Okular fails to resolve it.
The solution is to change Okular so that it decodes the percent-encoded file name and references the right file in the EPUB zip.


STEPS TO REPRODUCE
1. Create "c,.html" with the following content:
<html><body><h1>Chapter 1</h1><h1>Chapter 2</h1></body></html>
2. Create a new EPUB from it with:
ebook-convert "c,.html" "c,.epub" --no-default-epub-cover
3. Open "c,.epub" in Okular.

OBSERVED RESULT
All entries in the TOC point to page 1.

EXPECTED RESULT
Different entries should have different pages (1, 2, etc.)

SOFTWARE/OS VERSIONS
Linux: Arch Linux 5.19.3-arch1-1

ADDITIONAL INFORMATION
In the example above, ebook-convert creates c,_split_000.html and c,_split_001.html inside the EPUB zip.
The specification for the table of contents file, toc.ncx, requires references to files to be URIs [1], in which any character other than [a-zA-Z0-9-._~] may be percent-encoded [2][3].
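To make the encoding concrete, here is a small Python sketch (not Okular code) that uses the standard urllib.parse module. The variable names are only illustrative. Note that the hex digits in an escape are case-insensitive, so the lowercase "%2c" written by ebook-convert and the uppercase "%2C" produced here decode to the same comma.

```python
from urllib.parse import quote, unquote

name = "c,_split_000.html"

# safe="" leaves only the RFC 3986 unreserved characters unencoded,
# so the comma becomes a percent escape.
encoded = quote(name, safe="")

# Decoding accepts either letter case in the hex digits.
decoded_lower = unquote("c%2c_split_000.html")
decoded_upper = unquote(encoded)

print(encoded)
print(decoded_lower == decoded_upper == name)
```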
The generated toc.ncx file contains the line:
      <content src="c%2c_split_000.html"/>
I suspect Okular looks for a file named "c%2c_split_000.html" instead of "c,_split_000.html" and, not finding it, falls back to page 1 for the page number.
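One way to check this suspicion outside Okular is to compare each content src in toc.ncx with the names stored in the zip. The following Python sketch does that; the function names unresolved_toc_targets and check_epub are hypothetical, and it assumes the archive holds a single .ncx file whose src values are relative to its own directory.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote, urldefrag

NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def unresolved_toc_targets(ncx_xml, names, base=""):
    """Return (src, decoded_path) pairs whose raw src names no archive member."""
    root = ET.fromstring(ncx_xml)
    mismatches = []
    for content in root.iter(NCX_NS + "content"):
        # Drop any "#fragment" before comparing against member names.
        src, _fragment = urldefrag(content.get("src", ""))
        if posixpath.join(base, src) not in names:
            mismatches.append((src, posixpath.join(base, unquote(src))))
    return mismatches


def check_epub(epub_path):
    """Run the check on an EPUB file (assumes exactly one .ncx member)."""
    with zipfile.ZipFile(epub_path) as archive:
        names = set(archive.namelist())
        ncx_name = next(n for n in names if n.lower().endswith(".ncx"))
        ncx_xml = archive.read(ncx_name)
    return unresolved_toc_targets(ncx_xml, names, posixpath.dirname(ncx_name))
```

If the suspicion is right, calling check_epub("c,.epub") on the demonstration file should list the percent-encoded src values next to the decoded names that do exist in the archive.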
Hand-editing the toc.ncx file so that it contains the line:
      <content src="c,_split_000.html"/>
makes Okular work correctly, but the EPUB is then no longer standard.
The solution is to change Okular so that it decodes the percent-encoded file name and references the right file.
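One possible shape for such a lookup, sketched in Python rather than Okular's C++: try the name exactly as written first, so that archives whose member names really contain a "%" keep working, and only then try the decoded form. The function name resolve_toc_target and the names parameter (the set of member names in the zip) are hypothetical.

```python
from urllib.parse import unquote


def resolve_toc_target(src, names):
    """Return the archive member matching a toc.ncx src, or None."""
    # Split off the fragment before decoding, so an escaped "#" (%23)
    # inside a file name is not mistaken for a fragment separator.
    path = src.split("#", 1)[0]
    if path in names:
        return path
    decoded = unquote(path)
    if decoded in names:
        return decoded
    return None


members = {"c,_split_000.html", "c,_split_001.html"}
print(resolve_toc_target("c%2c_split_001.html", members))
```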

references:
[1]  See the <content> section in https://daisy.org/activities/standards/daisy/daisy-3/z39-86-2005-r2012-specifications-for-the-digital-talking-book/#NCXElem
[2]  See the paragraphs mentioning "percent-encoded" in https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
[3]  https://www.rfc-editor.org/rfc/rfc3986#page-12
Comment 1 Bug Janitor Service 2022-08-25 21:40:13 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/647
Comment 2 Albert Astals Cid 2022-08-31 16:54:11 UTC
Git commit 656587ca6393663c8d652a63df7d8393b4adaac7 by Albert Astals Cid.
Committed on 25/08/2022 at 21:38.
Pushed by aacid into branch 'release/22.08'.

epub: Improve TableOfContents for some files

The link can be percent encoded so try it like that if not found in the
normal way, also if the text overflows the page, it's in the next page

M  +6    -2    core/textdocumentgenerator_p.h
M  +33   -28   generators/epub/converter.cpp

https://invent.kde.org/graphics/okular/commit/656587ca6393663c8d652a63df7d8393b4adaac7