Bug 458516 - Spaces in content filenames cause a second copy of the book's content to be shown; TOC points to the second copy.
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: EPub backend
Version: 22.08.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-30 14:46 UTC by Duane
Modified: 2022-09-29 18:04 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
source html file (128 bytes, text/html)
2022-08-30 14:46 UTC, Duane
an epub file illustrating the bug (3.10 KB, application/epub+zip)
2022-09-01 21:44 UTC, Duane
screenshot of okular mistakenly displaying a second copy of the epub content (84.66 KB, image/png)
2022-09-14 00:29 UTC, Duane

Description Duane 2022-08-30 14:46:01 UTC
Created attachment 151708 [details]
source html file

SUMMARY
An EPUB file whose HTML content filenames (inside the EPUB zip archive) contain spaces causes Okular to show the content twice, with the TOC pointing to the second copy.

STEPS TO REPRODUCE
1. create epub file with spaces in component filenames
  1.1. create html file: "te st.html" (with space)
    nano "te st.html"
<html><body>
    <h1>Chapter 1</h1>
    <h1>Chapter 2</h1>
    <h1>Chapter 3</h1>
    <h1>Chapter 4</h1>
    <h1>Chapter 5</h1>
</body></html>

  1.2. convert to epub file (with spaces in component filenames)
    ebook-convert "te st".{html,epub}
  1.3. review contents
    unzip -l "te st.epub"
2. view with okular
  okular "te st.epub"
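
Step 1.3 can also be done without unzip. The following is a minimal Python sketch (the helper name members_with_spaces is made up for illustration) that lists the archive members whose names contain a space. It runs on a small in-memory archive so that it is self-contained; for the file from the steps above, pass "te st.epub" instead.

```python
import io
import zipfile


def members_with_spaces(epub):
    """Return the archive member names that contain a space.

    `epub` may be a path or a binary file object (hypothetical helper).
    """
    with zipfile.ZipFile(epub) as zf:
        return [name for name in zf.namelist() if " " in name]


# Self-contained demo on an in-memory archive that mimics the layout
# described in this report; it is not the real attachment.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("te st_split_000.html", "<html/>")
    zf.writestr("content.opf", "<package/>")

print(members_with_spaces(demo))  # ['te st_split_000.html']
```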

OBSERVED RESULT
The reader will show Title Page, Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5.
The table of contents will point to the second occurrence so Chapter 1 will be on page 7.

EXPECTED RESULT
Reader should show Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5.
The TOC should place Chapter 1 on page 2.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
5.19.4-arch1-1 x86_64 GNU/Linux
Window Manager:
jwm 2.3.7-3

ADDITIONAL INFORMATION
Manually removing the spaces from the component file names (te st_split_000.html, etc.) and editing content.opf and toc.ncx to remove the space and %20 in the references corrects the problem.
ebook-viewer does not share this problem.
There is no doubling of content references in either content.opf or toc.ncx.
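
The "no doubling of references" check can be scripted. This is only a sketch under assumptions: it assumes the EPUB has exactly one .opf and one .ncx member using the standard OPF and NCX namespaces, and the function names are invented here. It percent-decodes each reference first, so that "te st" and "te%20st" count as the same target.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def decoded_refs(xml_data, tag, attr):
    """Return percent-decoded attribute values (fragment removed), in document order."""
    root = ET.fromstring(xml_data)
    refs = []
    for elem in root.iter(tag):
        value = elem.get(attr)
        if value is not None:
            refs.append(unquote(value.split("#", 1)[0]))
    return refs


def duplicated(refs):
    """Return the sorted list of references that occur more than once."""
    return sorted(name for name, count in Counter(refs).items() if count > 1)


def check_epub(epub):
    """Report duplicate manifest hrefs and duplicate NCX content targets."""
    with zipfile.ZipFile(epub) as zf:
        names = zf.namelist()
        # Assumption: exactly one .opf and one .ncx member in the archive.
        opf_name = next(n for n in names if n.endswith(".opf"))
        ncx_name = next(n for n in names if n.endswith(".ncx"))
        manifest = decoded_refs(zf.read(opf_name), OPF_NS + "item", "href")
        toc = decoded_refs(zf.read(ncx_name), NCX_NS + "content", "src")
    return {
        "manifest_duplicates": duplicated(manifest),
        "toc_duplicates": duplicated(toc),
    }
```

Calling check_epub("te st.epub") should return two empty lists if the statement above holds for that file.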

Playing around with it:  If I have:
test__split_000.html
te st__split_001.html
te st__split_002.html
test__split_003.html
te st__split_004.html
and edit content.opf to have the lines:
    <item id="html5" href="test_split_000.html" media-type="application/xhtml+xml"/>
    <item id="html4" href="te st_split_001.html" media-type="application/xhtml+xml"/>
    <item id="html3" href="te st_split_002.html" media-type="application/xhtml+xml"/>
    <item id="html2" href="test_split_003.html" media-type="application/xhtml+xml"/>
    <item id="html1" href="te st_split_004.html" media-type="application/xhtml+xml"/>
and edit toc.ncx, changing lines:
      <content src="te%20st_split_000.html"/>
...
      <content src="te%20st_split_003.html"/>
to:
      <content src="test_split_000.html"/>
...
      <content src="test_split_003.html"/>

The book shows Title Page, Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 2, Chapter 3, Chapter 5.
The second copies of Chapters 1 and 4 are missing.
The TOC shows Chapters 1-5 pointing to pages 2, 7, 8, 4, 9.
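
For anyone comparing the references by hand: in the experiment above, content.opf uses a literal space while toc.ncx uses %20. Both forms decode to the same file name, which the following short Python sketch illustrates (same_target is a made-up helper). Whether Okular's EPub backend treats the two forms as the same target is not established in this report.

```python
from urllib.parse import quote, unquote


def same_target(ref_a, ref_b):
    """True if two references name the same file once percent-decoded."""
    return unquote(ref_a) == unquote(ref_b)


print(same_target("te st_split_000.html", "te%20st_split_000.html"))  # True
print(quote("te st_split_000.html"))  # te%20st_split_000.html
```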
Comment 1 Albert Astals Cid 2022-08-31 17:24:03 UTC
Can you attach such an epub file?
Comment 2 Duane 2022-09-01 21:44:43 UTC
Created attachment 151774 [details]
an epub file illustrating the bug

This HTML source differs from the previous source in that it has only chapters 1-3. This change was necessary so that the EPUB would fit within the 4k file size submission constraint.
It still illustrates the repeated content.
Comment 3 Albert Astals Cid 2022-09-13 21:57:04 UTC
I don't see any problem with the attached file.

What is wrong with it?
Comment 4 Duane 2022-09-14 00:29:01 UTC
Created attachment 152038 [details]
screenshot of okular mistakenly displaying a second copy of the epub content

Note that the TOC shows chapters 1-3 pointing to pages 4-6. This is the second copy of the content. The first copy of chapters 1-3 is on pages 1-3. Page through the document and you'll see it displayed as:
page 1  Chapter 1
page 2  Chapter 2
page 3  Chapter 3
page 4  Chapter 1 (pointed to by TOC)
page 5  Chapter 2 (pointed to by TOC)
page 6  Chapter 3 (pointed to by TOC)
page 7  (blank)
.
Yet, the epub only contains (chapter) files:  
te st_split_000.html
te st_split_001.html
te st_split_002.html
. 
Rename the files to "test_split_000.html", "test_split_001.html", and "test_split_002.html", and edit "content.opf" and "toc.ncx" to contain the new names and okular stops displaying the first copy of the chapters.
Comment 5 Duane 2022-09-14 00:35:37 UTC
On 2022-09-13 15:57, Albert Astals Cid wrote:
> https://bugs.kde.org/show_bug.cgi?id=458516
>
> --- Comment #3 from Albert Astals Cid <aacid@kde.org> ---
> I don't see any problem with the attached file.
>
> What is wrong with it?
>
The file itself is OK. The problem is in Okular's displaying of the 
file. I've added a screenshot of Okular illustrating the problem to the 
bug report [ https://bugsfiles.kde.org/attachment.cgi?id=152038 ].

Note that the TOC shows chapters 1-3 pointing to pages 4-6. This is the 
second copy of the content. The first copy of chapters 1-3 is on pages 
1-3. Page through the document and you'll see it displayed as:
page 1  Chapter 1
page 2  Chapter 2
page 3  Chapter 3
page 4  Chapter 1 (pointed to by TOC)
page 5  Chapter 2 (pointed to by TOC)
page 6  Chapter 3 (pointed to by TOC)
page 7  (blank)

Yet, the epub contains only (chapter) files:
te st_split_000.html
te st_split_001.html
te st_split_002.html

Rename the epub files to "test_split_000.html", "test_split_001.html", 
and "test_split_002.html", and edit "content.opf" and "toc.ncx" to 
contain the new names and Okular stops displaying the first copy of the 
chapters. Pages in the TOC are corrected as well.
Comment 6 Albert Astals Cid 2022-09-14 21:57:02 UTC
You mean you get 7 pages when opening https://bugs.kde.org/attachment.cgi?id=151774 ?
Comment 7 Duane 2022-09-14 23:21:10 UTC
Comment on attachment 152038 [details]
screenshot of okular mistakenly displaying a second copy of the epub content

Yes, Okular inserts the chapters a first time, then adds them again with the TOC pointing to the second copy.
So the output shown for just a 3 chapter document is:
page 1  Chapter 1
page 2  Chapter 2
page 3  Chapter 3
page 4  Chapter 1 (pointed to by TOC)
page 5  Chapter 2 (pointed to by TOC)
page 6  Chapter 3 (pointed to by TOC)
page 7  (blank)
.
when the chapter filenames contain spaces.
Comment 8 Bug Janitor Service 2022-09-29 04:48:36 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 9 Albert Astals Cid 2022-09-29 09:22:00 UTC
This needs to be re-checked; I can't reproduce what the reporter says.
Comment 10 Duane 2022-09-29 18:04:31 UTC
(In reply to Albert Astals Cid from comment #9)
> This needs to be re-checked; I can't reproduce what the reporter says.

Please post a screenshot of what you do get when you display the EPUB file.