Bug 426976 - Okular mangles url fragment if it contains a '.'
Summary: Okular mangles url fragment if it contains a '.'
Alias: None
Product: okular
Classification: Applications
Component: general
Version: unspecified
Platform: Archlinux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Depends on:
Reported: 2020-09-26 01:24 UTC by Unknown
Modified: 2020-11-28 11:25 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In: 20.12.0

Views seen when opening a PDF URL: (1) the top image, (2) the notification, and (3) the created tmpfile (71.75 KB, image/jpeg)
2020-09-26 01:24 UTC, Unknown

Description Unknown 2020-09-26 01:24:05 UTC
Created attachment 131941
Views seen when opening a PDF URL: (1) the top image, (2) the notification, and (3) the created tmpfile

Opening a PDF by clicking on a URL that contains a fragment causes Okular to fail to open the URL. As seen in the attachment, Okular also seems to try opening an encoded version of the URL.

kioclient5 seems able to download the file properly (`kioclient5 cat "https://www.latex-project.org/help/documentation/source2e.pdf#subsection.68.3"` works), but saving it seems to go wrong.

`kioclient5 exec "https://www.latex-project.org/help/documentation/source2e.pdf#subsection.68.3"`
`kioclient5 exec "http://www.docs.is.ed.ac.uk/skills/documents/3722/3722-2014.pdf\section.4.1"`

Okular displays an error, as the attachment shows.
In addition, it creates an empty tmpfile.
Okular shows "fail to open <url-with-encoded-parts>"

Expected: the PDF opens in Okular correctly. Ideally it should open at the specified bookmark, but I don't know whether Okular supports that.
Okular should also show the original URL when an error occurs.

Operating System: Arch Linux
KDE Plasma Version: 5.19.5
KDE Frameworks Version: 5.74.0
Qt Version: 5.15.1
Kernel Version: 5.8.10-arch1-1
OS Type: 64-bit

Comment 1 Ahmad Samir 2020-09-30 23:19:23 UTC
Okular issue.

ShellUtils::urlFromArg() mangles the url fragment if it contains a '.' character.
Comment 2 Ahmad Samir 2020-10-03 20:08:44 UTC
Is the anchor part of a URL only relevant to PDFs? If that is the case, then we could check for the index of ".pdf": if the # is before it, it's part of the filename; otherwise it is a legitimate URL fragment and should be kept.
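A minimal sketch of the heuristic proposed above, for illustration only. It is not Okular code and not what was eventually merged; the function name `split_by_pdf_heuristic` is hypothetical, and the URL is treated as a plain string rather than parsed with QUrl.

```python
from typing import Tuple


def split_by_pdf_heuristic(url: str) -> Tuple[str, str]:
    """Split url into (document, fragment) using the '.pdf' position rule.

    If the last '#' comes before the last '.pdf', the '#' is taken to be
    part of the file name and no fragment is returned. Otherwise the text
    after the last '#' is treated as the fragment.
    """
    hash_pos = url.rfind("#")
    if hash_pos == -1:
        return url, ""  # no '#' at all, so there is nothing to split

    pdf_pos = url.lower().rfind(".pdf")
    if pdf_pos > hash_pos:
        return url, ""  # '#' precedes '.pdf': part of the file name

    return url[:hash_pos], url[hash_pos + 1:]
```

Under this rule a name such as `notes#1.epub` would still be split at the `#`, since it contains no ".pdf", so the heuristic only helps for PDF files.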
Comment 3 Unknown 2020-10-05 03:23:16 UTC
Uh, I think so? I don't think I've ever seen epubs/odf/md that referenced using URL anchors. Doesn't mean it can't happen, but generally since they don't get rendered by browsers I don't think it's used much. Doesn't mean you couldn't have an epub download URL with a `.` in the URL fragment, though.
Comment 4 Albert Astals Cid 2020-10-08 22:52:26 UTC
The actual fix is checking whether the file exists, to be able to know if the # starts an anchor or not, but that needs some rework because remote access is "slow" and the code of the function would need to be rewritten to work asynchronously.
Comment 5 Bug Janitor Service 2020-11-22 00:14:21 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/321
Comment 6 Albert Astals Cid 2020-11-26 10:55:12 UTC
Git commit 239827baad0301d578a861be55d5711d82cc2048 by Albert Astals Cid.
Committed on 26/11/2020 at 10:07.
Pushed by aacid into branch 'release/20.12'.

Rework how we open urls that have a #

Previously if it was a remote url that had # and a . after the # we
assumed the url had no fragment and everything was filename.

We don't do that anymore, what we do now is try to open the url as
parsed, i.e. before the # is the filename after is the fragment, and if
that fails we try to open everything as filename and nothing as fragment.

Unfortunately given how kpart internals handle opening local vs remote
urls we need to do this in two places.

Also we have to remove the test that checked that the url was mangled at
the shell level because we don't do that anymore. Unfortunately can't
add a test for the new code path since it would involve starting an http
server ^_^


What works:
 * okular http://localhost/source2e.pdf#subsection.68.3
 * okular file:///srv/http/source2e.pdf#subsection.68.3
 * okular source2e.pdf#subsection.68.3 (in the /srv/http folder)
 * okular source2e.pdf#2
 * okular http://localhost/foo#bar.pdf
 * okular file:///srv/http/foo#bar.pdf
 * okular foo#bar.pdf (in the /srv/http folder)

What doesn't work:
 * okular http://localhost/foo#bar.pdf#2
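
The open-then-fall-back order described in the commit message can be sketched as follows. This is an illustration under assumptions, not the code in part/part.cpp: `resolve` and the `exists` callback are hypothetical names, and `exists` stands in for an actual attempt to open a local or remote file.

```python
from typing import Callable, Optional, Tuple


def resolve(arg: str, exists: Callable[[str], bool]) -> Optional[Tuple[str, str]]:
    """Return (filename, fragment) for arg, or None if nothing can be opened.

    First try the argument "as parsed": everything before the first '#'
    is the file name and the rest is the fragment. If that file cannot
    be opened, try the whole argument as a file name with no fragment.
    """
    name, sep, fragment = arg.partition("#")
    if sep and exists(name):
        return name, fragment  # the parsed form works, keep the fragment

    if exists(arg):
        return arg, ""  # the '#' was part of the file name

    return None  # neither interpretation names an existing file
```

With only `source2e.pdf` and `foo#bar.pdf` present, `resolve("source2e.pdf#2", ...)` gives `("source2e.pdf", "2")` and `resolve("foo#bar.pdf", ...)` gives `("foo#bar.pdf", "")`, while `resolve("foo#bar.pdf#2", ...)` gives `None`, which matches the case listed under "What doesn't work".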

I think it's a fair limitation that if you want to open a file that contains # in the name and also use a # page marker, you need to use the encoded url, like okular http://localhost/foo%23bar.pdf#2
After all, things like Firefox will totally fail opening http://localhost/foo#bar.pdf and will only work if you give the encoded url.
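
To illustrate why the encoded form is unambiguous, here is a small sketch using Python's standard `urllib.parse`. Okular itself uses Qt's QUrl, so details may differ; the helper names `build_url` and `split_url` are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit


def build_url(base: str, filename: str, fragment: str = "") -> str:
    """Percent-encode the file name so a literal '#' cannot start a fragment."""
    url = base + quote(filename)  # '#' in the file name becomes '%23'
    return f"{url}#{fragment}" if fragment else url


def split_url(url: str) -> Tuple[str, str]:
    """Return (decoded path, fragment) as a standard URL parser sees them."""
    parts = urlsplit(url)
    return unquote(parts.path), parts.fragment
```

Here `build_url("http://localhost/", "foo#bar.pdf", "2")` yields `http://localhost/foo%23bar.pdf#2`, which `split_url` reads back as path `/foo#bar.pdf` with fragment `2`. The unencoded `http://localhost/foo#bar.pdf` is instead parsed as path `/foo` with fragment `bar.pdf`.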

M  +0    -6    autotests/shelltest.cpp
M  +22   -5    part/part.cpp
M  +6    -0    part/part.h
M  +0    -7    shell/shellutils.cpp