Bug 426976

Summary: Okular mangles the URL fragment if it contains a '.'
Product: [Applications] okular
Reporter: Unknown <null>
Component: general
Assignee: Okular developers <okular-devel>
Status: RESOLVED FIXED
Severity: normal
CC: a.samirh78, aacid, kdelibs-bugs, nate
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Arch Linux
OS: Linux
Latest Commit:
Version Fixed In: 20.12.0
Attachments: Views seen when opening a PDF URL, (1): the top image, (2): the notification, and (3) the created tmpfile

Description Unknown 2020-09-26 01:24:05 UTC
Created attachment 131941
Views seen when opening a PDF URL, (1): the top image, (2): the notification, and (3) the created tmpfile

SUMMARY
Opening a PDF by clicking a URL that contains a fragment makes Okular fail to open the URL. As the attachment shows, Okular also seems to be opening an encoded version of the URL.

kioclient5 seems to download the file correctly (`kioclient5 cat "https://www.latex-project.org/help/documentation/source2e.pdf#subsection.68.3"` works), but something goes wrong when the file is saved.

STEPS TO REPRODUCE
`kioclient5 exec "https://www.latex-project.org/help/documentation/source2e.pdf#subsection.68.3"`
OR
`kioclient5 exec "http://www.docs.is.ed.ac.uk/skills/documents/3722/3722-2014.pdf\section.4.1"`
etc...

OBSERVED RESULT
Okular displays an error, as the attachment shows: "fail to open <url-with-encoded-parts>".
In addition, it creates an empty tmpfile.

EXPECTED RESULT
The PDF opens correctly in Okular. Ideally it should open at the specified bookmark, but I don't know whether Okular supports that.
If an error occurs, Okular should show the original URL.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.19.5
KDE Frameworks Version: 5.74.0
Qt Version: 5.15.1
Kernel Version: 5.8.10-arch1-1
OS Type: 64-bit

ADDITIONAL INFORMATION
Comment 1 Ahmad Samir 2020-09-30 23:19:23 UTC
Okular issue.

ShellUtils::urlFromArg() mangles the URL fragment if it contains a '.' character.
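Going by the commit message that later fixed this (Comment 6: a remote URL with a '.' after the '#' was assumed to have no fragment), the mangling rule can be sketched roughly as below. `old_heuristic_split` is a hypothetical name; this is an illustration in Python, not the real C++ ShellUtils::urlFromArg(), and it skips the remote-URL check:

```python
def old_heuristic_split(url: str) -> tuple[str, str]:
    """Return (filename_part, fragment) under the old, mangling rule.

    Hypothetical sketch, not Okular's code: if a '.' appears anywhere after
    the first '#', the whole string is kept as the filename and the
    fragment is dropped.
    """
    before, sep, after = url.partition("#")
    if not sep:
        return url, ""  # no '#': nothing to split
    if "." in after:
        return url, ""  # '.' after the '#': assume there is no fragment
    return before, after
```

For the reported URL this returns the whole string as the filename, so `subsection.68.3` is never treated as a fragment, while `source2e.pdf#2` still splits normally.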
Comment 2 Ahmad Samir 2020-10-03 20:08:44 UTC
Is the anchor part of a URL only relevant to PDFs? If so, we could look for the index of ".pdf": if the # comes before it, it's part of the filename; otherwise it's a legitimate URL fragment and should be kept.
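A minimal sketch of this proposal, with a hypothetical function name. As an assumption going slightly beyond the comment, it splits at the first '#' after ".pdf" (so `foo#bar.pdf#2` still yields a fragment) and falls back to a plain split at the first '#' when there is no ".pdf":

```python
def pdf_index_split(url: str) -> tuple[str, str]:
    """Return (filename_part, fragment) using the ".pdf" position.

    Hypothetical sketch of the Comment 2 proposal: any '#' before ".pdf" is
    part of the filename; the first '#' after ".pdf" starts the fragment.
    """
    pdf_pos = url.lower().find(".pdf")
    if pdf_pos == -1:
        # No ".pdf" to anchor on: fall back to splitting at the first '#'.
        before, _, after = url.partition("#")
        return before, after
    hash_pos = url.find("#", pdf_pos + len(".pdf"))
    if hash_pos == -1:
        return url, ""  # every '#' (if any) belongs to the filename
    return url[:hash_pos], url[hash_pos + 1:]
```

A URL such as `http://localhost/foo#bar.epub` shows the weakness raised in the next comment: without a ".pdf" to anchor on, a '#' in an EPUB filename is still misread as a fragment.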
Comment 3 Unknown 2020-10-05 03:23:16 UTC
Uh, I think so? I don't think I've ever seen EPUB/ODF/Markdown files referenced with URL anchors. That doesn't mean it can't happen, but since browsers generally don't render them, I don't think it's common. You could still have an EPUB download URL with a `.` in the URL fragment, though.
Comment 4 Albert Astals Cid 2020-10-08 22:52:26 UTC
The actual fix is to check whether the file exists, so we know whether the # starts an anchor or not, but that needs some rework: remote access is "slow", so the function would need to be rewritten to work asynchronously.
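For local files the existence check is easy to do synchronously, as in this sketch. `split_by_existence` is a hypothetical helper; a remote URL would need the asynchronous check the comment describes, and the sketch only considers the first '#':

```python
import os


def split_by_existence(path: str) -> tuple[str, str]:
    """Return (filename, fragment) for a local path, based on what exists.

    Hypothetical sketch of the existence-check idea: if the full string names
    an existing file, the '#' is part of the name; otherwise everything after
    the first '#' is treated as the fragment.
    """
    if os.path.exists(path):
        return path, ""
    before, _, after = path.partition("#")
    return before, after
```

With both `foo#bar.pdf` and `source2e.pdf` present in a directory, `foo#bar.pdf` is kept whole while `source2e.pdf#subsection.68.3` splits into file and fragment.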
Comment 5 Bug Janitor Service 2020-11-22 00:14:21 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/321
Comment 6 Albert Astals Cid 2020-11-26 10:55:12 UTC
Git commit 239827baad0301d578a861be55d5711d82cc2048 by Albert Astals Cid.
Committed on 26/11/2020 at 10:07.
Pushed by aacid into branch 'release/20.12'.

Rework how we open urls that have a #

Previously, if a remote URL had a # followed later by a ., we
assumed the URL had no fragment and treated everything as the filename.

We don't do that anymore. Now we first try to open the URL as
parsed, i.e. everything before the # is the filename and everything
after it is the fragment; if that fails, we try to open the whole thing
as the filename with no fragment.
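The two-step strategy can be sketched as follows. `open_with_fallback` and the `try_open` callback are hypothetical stand-ins for Okular's actual C++ code in part/part.cpp, and the sketch ignores KParts' separate local and remote code paths:

```python
from typing import Callable, Optional


def open_with_fallback(
    url: str, try_open: Callable[[str, str], bool]
) -> Optional[tuple[str, str]]:
    """Return the (filename, fragment) pair that opened, or None.

    Hypothetical sketch: try_open(filename, fragment) stands in for the real
    open call and returns True on success.
    """
    before, sep, after = url.partition("#")
    # First attempt: open the URL as parsed (filename before '#', fragment after).
    if try_open(before, after):
        return before, after
    # Second attempt: if there was a '#', treat the whole string as the filename.
    if sep and try_open(url, ""):
        return url, ""
    return None
```

With a simulated set of files containing `source2e.pdf` and `foo#bar.pdf`, `source2e.pdf#subsection.68.3` succeeds on the first attempt, `foo#bar.pdf` on the second, and `foo#bar.pdf#2` fails both and returns `None`.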

Unfortunately, given how the KParts internals handle opening local vs.
remote URLs, we need to do this in two places.

We also have to remove the test that checked that the URL was mangled at
the shell level, because we don't do that anymore. Unfortunately I can't
add a test for the new code path since it would involve starting an HTTP
server ^_^

Filenames:
  source2e.pdf
  foo#bar.pdf

What works:
 * okular http://localhost/source2e.pdf#subsection.68.3
 * okular file:///srv/http/source2e.pdf#subsection.68.3
 * okular source2e.pdf#subsection.68.3 (in the /srv/http folder)
 * okular source2e.pdf#2
 * okular http://localhost/foo#bar.pdf
 * okular file:///srv/http/foo#bar.pdf
 * okular foo#bar.pdf (in the /srv/http folder)

What doesn't work:
 * okular http://localhost/foo#bar.pdf#2

I think it's a fair limitation that if you want to open a file that has # in its name and also use a # page marker, you need to use the encoded URL, e.g. okular http://localhost/foo%23bar.pdf#2
After all, Firefox completely fails to open http://localhost/foo#bar.pdf and only works if you give it the encoded URL.
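Why the encoded form is unambiguous can be shown with Python's standard urllib.parse, used here only for illustration (Okular itself is C++/Qt, and `path_and_fragment` is a hypothetical helper): `urlsplit` cuts the fragment at the first literal '#', and `%23` is decoded back to '#' only afterwards.

```python
from urllib.parse import unquote, urlsplit


def path_and_fragment(url: str) -> tuple[str, str]:
    """Split at the first literal '#', then percent-decode the path.

    Hypothetical helper: "%23" is an encoded '#', so it stays in the path and
    only becomes '#' after the fragment has already been split off.
    """
    parts = urlsplit(url)
    return unquote(parts.path), parts.fragment
```

`path_and_fragment("http://localhost/foo%23bar.pdf#2")` gives `("/foo#bar.pdf", "2")`, while the unencoded `http://localhost/foo#bar.pdf` gives `("/foo", "bar.pdf")`, the same split a browser makes.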

M  +0    -6    autotests/shelltest.cpp
M  +22   -5    part/part.cpp
M  +6    -0    part/part.h
M  +0    -7    shell/shellutils.cpp

https://invent.kde.org/graphics/okular/commit/239827baad0301d578a861be55d5711d82cc2048