Bug 452978

Summary: URL parsing broken if URL includes port number or comma (or address is IPv6)
Product: [Applications] konsole Reporter: bastimeyer123
Component: generalAssignee: Konsole Developer <konsole-devel>
Status: RESOLVED FIXED    
Severity: normal CC: a.samirh78, bugs, christian, nate, ninjalj, piotr.mierzwinski, post, sudden6, suienzan, tiposchi, tomasz.kane
Priority: NOR    
Version: 22.04.0   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In: 22.08
Sentry Crash Report:

Description bastimeyer123 2022-04-25 08:29:28 UTC
SUMMARY
This was introduced with the recent upgrade of Konsole to 22.04.0. Konsole 21.12.3 was working fine.

It now fails to parse URLs which include a port number. When pressing CTRL and clicking the link, it only matches the URL until the first colon character, leaving out the rest of the URL, namely the port number, as well as anything that comes afterwards, aka. path, query and hash.

STEPS TO REPRODUCE
1. http://localhost
2. http://localhost/foo
3. http://127.0.0.1/foo
4. http://localhost:8888
5. http://localhost:8888/foo
6. http://127.0.0.1:8888/foo
7. http://[2a00:1450:4001:829::200e]/
8. http://[2a00:1450:4001:829::200e]:80/

OBSERVED RESULT
1-3 work
4-6 don't work
7-8 also don't work, but I'm not sure if this was working previously

SOFTWARE/OS VERSIONS
Linux: 5.17.4
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.93.0
Qt Version: 5.15.3

ADDITIONAL INFORMATION
After searching for recent commits in Konsole's git repo in regards to URL parsing, I found these changes which added some distinctions between local file paths and URLs. Maybe it's related to the issue, but I am not sure, as I'm not familiar with Konsole's code at all.
https://invent.kde.org/utilities/konsole/-/commits/v22.04.0/src/filterHotSpots
Comment 1 bastimeyer123 2022-04-26 09:44:42 UTC
URLs with username:password fail too, btw:
http://username:password@localhost/
Comment 2 ninjalj 2022-04-30 18:08:56 UTC
The offending commit would be https://invent.kde.org/utilities/konsole/-/commit/1ee8c64b5189d9e32ae9660308b2edc0f35ea6f0

As to what the proper regex should be, the syntax for URIs is specified at https://datatracker.ietf.org/doc/html/rfc3986#appendix-A, with augmented BNF rules as defined in https://datatracker.ietf.org/doc/html/rfc2234
Comment 3 Ahmad Samir 2022-05-01 17:50:28 UTC
I think 7 and 8 aren't that common (and it would complicate the regular expression used even more), so we can hopefully skip those ones.

I've tried my hand at implementing rfc3986 (thanks for the link :)) at https://invent.kde.org/utilities/konsole/-/merge_requests/646
not 100% compliant, but close enough.
Comment 4 Ahmad Samir 2022-05-04 12:25:21 UTC
*** Bug 453364 has been marked as a duplicate of this bug. ***
Comment 5 Ahmad Samir 2022-05-04 16:37:15 UTC
*** Bug 453364 has been marked as a duplicate of this bug. ***
Comment 6 Piotr Mierzwinski 2022-05-04 16:42:31 UTC
In Bug 453364  I reported issue where in url is present comma. And this case has not been mentioned here.
Comment 7 Ahmad Samir 2022-05-04 16:48:58 UTC
(In reply to Piotr Mierzwinski from comment #6)
> In Bug 453364  I reported issue where in url is present comma. And this case
> has not been mentioned here.

The change in https://invent.kde.org/utilities/konsole/-/merge_requests/646 fixes the url-with-comma case too.
Comment 8 Ahmad Samir 2022-05-06 12:48:51 UTC
*** Bug 453448 has been marked as a duplicate of this bug. ***
Comment 9 Christian Finnberg 2022-05-22 05:41:34 UTC
*** Bug 454193 has been marked as a duplicate of this bug. ***
Comment 10 Kurt Hindenburg 2022-05-28 20:42:57 UTC
Git commit 3b7e73f54e2b07e658152ff01964d7ef398616cd by Kurt Hindenburg, on behalf of Ahmad Samir.
Committed on 28/05/2022 at 20:37.
Pushed by hindenburg into branch 'master'.

UrlFilter::FullUrlRegExp matches more valid urls

Add unittest.

This is based on:
https://datatracker.ietf.org/doc/html/rfc3986
FIXED-IN: 22.08

M  +5    -0    src/autotests/CMakeLists.txt
A  +46   -0    src/autotests/HotSpotFilterTest.cpp     [License: GPL(v2.0+)]
A  +21   -0    src/autotests/HotSpotFilterTest.h     [License: GPL(v2.0+)]
M  +45   -3    src/filterHotSpots/UrlFilter.cpp
M  +4    -1    src/filterHotSpots/UrlFilter.h

https://invent.kde.org/utilities/konsole/commit/3b7e73f54e2b07e658152ff01964d7ef398616cd
Comment 11 Ahmad Samir 2022-06-07 09:07:04 UTC
*** Bug 454957 has been marked as a duplicate of this bug. ***
Comment 12 Ahmad Samir 2022-06-08 10:03:39 UTC
*** Bug 455018 has been marked as a duplicate of this bug. ***
Comment 13 Ahmad Samir 2022-06-22 21:18:00 UTC
*** Bug 455806 has been marked as a duplicate of this bug. ***
Comment 14 Ahmad Samir 2022-07-01 18:01:15 UTC
Git commit c8d8abd1b2b8e93cc38b3bedb1b63f69791e5882 by Ahmad Samir.
Committed on 01/07/2022 at 15:33.
Pushed by ahmadsamir into branch 'release/22.04'.

UrlFilter::FullUrlRegExp matches more valid urls

Add unittest.

This is based on:
https://datatracker.ietf.org/doc/html/rfc3986
FIXED-IN: 22.08

M  +5    -0    src/autotests/CMakeLists.txt
A  +46   -0    src/autotests/HotSpotFilterTest.cpp     [License: GPL(v2.0+)]
A  +21   -0    src/autotests/HotSpotFilterTest.h     [License: GPL(v2.0+)]
M  +45   -3    src/filterHotSpots/UrlFilter.cpp
M  +4    -1    src/filterHotSpots/UrlFilter.h

https://invent.kde.org/utilities/konsole/commit/c8d8abd1b2b8e93cc38b3bedb1b63f69791e5882
Comment 15 Ahmad Samir 2022-07-02 18:21:12 UTC
*** Bug 456253 has been marked as a duplicate of this bug. ***
Comment 16 Paul Worrall 2022-09-11 12:19:57 UTC
*** Bug 458991 has been marked as a duplicate of this bug. ***