Bug 458996

Summary: URL parsing broken when URLs are wrapped in single-quotes
Product: [Applications] konsole
Reporter: bastimeyer123
Component: general
Assignee: Konsole Developer <konsole-devel>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 22.08.1   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description bastimeyer123 2022-09-11 13:11:10 UTC
SUMMARY
The URL parsing has recently been re-implemented due to bug 452978.
However, parsing is still broken when URLs get put into single-quotes.

Highlighting and copying the first two URLs works as expected, but the third one includes the trailing ' character, which is a very annoying issue.

1. echo http://localhost
2. echo "http://localhost"
3. echo 'http://localhost'

It's very common to quote URLs when passing them as CLI arguments, especially with single quotes to avoid potential string substitutions. Quoting is also required by some shell-specific syntax: in fish, for example, the question mark is interpreted as a wildcard, so URLs with query strings always have to be wrapped in either double or single quotes.

SOFTWARE/OS VERSIONS
$ lsb_release -d
Description:    Arch Linux
$ pacman -Q konsole
konsole 22.08.1-1
Comment 1 bastimeyer123 2022-11-05 10:11:01 UTC
Would it make sense to add a word boundary to the URL regex? The email regex already does exactly that:
https://invent.kde.org/utilities/konsole/-/blob/b733bd03fd8ec49257f0564552a0565d189b8ec6/src/filterHotSpots/UrlFilter.cpp#L82

If that doesn't make sense for URLs because of the "arbitrary" path/query string/hash contents, would it instead make sense to check the character before the URL match and add a backreference to that character as a suffix? For ' and " (and ` ???) this would be simple. If you want to support parentheses and brackets (angled ones don't seem to be supported at all), then the regex would be a bit more complex, with conditionals for the backreferences.
https://invent.kde.org/utilities/konsole/-/blob/b733bd03fd8ec49257f0564552a0565d189b8ec6/src/filterHotSpots/UrlFilter.cpp#L46

Or could the regex be simplified by matching the character before the URL in one capture group and the URL itself in another, and then checking the last character of the URL capture group in the application logic? That way you could deal with the surrounding characters without bloating the regex, and it would enable handling all kinds of surrounding characters for URL matches.

Either way, always having to remove the quotation mark from a URL copied from konsole has become really tedious and annoying, so I'd really appreciate it if this could be fixed soon. Thanks.
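
For illustration, the third idea above (capture the character in front of the URL, then trim the match in application logic) could look roughly like the sketch below. It is written in Python for brevity. The pattern, the `CLOSERS` table and the `find_urls` helper are hypothetical and deliberately loose; they are not Konsole's actual regex or code.

```python
import re

# Hypothetical, deliberately loose pattern: group 1 is the character in
# front of the URL (or the start of the line), group 2 is the URL candidate.
URL_RE = re.compile(r"""(^|[\s'"`(\[<])(https?://\S+)""")

# Maps an opening character to the closing character it implies.
CLOSERS = {"'": "'", '"': '"', "`": "`", "(": ")", "[": "]", "<": ">"}


def find_urls(line):
    """Return the URLs found in line, dropping a closer that pairs with the opener."""
    urls = []
    for match in URL_RE.finditer(line):
        opener, url = match.group(1), match.group(2)
        closer = CLOSERS.get(opener)
        # Only trim when the URL was opened by a known character and
        # ends with its counterpart; everything else is left untouched.
        if closer and url.endswith(closer):
            url = url[:-1]
        urls.append(url)
    return urls
```

With this approach the regex stays small, and support for another pair of surrounding characters only needs a new entry in the table. A URL that is followed by a closer but not directly preceded by its opener is not trimmed in this sketch.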
Comment 2 Bug Janitor Service 2022-11-07 01:58:28 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/765
Comment 3 Kurt Hindenburg 2022-11-08 18:57:19 UTC
Git commit f063ade55b491ef7d6fe6cb87d81adaed00ca041 by Kurt Hindenburg, on behalf of Luis Javier Merino Morán.
Committed on 08/11/2022 at 18:52.
Pushed by hindenburg into branch 'master'.

url filter: remove ending apostrophe

When URLs were inside single quotes, we would include the ending quote
in the parsed URL.  To avoid that, remove a final apostrophe in a URL
when creating the hotspot.

Test:
'https://en.wikipedia.org/wiki/Earth's_rotation'

M  +17   -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/f063ade55b491ef7d6fe6cb87d81adaed00ca041
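
The behaviour described in the commit message, dropping a final apostrophe while leaving apostrophes inside the URL alone, can be sketched as follows. This is an illustration in Python only: the actual change is C++ code in src/filterHotSpots/UrlFilter.cpp and is not reproduced here, and the function name is hypothetical.

```python
def strip_trailing_apostrophe(url):
    """Drop one trailing apostrophe; apostrophes inside the URL are kept."""
    return url[:-1] if url.endswith("'") else url
```

Applied to the test string from the commit message (without its opening quote), this keeps the apostrophe in "Earth's" and removes only the closing one.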