Bug 458996 - URL parsing broken when URLs are wrapped in single-quotes
Summary: URL parsing broken when URLs are wrapped in single-quotes
Status: RESOLVED FIXED
Alias: None
Product: konsole
Classification: Applications
Component: general
Version: 22.08.1
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-11 13:11 UTC by bastimeyer123
Modified: 2022-11-08 18:57 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description bastimeyer123 2022-09-11 13:11:10 UTC
SUMMARY
URL parsing was recently reimplemented because of bug 452978.
However, parsing is still broken when URLs are wrapped in single quotes.

Highlighting and copying the first two URLs below works as expected, but the third one includes the trailing ' character, which is very annoying.

1. echo http://localhost
2. echo "http://localhost"
3. echo 'http://localhost'

It's very common to quote URLs when passing them as CLI arguments, and single quotes in particular are used to keep the shell from performing any substitutions inside the string. Quoting is also required because of shell-specific syntax. For example, fish interprets the question mark as a wildcard, so URLs with query strings always have to be quoted, with either double or single quotes.

SOFTWARE/OS VERSIONS
$ lsb_release -d
Description:    Arch Linux
$ pacman -Q konsole
konsole 22.08.1-1
Comment 1 bastimeyer123 2022-11-05 10:11:01 UTC
Would it make sense to add a word boundary to the URL regex? Judging by the email regex, you're already doing exactly that there:
https://invent.kde.org/utilities/konsole/-/blob/b733bd03fd8ec49257f0564552a0565d189b8ec6/src/filterHotSpots/UrlFilter.cpp#L82

If that doesn't make sense for URLs because of the "arbitrary" path/query string/hash contents, would it instead make sense to check the character before the URL and add a backreference to that character as a suffix? For ' and " (and perhaps `?) this would be simple. Supporting parentheses and brackets (angle brackets don't seem to be supported at all) would make the regex a bit more complex, since it would need conditionals on the backreferences.
https://invent.kde.org/utilities/konsole/-/blob/b733bd03fd8ec49257f0564552a0565d189b8ec6/src/filterHotSpots/UrlFilter.cpp#L46
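
A minimal sketch of the backreference idea, assuming a simplified pattern (`https?://` followed by non-whitespace) rather than Konsole's real URL regex, and using `std::regex` instead of Konsole's Qt-based code. `findUrls` is a hypothetical name. One alternative captures an opening quote and requires the same quote right after the URL; the other matches an unquoted URL.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch, not Konsole's actual pattern:
//  - Alternative 1: an opening quote (' " or `) in group 1, the URL in
//    group 2, and \1 requiring the same quote right after the URL. Because
//    \S+ is greedy, it backtracks only as far as the last matching quote.
//  - Alternative 2: an unquoted URL in group 3.
std::vector<std::string> findUrls(const std::string& text) {
    static const std::regex url(
        R"re((['"`])(https?://\S+)\1|(https?://\S+))re");
    std::vector<std::string> urls;
    for (std::sregex_iterator it(text.begin(), text.end(), url), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Prefer the quoted capture; fall back to the unquoted one.
        urls.push_back(m[2].matched ? m[2].str() : m[3].str());
    }
    return urls;
}
```

With this pattern, `echo 'http://localhost'` yields `http://localhost`, and because the greedy `\S+` backtracks only to the last quote, an apostrophe inside the URL (as in `Earth's_rotation`) is kept. An unterminated quote simply falls through to the unquoted alternative.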

Alternatively, could the regex be kept simple by capturing both the character before the URL and the URL itself, and then checking the last character of the URL capture group in the application logic? That way you could handle all kinds of surrounding characters without bloating the regex.
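
A sketch of this "capture, then trim in code" approach, again assuming a simplified `https?://\S+` pattern and `std::regex` rather than Konsole's Qt-based implementation. `closingDelimiter` and `findUrlsTrimmed` are hypothetical names, and the delimiter pairs are an illustrative choice.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical mapping from an opening delimiter to the closing one expected
// at the end of the URL; '\0' means "no special handling".
char closingDelimiter(char opening) {
    switch (opening) {
    case '\'': return '\'';
    case '"':  return '"';
    case '`':  return '`';
    case '(':  return ')';
    case '[':  return ']';
    case '<':  return '>';
    default:   return '\0';
    }
}

// Group 1 captures the optional non-space character just before the URL,
// group 2 the URL itself. The regex stays simple; the decision to trim is
// made afterwards in ordinary code.
std::vector<std::string> findUrlsTrimmed(const std::string& text) {
    static const std::regex url(R"re((\S?)(https?://\S+))re");
    std::vector<std::string> urls;
    for (std::sregex_iterator it(text.begin(), text.end(), url), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::string before = m[1].str();
        std::string candidate = m[2].str();
        const char close = before.empty() ? '\0' : closingDelimiter(before[0]);
        // Drop one trailing character only if it closes the opening one.
        if (close != '\0' && !candidate.empty() && candidate.back() == close)
            candidate.pop_back();
        urls.push_back(candidate);
    }
    return urls;
}
```

Adding another pair of delimiters here only means adding a `case`, which is the advantage the comment points out over encoding every pair as regex conditionals.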

Either way, having to remove the quotation mark every time I copy a URL from Konsole has become really tedious, so I'd really appreciate it if this could be fixed soon. Thanks.
Comment 2 Bug Janitor Service 2022-11-07 01:58:28 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/765
Comment 3 Kurt Hindenburg 2022-11-08 18:57:19 UTC
Git commit f063ade55b491ef7d6fe6cb87d81adaed00ca041 by Kurt Hindenburg, on behalf of Luis Javier Merino Morán.
Committed on 08/11/2022 at 18:52.
Pushed by hindenburg into branch 'master'.

url filter: remove ending apostrophe

When URLs were inside single quotes, we would include the ending quote
in the parsed URL.  To avoid that, remove a final apostrophe in a URL
when creating the hotspot.

Test:
'https://en.wikipedia.org/wiki/Earth's_rotation'

M  +17   -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/f063ade55b491ef7d6fe6cb87d81adaed00ca041
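
The commit message only states the rule (remove a final apostrophe when creating the hotspot); the actual change to UrlFilter.cpp is not reproduced here. The sketch below illustrates that rule under stated assumptions: `matchUrl` uses a simplified stand-in pattern, not Konsole's regex, and both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the URL pattern: a scheme followed by
// non-whitespace. It reproduces the reported symptom (the closing quote is
// swallowed) but is not Konsole's actual regex.
std::string matchUrl(const std::string& text) {
    static const std::regex url(R"re(https?://\S+)re");
    std::smatch m;
    return std::regex_search(text, m, url) ? m.str(0) : std::string();
}

// Hypothetical helper mirroring the rule in the commit message: drop a
// single final apostrophe. Apostrophes elsewhere in the URL are left alone.
std::string stripTrailingApostrophe(std::string url) {
    if (!url.empty() && url.back() == '\'')
        url.pop_back();
    return url;
}
```

Applied to the commit's test string, only the closing quote is removed and the apostrophe in `Earth's` survives. A trade-off of this rule is that an unquoted URL that genuinely ends in an apostrophe would also lose it.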