Bug 496817 - Open Links plugin should stop at closing parenthesis
Status: REPORTED
Alias: None
Product: kate
Classification: Applications
Component: general
Version: 24.08.3
Platform: openSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
Reported: 2024-11-29 02:53 UTC by Grósz Dániel
Modified: 2024-12-02 19:59 UTC

Description Grósz Dániel 2024-11-29 02:53:43 UTC
SUMMARY
Minor suggestion: The Open Links plugin treats many whitespace and punctuation characters as terminating a link, but not a closing parenthesis. So if a link is parenthesised, or it's in a Markdown document where the URL of a link is in parentheses, the closing parenthesis is included in the URL. While a ")" can technically be part of the path in a URL, it's much more likely not intended to be part of it, so URL recognition should stop before it.

I'm not sure whether other punctuation should be added to the list of characters that terminate a URL. Perhaps stop before a ".", ":" or "?" if it is followed by whitespace (or the end of the document). Conversely, only stop before a ";" if it is followed by whitespace: currently ";" always delimits a URL, but, while uncommon, it can be used as a separator within URLs, equivalent to "&".
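The trailing-punctuation rule suggested above could be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's actual code; the function name `url_end` and the `SOFT_TERMINATORS` set are hypothetical, and the sketch assumes the start of the URL has already been found.

```python
# Hypothetical sketch: ".", ":", "?" and ";" end a URL only when they are
# followed by whitespace or by the end of the text.
SOFT_TERMINATORS = ".:?;"

def url_end(text: str, start: int) -> int:
    """Return the index just past the URL that begins at text[start]."""
    i = start
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            break  # whitespace always ends the URL
        at_end = i + 1 == len(text)
        if ch in SOFT_TERMINATORS and (at_end or text[i + 1].isspace()):
            break  # sentence punctuation right after the URL
        i += 1
    return i

text = "See https://example.com/a;b=1. Next sentence."
start = text.index("https")
print(text[start:url_end(text, start)])
```

With this rule a ";" inside the query or path is kept, while a "." that ends the sentence is left out of the link.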

STEPS TO REPRODUCE
1. Enter this: (https://example.com/)

OBSERVED RESULT
The closing ")" is underlined, and https://example.com/) is opened in the browser. 
(Moreover, if the URL is entered without the trailing "/", it's opened as http://https//example.com) , at least in Firefox.)

EXPECTED RESULT
Only https://example.com/ is underlined and opened.

SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20241126
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.0
Kernel Version: 6.11.7-1-default (64-bit)
Graphics Platform: X11
Comment 1 Waqar Ahmed 2024-12-02 04:58:57 UTC
For Markdown, the bracket should not be included. If you know of a case where the bracket gets included, please share.
Comment 2 Grósz Dániel 2024-12-02 17:48:50 UTC
(In reply to Waqar Ahmed from comment #1)
> For Markdown, the bracket should not be included. If you know of a case
> where the bracket gets included, please share.

You're right: if the parenthesised URL is preceded by a closing square bracket, the closing parenthesis doesn't get included, so Markdown links aren't broken. However, if it isn't preceded by a closing square bracket, the closing parenthesis does get included, even though in that case it's more likely to be just a link, perhaps with some preceding text, in parentheses. That said, some Wikipedia links contain parentheses; perhaps exclude a closing parenthesis only if the link doesn't contain a matching opening parenthesis?
Comment 3 Waqar Ahmed 2024-12-02 18:02:40 UTC
For a fully correct implementation we need to scan the whole doc for parentheses. So it's not very simple; e.g., think of a long comment ending with "link.com)".
Comment 4 Grósz Dániel 2024-12-02 19:59:40 UTC
(In reply to Waqar Ahmed from comment #3)
> For a fully correct implementation we need to scan the whole doc for
> parentheses. So it's not very simple; e.g., think of a long comment ending
> with "link.com)".

It's always going to be a heuristic, and I'm not even sure "fully correct" is meaningful; I'd just like a better heuristic than the current one. I think treating a ")" as ending a link unless the preceding part of the *link itself* contains more "("s than ")"s would almost always get it right. There's no need to check the preceding content: if the ")" isn't closing a parenthetical within the link itself, it's almost certainly closing a parenthetical surrounding the link, even if you don't check.
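A minimal sketch of that heuristic, in Python for illustration only (it is not the plugin's code, and the function name `url_end_balanced` is hypothetical). It assumes the start of the URL is already known and, for brevity, treats only whitespace and an unmatched ")" as terminators:

```python
def url_end_balanced(text: str, start: int) -> int:
    """Return the index just past the URL that begins at text[start].

    A ")" ends the URL unless the URL seen so far contains more "("
    than ")", i.e. unless it closes a parenthesis opened inside the URL.
    """
    depth = 0  # "(" seen inside the URL and not yet closed
    i = start
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            break
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break  # closes a parenthetical around the link
            depth -= 1
        i += 1
    return i

for sample in (
    "(https://example.com/)",
    "(see https://en.wikipedia.org/wiki/Kate_(text_editor))",
):
    start = sample.index("https")
    print(sample[start:url_end_balanced(sample, start)])
```

Only the link itself is scanned, so the cost does not depend on the length of the surrounding text.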