Bug 455166 - URL matching regressions
Summary: URL matching regressions
Status: RESOLVED FIXED
Alias: None
Product: konsole
Classification: Applications
Component: general (show other bugs)
Version: master
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-11 22:30 UTC by Martin Sandsmark
Modified: 2022-12-30 18:04 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
dumb fix (3.42 KB, patch)
2022-06-12 16:30 UTC, Martin Sandsmark
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Sandsmark 2022-06-11 22:30:38 UTC
It seems like the URL regex has been changed recently, e. g. markdown URLs are broken again, so "[foo](https://bar/)" now includes the ending ), and "`https://foo.bar`" completely fail to match.

Here's how markdown URLs were fixed last time fwiw: https://invent.kde.org/utilities/konsole/-/commit/1ee8c64b5189d9e32ae9660308b2edc0f35ea6f0
Comment 1 Bug Janitor Service 2022-06-12 09:41:08 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/682
Comment 2 Ahmad Samir 2022-06-12 09:42:29 UTC
`https://foo.bar` should be fixed with that MR.

The issue with [foo](https://bar/) is that ")" is a valid character than can exist in a URL, I don't see a simple way to match markup/down urls of that form, without breaking other stuff.
Comment 3 Martin Sandsmark 2022-06-12 15:43:11 UTC
(In reply to Ahmad Samir from comment #2)
> `https://foo.bar` should be fixed with that MR.
> 
> The issue with [foo](https://bar/) is that ")" is a valid character than can
> exist in a URL, I don't see a simple way to match markup/down urls of that
> form, without breaking other stuff.

What kind of other URLs would break?
URLs in markdown are very common, so unless it breaks a lot of URLs to handle them I'd say it's worth it (I don't think there used to be any issues).

Another URL that don't work anymore are foo=https://bar.com fwiw.
Comment 4 Martin Sandsmark 2022-06-12 16:30:54 UTC
Created attachment 149643 [details]
dumb fix

Simplest fix I could think of. It passes the tests, but it feels too simple so I'm assuming it breaks in some weird ways.
Comment 5 Ahmad Samir 2022-06-12 16:32:51 UTC
(In reply to Martin Sandsmark from comment #3)
> (In reply to Ahmad Samir from comment #2)
> > `https://foo.bar` should be fixed with that MR.
> > 
> > The issue with [foo](https://bar/) is that ")" is a valid character than can
> > exist in a URL, I don't see a simple way to match markup/down urls of that
> > form, without breaking other stuff.
> 
> What kind of other URLs would break?

Bug 452978 (and it's duplicates).
Comment 6 Bug Janitor Service 2022-06-14 15:59:36 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/683
Comment 7 Kurt Hindenburg 2022-06-16 16:35:35 UTC
Git commit 651bcfb733d1cb8a5b05d095c7cbbfebbf889282 by Kurt Hindenburg, on behalf of Ahmad Samir.
Committed on 16/06/2022 at 16:23.
Pushed by hindenburg into branch 'master'.

Match urls between grave `
FIXED-IN: 22.08

M  +3    -0    src/autotests/HotSpotFilterTest.cpp
M  +1    -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/651bcfb733d1cb8a5b05d095c7cbbfebbf889282
Comment 8 Kurt Hindenburg 2022-06-16 16:51:50 UTC
Git commit 48060e31e23e974eba134bea86c976285085a73b by Kurt Hindenburg, on behalf of Luis Javier Merino Morán.
Committed on 16/06/2022 at 16:36.
Pushed by hindenburg into branch 'master'.

Only recognize URIs with balanced parentheses

To prevent URIs inside parentheses from getting extended to the closing
parenthesis, only recognize URIs with balanced parentheses in regname,
path, query and/or fragment.  We still allow unbalanced parenthesis in
userInfo, since the postfix @ should prevent most ambiguous situations,
and the parenthesis can be part of a password.

M  +13   -0    src/autotests/HotSpotFilterTest.cpp
M  +13   -8    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/48060e31e23e974eba134bea86c976285085a73b
Comment 9 SvenK 2022-07-01 10:15:28 UTC
Another annoying thing which recently changed is that URLs http://127.0.0.1:4000/  used to be properly parsed but now are interpreted as http://127.0.0.1, i.e. the port is missing. These kind of URLs typically appear in static site generators, kubernetes, etc.

See also ticket https://bugs.kde.org/show_bug.cgi?id=430938 which proposes regexpes to fix this on a personal level.
Comment 10 Ahmad Samir 2022-07-01 13:46:27 UTC
(In reply to SvenK from comment #9)
> Another annoying thing which recently changed is that URLs
> http://127.0.0.1:4000/  used to be properly parsed but now are interpreted
> as http://127.0.0.1, i.e. the port is missing. These kind of URLs typically
> appear in static site generators, kubernetes, etc.
> 

That is already fixed in master.
Comment 11 Ahmad Samir 2022-07-01 18:01:23 UTC
Git commit e925acc1ca60ef1116fa84cc35328d5accad23c5 by Ahmad Samir, on behalf of Luis Javier Merino Morán.
Committed on 01/07/2022 at 15:33.
Pushed by ahmadsamir into branch 'release/22.04'.

Only recognize URIs with balanced parentheses

To prevent URIs inside parentheses from getting extended to the closing
parenthesis, only recognize URIs with balanced parentheses in regname,
path, query and/or fragment.  We still allow unbalanced parenthesis in
userInfo, since the postfix @ should prevent most ambiguous situations,
and the parenthesis can be part of a password.

M  +13   -0    src/autotests/HotSpotFilterTest.cpp
M  +13   -8    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/e925acc1ca60ef1116fa84cc35328d5accad23c5
Comment 12 Ahmad Samir 2022-07-01 18:01:31 UTC
Git commit 6e7cf530a8a9ef2676a1b4faf9a06af9f955c5be by Ahmad Samir.
Committed on 01/07/2022 at 15:33.
Pushed by ahmadsamir into branch 'release/22.04'.

Match urls between grave `
FIXED-IN: 22.08

M  +3    -0    src/autotests/HotSpotFilterTest.cpp
M  +1    -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/6e7cf530a8a9ef2676a1b4faf9a06af9f955c5be
Comment 13 Bug Janitor Service 2022-12-01 22:51:53 UTC
A possibly relevant merge request was started @ https://invent.kde.org/utilities/konsole/-/merge_requests/780
Comment 14 Kurt Hindenburg 2022-12-10 08:17:34 UTC
Git commit 2e84af9400409021fe1ad64e7e56f2c5a0e6f82f by Kurt Hindenburg, on behalf of Luis Javier Merino Morán.
Committed on 10/12/2022 at 08:11.
Pushed by hindenburg into branch 'master'.

url filter: start regex with word boundary

This allows recognizing more kinds of enclosed URIs, e.g:

 - angle brackets: <https://example.com>
 - lenticular brackets: 【https://example.com】
 - braces: {https://example.com}
 - guillemets: «https://example.com»
 - single guillemets ‹https://example.com›
 - ...

and also URIs right after an equals sign, e.g: foo=https://example.com.
Related: bug 462511

M  +6    -0    src/autotests/HotSpotFilterTest.cpp
M  +1    -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/2e84af9400409021fe1ad64e7e56f2c5a0e6f82f
Comment 15 Kurt Hindenburg 2022-12-30 18:04:18 UTC
Git commit a48e4deab23de1283f89aea49bf8037412daede1 by Kurt Hindenburg, on behalf of Luis Javier Merino Morán.
Committed on 30/12/2022 at 16:49.
Pushed by hindenburg into branch 'release/22.12'.

url filter: start regex with word boundary

This allows recognizing more kinds of enclosed URIs, e.g:

 - angle brackets: <https://example.com>
 - lenticular brackets: 【https://example.com】
 - braces: {https://example.com}
 - guillemets: «https://example.com»
 - single guillemets ‹https://example.com›
 - ...

and also URIs right after an equals sign, e.g: foo=https://example.com.
Related: bug 462511

M  +6    -0    src/autotests/HotSpotFilterTest.cpp
M  +1    -1    src/filterHotSpots/UrlFilter.cpp

https://invent.kde.org/utilities/konsole/commit/a48e4deab23de1283f89aea49bf8037412daede1