Bug 358049 - URLs that end in . or ? aren't parsed correctly
Summary: URLs that end in . or ? aren't parsed correctly
Status: RESOLVED WORKSFORME
Alias: None
Product: kmail2
Classification: Applications
Component: general
Version: 5.1
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-16 02:47 UTC by Unknown
Modified: 2022-11-20 05:12 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Unknown 2016-01-16 02:47:01 UTC
The current implementation doesn't play well with URLs that contain a trailing dot:

https://example.com/users/example.

is wrongly parsed as

https://example.com/users/example

and

https://example.com/users/example?

is likewise wrongly parsed as

https://example.com/users/example

The relevance of the latter case is debatable, since in URLs "?" usually serves only as the delimiter in front of the GET arguments. Dropping it therefore shouldn't change which resource the URL points to.

The period, on the other hand, makes a drastic difference, since it is a character that many websites allow in usernames. Because most RESTful websites use URL schemes such as `/users/${username}`, this completely breaks the ability to link to such a user page (which is how I found this bug).
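
The difference between the two cases can be checked with a generic URL parser. The following is a minimal sketch using Python's standard `urllib.parse`, not KMail's own parser; the helper name `describe` is made up for illustration. It suggests that a bare trailing "?" leaves the path and the (empty) query unchanged, while a trailing "." changes the path.

```python
from urllib.parse import urlsplit


def describe(url: str) -> tuple[str, str]:
    """Return the (path, query) components of a URL."""
    parts = urlsplit(url)
    return parts.path, parts.query


plain = describe("https://example.com/users/example")
with_question = describe("https://example.com/users/example?")
with_dot = describe("https://example.com/users/example.")

# A bare trailing "?" only introduces an empty query string.
print(plain == with_question)  # True
# A trailing "." is part of the path, so it names a different resource.
print(with_dot[0])  # /users/example.
```

Strictly speaking, a URL with an empty query and one with no query are different strings, but servers commonly treat them alike; the trailing dot is the case where the target actually changes.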

Reproducible: Always

Steps to Reproduce:
1. Write a mail that contains URLs that end in "." or "?"
2. View them in the message viewer

Actual Results:  
The characters "." and "?" are dropped from the links, although they are part of valid URLs.

Expected Results:  
The parser should match these characters, too.
Comment 1 Laurent Montel 2016-03-07 06:11:55 UTC
I assume your URL is something like "example.com?ff=bla",
not just "example.com?"?
Comment 2 Unknown 2016-03-07 12:42:03 UTC
No; then the URI wouldn't *end* in the question mark.
Comment 3 Erik Quaeghebeur 2016-05-23 08:56:07 UTC
(In reply to kdex from comment #2)
> No; then the URI wouldn't *end* in the question mark.

Clearly, the parsing code needs to take into account both what valid URLs are *and* what people type in mails. When a URL comes at the end of a sentence, people write punctuation after it, such as a period or a question mark. So if nothing comes directly after that symbol, i.e., there is white space or a line break, the most likely situation is that the symbol is punctuation and not part of the URL, although it could be. So the question is, do those user names in URLs you mention end with a period?

I tested the following (in 4.14.10):

http://example.com/whatever.
http://example.com/whatever. you
http://example.com/whatever.you

http://example.com/whatever?
http://example.com/whatever? q
http://example.com/whatever?q

In both groups, the first two links in KMail end before the punctuation and the last one includes the punctuation and what follows. (So, FWIW, I would say this issue can be CONFIRMED.) This seems a reasonable choice to me. I guess there would need to be a sufficient number of URLs in the wild ending with punctuation symbols to outweigh the argument that these symbols are usually not part of URLs in email messages.

I suggest closing as WONTFIX based on that argument.
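
The behaviour observed in the six test lines above can be reproduced with a very small heuristic. The following Python sketch is only an illustration of that heuristic, not KMail's actual code; the names `URL_CANDIDATE`, `TRAILING_PUNCTUATION` and `find_links` are hypothetical.

```python
import re

# Hypothetical pattern: a scheme followed by any run of non-whitespace characters.
URL_CANDIDATE = re.compile(r"https?://\S+")

# Characters treated as sentence punctuation when they end a candidate.
TRAILING_PUNCTUATION = ".?"


def find_links(text: str) -> list[str]:
    """Find URL-like spans and strip any trailing '.' or '?' from each."""
    links = []
    for match in URL_CANDIDATE.finditer(text):
        links.append(match.group().rstrip(TRAILING_PUNCTUATION))
    return links


print(find_links("http://example.com/whatever."))     # ['http://example.com/whatever']
print(find_links("http://example.com/whatever. you"))  # ['http://example.com/whatever']
print(find_links("http://example.com/whatever.you"))   # ['http://example.com/whatever.you']
print(find_links("http://example.com/whatever?q"))     # ['http://example.com/whatever?q']
```

Under this heuristic a period or question mark survives only when more non-whitespace text follows it, which is exactly why a URL that genuinely ends in one of these characters gets truncated.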
Comment 4 Unknown 2016-05-23 12:59:59 UTC
I don't see a clear benefit from teaching people to misunderstand URLs. Suggesting that they might not end in a dot will naturally break lots of valid URLs, and the same goes for question marks (and potentially other tokens that I haven't checked).

Also, expecting that users will put punctuation symbols after their URLs to end a sentence is a constructed heuristic; the majority of my sent and received mails actually contain footnotes such as "[1]", which will be completed with their URLs at the bottom of the mail, line by line. This format is also very wide-spread and suffers from KMail's heuristic.

Next, observe that this is not just about usernames that might end in punctuation. See [1] to agree that a dot at the end of a domain is, too, valid syntax and should be parsed to reflect that. In the case of usernames or other dynamic parts in a URL, KMail *will* break websites (like [2], which allows trailing dots in usernames and thus, in their profile page URLs).

[1] https://webmasters.stackexchange.com/questions/73934/how-can-urls-have-a-dot-at-the-end-e-g-www-bla-de/73937
[2] https://www.npmjs.com
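
To illustrate the point about domains: a host name written with a trailing dot is a fully qualified name, and a generic parser keeps that dot. This is a minimal sketch with Python's standard `urllib.parse`, not KMail's parser; `host_of` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_of(url: str) -> Optional[str]:
    """Return the host component of a URL, or None if there is none."""
    return urlsplit(url).hostname


rooted = host_of("https://example.com./users/example")
unrooted = host_of("https://example.com/users/example")

print(rooted)    # example.com.
print(unrooted)  # example.com
```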
Comment 5 Erik Quaeghebeur 2016-05-23 20:21:40 UTC
(In reply to kdex from comment #4)

Thanks for your reply.

> I don't see a clear benefit from teaching people to misunderstand URLs.
> Suggesting that they might not end in a dot will naturally break lots of
> valid URLs, and the same goes for question marks (and potentially other
> tokens that I haven't checked).

Well, that is a URL entry issue. When it detects URLs in a mail about to be sent out, KMail could suggest that people enclose them in <...>.
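
Angle brackets make the end of a URL unambiguous, so no guessing about trailing punctuation is needed; RFC 3986 (Appendix C) recommends them as a delimiting style. A minimal Python sketch of extracting such delimited URLs follows. It is an illustration only, not an existing KMail feature, and the names `BRACKETED_URL` and `find_bracketed_links` are hypothetical.

```python
import re

# Hypothetical pattern: a URL with a scheme, enclosed in "<" and ">".
# The capture group excludes the brackets themselves.
BRACKETED_URL = re.compile(r"<(https?://[^\s<>]+)>")


def find_bracketed_links(text: str) -> list[str]:
    """Return every URL that is explicitly delimited by angle brackets."""
    return BRACKETED_URL.findall(text)


sentence = "My profile is at <https://example.com/users/example.>."
print(find_bracketed_links(sentence))  # ['https://example.com/users/example.']
```

Here the period inside the brackets is kept as part of the URL, while the sentence-ending period outside them is ignored.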

> Also, expecting that users will put punctuation symbols after their URLs to
> end a sentence is a constructed heuristic; the majority of my sent and
> received mails actually contain footnotes such as "[1]", which will be
> completed with their URLs at the bottom of the mail, line by line. This
> format is also very wide-spread and suffers from KMail's heuristic.

Indeed, this format is common. URLs ending in periods or question marks are less so, and that is where it breaks. Nevertheless, a valid point.

> Next, observe that this is not just about usernames that might end in
> punctuation. See [1] to agree that a dot at the end of a domain is, too,
> valid syntax and should be parsed to reflect that.

I would need to be convinced that this is a practical issue. I haven't come across mails with domains in this format. I have come across plenty of mails with periods as punctuation at the end of URLs, i.e., URLs that won't resolve if the period is parsed as part of the URL.

> In the case of usernames
> or other dynamic parts in a URL, KMail *will* break websites (like [2],
> which allows trailing dots in usernames and thus, in their profile page
> URLs).

Indeed; that is problematic. You've convinced me that this is a real issue. Nevertheless, always parsing periods as part of the URL will also break links. The KMail devs will have to decide what is most important.
Comment 6 Justin Zobel 2022-10-21 00:17:44 UTC
Thank you for reporting this bug in KDE software. As it has been a while since this issue was reported, can we please ask you to see if you can reproduce the issue with a recent software version?

If you can reproduce the issue, please change the status to "CONFIRMED" when replying. Thank you!
Comment 7 Bug Janitor Service 2022-11-05 05:07:54 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 8 Bug Janitor Service 2022-11-20 05:12:14 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.