Bug 305725

Summary: Parse URLs with Konversation parser
Product: telepathy Reporter: Jonathan Thomas <echidnaman>
Component: text-ui-message-filters Assignee: Telepathy Bugs <kde-telepathy-bugs>
Severity: task CC: kde, kde, mklapetek, rohan
Priority: NOR    
Version: git-latest   
Target Milestone: Future   
Platform: Ubuntu Packages   
OS: Linux   
Latest Commit: Version Fixed In:

Description Jonathan Thomas 2012-08-24 15:37:02 UTC
I typed "xkcd.com/1077" into the chat. With the GTalk web client in GMail, that got turned into a link to the relevant xkcd. But in the telepathy chat it didn't turn into a link.

Reproducible: Always
Comment 1 Martin Klapetek 2012-08-27 08:51:13 UTC
Can you please test with 0.5? We changed how the links are parsed, though I guess the code was just moved around.

Ideally we should be using the Konversation parsing stuff, which is in common-internals, so turning into a task.
Comment 2 Rohan Garg 2012-08-27 08:57:33 UTC
Tested it on 0.5 right now, still shows up as text and not as a link. So still needs fixing. If someone could explain this a bit to me, I could try and fix it.
Comment 3 Martin Klapetek 2012-08-27 09:01:43 UTC
There's a lib/url-filter.cpp in text-ui, which has its own parsing stuff. Some time ago we adopted Konversation parsing code, which was then moved to common-internals/KTp/text-parser.cpp.

Using it is something like:

KTp::TextUrlData urls = KTp::TextParser::instance()->extractUrlData(text);

Then look at the TextUrlData structure and just take the corresponding links.
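
A rough sketch of the "take the corresponding links" step, assuming the parser result gives a (start, length) range for each match plus a cleaned-up URL to link to. The `UrlData` struct and `linkify` function here are hypothetical stand-ins written against the standard library only, not the real KTp types, so the actual field names and offset units may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parser result; the real KTp::TextUrlData may differ.
struct UrlData {
    std::vector<std::pair<std::size_t, std::size_t>> ranges; // (start, length) of each match
    std::vector<std::string> fixedUrls;                      // link target for each match
};

// Wraps every matched range in an anchor tag. Ranges are assumed to be
// sorted by start offset and non-overlapping. No HTML escaping is done here.
std::string linkify(const std::string &text, const UrlData &data)
{
    assert(data.ranges.size() == data.fixedUrls.size());
    std::string out = text;
    // Walk backwards so earlier offsets stay valid while markup is inserted.
    for (std::size_t i = data.ranges.size(); i-- > 0;) {
        const std::size_t start = data.ranges[i].first;
        const std::size_t len = data.ranges[i].second;
        const std::string link =
            "<a href=\"" + data.fixedUrls[i] + "\">" + text.substr(start, len) + "</a>";
        out.replace(start, len, link);
    }
    return out;
}
```

In the filter itself, the ranges would come from the extractUrlData() call above rather than being filled in by hand.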
Comment 4 Rohan Garg 2012-08-28 23:41:29 UTC
After a bit of investigating, this still won't work, since Konversation's regex can't parse URLs like reddit.com or xkcd.com.

I've tested this both in Konversation and after porting text-ui to use the Konversation regex matcher.
Comment 5 David Edmundson 2012-08-28 23:52:49 UTC
They're not URLs.

If we make our parser match them, we'll get a bug the next day from someone saying things rendered as links which are not.such as those last two words there, where I deliberately missed a space after the full stop - or every D-Bus path we type, or any number with a decimal point. €0.10 should not be a link, but maybe it should be. Maybe.

Before anyone can start fixing this we need a big list of test cases of what does and doesn't render as a link on GTalk, and decide what our intended behaviour should actually be, otherwise you'll just go coding round and round in circles.

It could be that Google actually looks up what is and isn't a domain, maybe it knows the top-level domains, or maybe it simply matches words with dots in them.
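
To make the "knows top-level domains" guess concrete, here is a minimal sketch of that one heuristic together with the examples from this report written down as expectations. It is not what GTalk or Konversation actually does; `looksLikeBareDomain`, `kKnownTlds` and the tiny TLD subset are assumptions made up for illustration:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Illustrative subset only (an assumption for this sketch); a real list
// would be loaded from the IANA top-level domain list.
static const std::unordered_set<std::string> kKnownTlds = {"com", "org", "net", "de", "uk"};

// Hypothetical helper: returns true if a whitespace-delimited token looks
// like a bare domain, optionally followed by a path.
bool looksLikeBareDomain(const std::string &token)
{
    // Only the part before the first '/' is treated as the host.
    const std::string host = token.substr(0, token.find('/'));
    const std::size_t lastDot = host.rfind('.');
    if (lastDot == std::string::npos || lastDot == 0 || lastDot + 1 == host.size())
        return false;

    // Every dot-separated label must be non-empty and made of [A-Za-z0-9-].
    bool labelEmpty = true;
    for (const char ch : host) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c == '.') {
            if (labelEmpty)
                return false;
            labelEmpty = true;
        } else if (std::isalnum(c) || c == '-') {
            labelEmpty = false;
        } else {
            return false;
        }
    }

    // Compare the last label case-insensitively against the known TLDs.
    std::string tld = host.substr(lastDot + 1);
    for (char &c : tld)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return kKnownTlds.count(tld) > 0;
}

// The examples raised in this report, written down as expectations.
void checkThreadExamples()
{
    assert(looksLikeBareDomain("xkcd.com/1077"));
    assert(looksLikeBareDomain("reddit.com"));
    assert(!looksLikeBareDomain("not.such"));              // missed space after a full stop
    assert(!looksLikeBareDomain("/org/freedesktop/DBus")); // D-Bus object path
    assert(!looksLikeBareDomain("0.10"));                  // number with a decimal point
}
```

Even with a full TLD list this would still produce false positives (a file name such as "notes.de", for instance), which is why the expected behaviour needs to be agreed on first.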

Also, if the URL catching is good enough for Konversation (which is a mature project), is it not good enough for us?
Comment 6 Rohan Garg 2012-08-29 10:42:49 UTC
Git commit 0ffe76e12bb7cc6473346d7f0acce69ebd74be2a by Rohan Garg.
Committed on 29/08/2012 at 12:40.
Pushed by garg into branch 'kde-telepathy-0.5'.

Use KTp::TextUrlData to parse URL's instead of custom parsing

REVIEW: 106261

M  +2    -0    lib/CMakeLists.txt
M  +12   -47   lib/url-filter.cpp

Comment 7 Lasath Fernando 2013-03-17 00:11:34 UTC
*** Bug 299329 has been marked as a duplicate of this bug. ***