Bug 65909 - HTML messages have link tags added around URLs inside HTML tags
Summary: HTML messages have link tags added around URLs inside HTML tags
Status: RESOLVED FIXED
Alias: None
Product: kopete
Classification: Applications
Component: libkopete (show other bugs)
Version: 0.8.0
Platform: unspecified Linux
: NOR minor
Target Milestone: ---
Assignee: Kopete Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-10-12 08:54 UTC by Sean Lynch
Modified: 2004-02-15 18:53 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Notice double link (48.18 KB, image/jpeg)
2003-10-12 08:55 UTC, Sean Lynch
Details
Patch: don't parse links on HTML messages. (626 bytes, patch)
2004-02-12 23:40 UTC, Richard Smith
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sean Lynch 2003-10-12 08:54:31 UTC
Version:           unknown (using KDE 3.1.92 (CVS >= 20031007), compiled sources)
Compiler:          gcc version 3.3 20030226 (prerelease) (SuSE Linux)
OS:          Linux (i686) release 2.6.0-test6

When a buddy sends me a URL, kopete displays it twice, with only the second instance being clickable (see attachment for better visual explanation).  I checked out this behaviour with icq (being oscar as well), but it does not do this, although it does not link a trailing / if there is one in the send url (example: http://www.thinkgeek.com/gadgets/lights/600c/, only http://www.thinkgeek.com/gadgets/lights/600c is highlighted and linked).
Comment 1 Sean Lynch 2003-10-12 08:55:26 UTC
Created attachment 2754 [details]
Notice double link
Comment 2 Casey Allen Shobe 2003-10-12 09:01:33 UTC
Confirmed.  I see the same thing. 
Comment 3 Olivier Goffart 2003-10-12 09:44:01 UTC
looks like the message is parsed two times.   
 
we have to do some work at this level. 
 
Comment 4 Stefan Gehn 2003-10-19 18:48:41 UTC
Please post the debug output when you run kopete from a terminal, it's probably a bug with parsing the ugly and broken AIM html code. I'd also appreciate testing with different clients sending the same url so differences will be easier to spot.
Comment 5 Sean Lynch 2003-10-20 02:12:13 UTC
aim client: 
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, const QString&, unsigned char, unsigned char)] Got a normal message: <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0"><A HREF="http://www.google.com">www.google.com</A></FONT></BODY></HTML>

trillian:
kopete (oscar): [void OscarSocket::parseMessage(const UserInfo&, const QString&, unsigned char, unsigned char)] Got a normal me
ssage: <HTML><BODY BGCOLOR="#ffffff"><font face="Arial">www.google.com</BODY></HTML>

It seems with the aim client that it makes the link the right way (wrapping text with anchor tags), but also seems that kopete parses any www.something address as a link as well.  BTW, the trillian link of www.google.com looked and worked right, but the aim client with www.google.com came as...

http://www.google.com">www.google.com  (with the http://www.google.com and www.google.com both blue as links, but not the ">.

Anymore help I can provide, let me know.
Comment 6 Jason Keirstead 2003-10-20 04:36:06 UTC
Kopete does indeed turn URLs clickable.  But it only does this for messages that came across the wire as plainText.

It seems to me from the debug output the problem is that the OSCAR plugin is sending the <a href=""></a> to KopeteMessage and yet still marking the message type as plainText, when it should either be stripping it out, or marking the type as HTML.

Comment 7 Matt Rogers 2003-11-06 07:18:52 UTC
This would be a libkopete issue now. I hacked a little bit on this tonight and found out that even if I pass valid HTML, the double link still gets displayed. :/ 
Comment 8 Jason Keirstead 2003-11-06 13:29:38 UTC
Problem is in KopeteMessage. parseLinks is replacing links regardless of if type is HTML or PlainText.

If the type is HTML, parseLinks needs to add another check to its regexp to ensure that it is not replacing a link that was already "linked" so to speak. Either that or not run parseLinks on HTML type messages at all.

A work around could also be to set the message with the type ParsedHTML but then you wouldn't get emotiocons.
Comment 9 Richard Smith 2004-02-12 23:40:58 UTC
Created attachment 4662 [details]
Patch: don't parse links on HTML messages.

I can't reproduce this. Does the attached patch help?

Apply in libkopete/ with:
patch < kopetemessage-nolinksonhtml.diff
Comment 10 Martijn Klingens 2004-02-15 13:45:19 UTC
According to Stefan the patch is wrong. I tried to create another one based on his input two days ago, but didn't really succeed.

Adding 0.8 to the version since AFAICS the fix will be backportable.
Comment 11 Richard Smith 2004-02-15 17:37:45 UTC
I have a plan. I'll be back with a result soon...
Comment 12 Richard Smith 2004-02-15 18:47:41 UTC
CVS commit by lilachaze: 

Don't parse links inside HTML tags.
CCMAIL: 65909@bugs.kde.org


  A            tests/kopetelinktest.cpp   1.1 [GPL (v2+)]
  M +34 -12    kopetemessage.cpp   1.140
  M +1 -1      kopetemessage.h   1.55
  M +1 -0      tests/.cvsignore   1.2.4.2
  M +5 -1      tests/Makefile.am   1.1.4.3


--- kdenetwork/kopete/libkopete/kopetemessage.cpp  #1.139:1.140
@@ -476,22 +476,44 @@ QString KopeteMessage::parsedBody() cons
 #endif
         }
-        /*
-        // FIXME: Should be uncommented as soon as ICQ and other plugins are free of stuff like:
-        // text=i18n("<b>[Email Message:]</b> %1").arg(text);
-        // where text is not yet parsed
-        else if( d->format & RichText )
-        {
-                // Richtext should already have <a href ...> around URLs, so no link parsing takes place
-                return KopeteEmoticons::parseEmoticons(escapedBody());
-        }
-        */
         else
         {
-                return KopeteEmoticons::parseEmoticons(parseLinks(escapedBody()));
+                return KopeteEmoticons::parseEmoticons(parseLinks(escapedBody(), d->format));
         }
 }
 
-QString KopeteMessage::parseLinks( const QString &message ) const
+QString KopeteMessage::parseLinks( const QString &message, MessageFormat format )
 {
+        if ( format == ParsedHTML )
+                return message;
+
+        if ( format & RichText )
+        {
+                // < in HTML *always* means start-of-tag
+                QStringList entries = QStringList::split( QChar('<'), message, true );
+
+                QStringList::Iterator it = entries.begin();
+                
+                // first one is different: it doesn't start with an HTML tag.
+                if ( it != entries.end() )
+                {
+                        *it = parseLinks( *it, PlainText );
+                        ++it;
+                }
+
+                for ( ; it != entries.end(); ++it )
+                {
+                        QString curr = *it;
+                        // > in HTML means start-of-tag if and only if it's the first one after a <
+                        int tagclose = curr.find( QChar('>') );
+                        // no >: the HTML is broken, but we can cope
+                        if ( tagclose == -1 )
+                                continue;
+                        QString tag = curr.left( tagclose + 1 );
+                        QString body = curr.mid( tagclose + 1 );
+                        *it = tag + parseLinks( body, PlainText );
+                }  
+                return entries.join(QString::fromLatin1("<"));
+        }
+
         QString result = message;
 

--- kdenetwork/kopete/libkopete/kopetemessage.h  #1.54:1.55
@@ -342,5 +342,5 @@ private:
         KopeteMessagePrivate *d;
 
-        QString parseLinks( const QString &message ) const;
+        static QString parseLinks( const QString &message, MessageFormat format );
 };
 

--- kdenetwork/kopete/libkopete/tests/.cvsignore  #1.2.4.1:1.2.4.2
@@ -4,2 +4,3 @@
 kopetepasswordtest
 kopetewallettest
+kopetelinktest

--- kdenetwork/kopete/libkopete/tests/Makefile.am  #1.1.4.2:1.1.4.3
@@ -7,5 +7,5 @@
 METASOURCES = AUTO
 
-check_PROGRAMS = kopeteemoticontest kopetewallettest kopetepasswordtest
+check_PROGRAMS = kopeteemoticontest kopetewallettest kopetepasswordtest kopetelinktest
 
 kopeteemoticontest_SOURCES = kopeteemoticontest.cpp
@@ -20,2 +20,6 @@
 kopetepasswordtest_LDFLAGS = -no-undefined $(all_libraries) $(KDE_RPATH)
 kopetepasswordtest_LDADD = ../libkopete.la
+
+kopetelinktest_SOURCES = kopetelinktest.cpp
+kopetelinktest_LDFLAGS = -no-undefined $(all_libraries) $(KDE_RPATH)
+kopetelinktest_LDADD = ../libkopete.la


Comment 13 Richard Smith 2004-02-15 18:53:14 UTC
CVS commit by lilachaze: 

Don't parse links inside HTML tags.
CCMAIL: 65909-done@bugs.kde.org


  M +34 -2     kopetemessage.cpp   1.134.2.1
  M +1 -1      kopetemessage.h   1.54.2.1


--- kdenetwork/kopete/libkopete/kopetemessage.cpp  #1.134:1.134.2.1
@@ -476,10 +476,42 @@ QString KopeteMessage::parsedBody() cons
         else
         {
-                return KopeteEmoticons::parseEmoticons(parseLinks(escapedBody()));
+                return KopeteEmoticons::parseEmoticons(parseLinks(escapedBody(), d->format));
         }
 }
 
-QString KopeteMessage::parseLinks( const QString &message ) const
+QString KopeteMessage::parseLinks( const QString &message, MessageFormat format )
 {
+        if ( format == ParsedHTML )
+                return message;
+
+        if ( format & RichText )
+        {
+                // < in HTML *always* means start-of-tag
+                QStringList entries = QStringList::split( QChar('<'), message, true );
+
+                QStringList::Iterator it = entries.begin();
+                
+                // first one is different: it doesn't start with an HTML tag.
+                if ( it != entries.end() )
+                {
+                        *it = parseLinks( *it, PlainText );
+                        ++it;
+                }
+
+                for ( ; it != entries.end(); ++it )
+                {
+                        QString curr = *it;
+                        // > in HTML means start-of-tag if and only if it's the first one after a <
+                        int tagclose = curr.find( QChar('>') );
+                        // no >: the HTML is broken, but we can cope
+                        if ( tagclose == -1 )
+                                continue;
+                        QString tag = curr.left( tagclose + 1 );
+                        QString body = curr.mid( tagclose + 1 );
+                        *it = tag + parseLinks( body, PlainText );
+                }  
+                return entries.join(QString::fromLatin1("<"));
+        }
+
         QString result = message;
 

--- kdenetwork/kopete/libkopete/kopetemessage.h  #1.54:1.54.2.1
@@ -342,5 +342,5 @@ private:
         KopeteMessagePrivate *d;
 
-        QString parseLinks( const QString &message ) const;
+        static QString parseLinks( const QString &message, MessageFormat format );
 };