Bug 133995

Summary: [patch] greedy matching for emoticons
Product: [Unmaintained] kopete Reporter: Martin Matusiak <juventuz2000>
Component: generalAssignee: Kopete Developers <kopete-bugs-null>
Status: RESOLVED FIXED    
Severity: wishlist    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: greedy_emoticon_parsing.patch

Description Martin Matusiak 2006-09-13 09:01:11 UTC
Version:           0.12.2 (using KDE Devel)
Installed from:    Compiled sources
Compiler:          gcc 4.1.1 
OS:                Linux

Kopete's emoticon parsing is not greedy. This means that :s will be parsed over :strong:, which is not the behavior that users expect. I discovered this problem when testing this emoticon theme: http://www.kde-look.org/content/show.php?content=43094 (which incidentally is good for testing, because of the number of emoticons and emoticon strings it contains).


Now with regards to implementation, this is how Kopete parses emoticons.. In kopete/libkopete/private/kopeteemoticons.cpp, in the function tokenize(), it reads the message letter by letter and tries to match on emoticons. If it matches the first letter of any emoticon, it then traverses a list of emoticons (which matched the first letter) and simply picks the first one that matches. So if :s is listed before :strong:, :s will be the emoticon displayed and that's it.

There are two fairly obvious ways to fix this. One is to hack tokenize() to find more than one match, and then to use the longest one. And this would complicate tokenize() somewhat. So another solution, possibly simpler, is to streamline the data (the list of emoticons) so that they are sorted from longest to shortest. This will ensure that matches are always greedy. My patch does exactly this.
Comment 1 Martin Matusiak 2006-09-13 09:02:01 UTC
Created attachment 17748 [details]
greedy_emoticon_parsing.patch
Comment 2 Olivier Goffart 2006-09-14 07:23:26 UTC
thanks for your patch, it looks fine.
I hope I will have the time to test it this week.
Comment 3 Martin Matusiak 2006-09-14 15:05:08 UTC
I've described the problem in a more user centric way on my blog (includes screenshots): http://www.matusiak.eu/numerodix/blog/index.php/2006/09/13/fixing-lack-of-greedy-emoticon-matching-in-kopete
Comment 4 Olivier Goffart 2006-09-15 22:14:05 UTC
SVN commit 584918 by ogoffart:

Fix emoticon parsing if there are sub-emoticons.
BUG: 133995
Patch by Martin Matusiak, thanks.



 M  +18 -0     kopeteemoticons.cpp  
 M  +6 -0      kopeteemoticons.h  


--- branches/KDE/3.5/kdenetwork/kopete/libkopete/private/kopeteemoticons.cpp #584917:584918
@@ -48,6 +48,8 @@
 struct Emoticons::Emoticon
 {
 	Emoticon(){}
+	/* sort by longest to shortest matchText */
+	bool operator< (const Emoticon &e){ return matchText.length() > e.matchText.length(); }
 	QString matchText;
 	QString matchTextEscaped;
 	QString	picPath;
@@ -424,6 +426,7 @@
 		node = node.nextSibling();
 	}
 	mapFile.close();
+	sortEmoticons();
 }
 
 
@@ -492,12 +495,27 @@
 		node = node.nextSibling();
 	}
 	mapFile.close();
+	sortEmoticons();
 }
 
 
+void Emoticons::sortEmoticons()
+{
+	/* sort strings in order of longest to shortest to provide convenient input for
+		greedy matching in the tokenizer */
+	QValueList<QChar> keys = d->emoticonMap.keys();
+	for ( QValueList<QChar>::const_iterator it = keys.begin(); it != keys.end(); ++it )
+	{
+		QChar key = (*it);
+		QValueList<Emoticon> keyValues = d->emoticonMap[key];
+ 		qHeapSort(keyValues.begin(), keyValues.end());
+ 		d->emoticonMap[key] = keyValues;
+	}
+}
 
 
 
+
 QMap<QString, QString> Emoticons::emoticonAndPicList()
 {
 	return d->emoticonAndPicList;
--- branches/KDE/3.5/kdenetwork/kopete/libkopete/private/kopeteemoticons.h #584917:584918
@@ -156,6 +156,12 @@
 	 * @see initEmoticons
 	 */
 	void initEmoticon_JEP0038( const QString & filename);
+	
+	/**
+	 * sorts emoticons for convenient parsing, which yields greedy matching on
+	 * matchText
+	 */
+	void sortEmoticons();
 
 
 	struct Emoticon;