93851 – spambayes antispam wizard setup

Status:	RESOLVED FIXED

Alias:	None

Product:	kmail
Classification:	Applications
Component:	general
Version:	1.7.1
Platform:	RedHat Enterprise Linux Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	kdepim bugs

Reported:	2004-11-24 15:40 UTC by Anthony Baxter
Modified:	2007-09-14 12:17 UTC

Description Anthony Baxter 2004-11-24 15:40:16 UTC

Version:           1.7.1 (using KDE 3.3.1)
Installed from:    RedHat RPMs

I've put together screenshots and a short bit of text describing how to configure 
spambayes (spambayes.sf.net) with kmail, at

         http://www.interlink.com.au/anthony/tech/kmail/

spambayes uses statistical methods (originally based on Graham's work, subsequently 
modified based on work by Robinson, Peters and others) to produce a scarily accurate 
antispam filter. 

The SB algorithms were used as a basis for the new antispam filtering in Thunderbird 
(although they only used _part_ of the work, which is why their antispam isn't as
accurate as SB's - oh well).

The primary user-visible feature of SB over most other spam filters is that it's got
a very clear "unsure" rating - when it doesn't know what the message is, it refuses
the temptation to guess - instead messages are marked as 'unsure' and left for the 
user to classify. There's much more on this (including pretty graphs) at the spambayes
website, under 'Background'.
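
As a rough illustration of the three-way rating described above, here is a minimal Python sketch. The cutoff values (0.20 and 0.90) and the function name classify are assumptions made for this example, not something stated in this report; the actual cutoffs are configurable in spambayes.

```python
# Hypothetical cutoffs, chosen for illustration only.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score: float) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    # Anything between the two cutoffs is left for the user to decide.
    return "unsure"


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.97):
        print(s, classify(s))
```

The point of the middle band is that a message scoring, say, 0.5 is never forced into either folder.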

It's pretty easy to configure spambayes with kmail - but it'd be excellent if it was 
included as one of the options in the anti-spam wizard. 

The nice thing about this is that we end up with a really strong antispam filter, nicely
integrated with the email client. This rocks!

Things I can't do with it (these appear to be just not possible with kmail):

- Show the score in the folder view of the spam/unsure folders
- Customise the icons used for 'mark as spam/mark as ham' (this makes kmail crash)
- Easily "rescore" a folder after training a couple of messages (testing has shown that
  training on the minimum number of messages needed produces the best result).

I'm happy to work with someone who wants to do the work here to make this happen - I haven't even looked at the kmail code, so attempting to do it myself would take far far too much time. If you're interested, drop me an email - anthony@interlink.com.au.

(disclaimer: I'm involved in the spambayes project)

Comment 1 Andreas Gungl 2004-11-24 20:13:13 UTC

Hi Anthony,

If nobody else gets to it first, I'll add the configuration of spambayes to the wizard's config file on Friday, so it would be included in KDE 3.3.2.

The "unsure" state handling is a bit more work; I think it might become part of KMail in KDE 3.4.

Thanks for pointing us to spambayes and for providing the config details.

Andreas

Comment 2 richardjones 2004-11-25 02:08:47 UTC

I've followed Anthony's instructions for getting spambayes support into kmail, and it rocks. Having it be automatically set up using a wizard would be very cool.

I couldn't follow Anthony's initial step of using the wizard to set up SpamAssassin, as I couldn't *find* it.

For the "rescore folder" - wouldn't just select-all and a "rescore" action do the trick?

Comment 3 Anthony Baxter 2004-11-25 03:20:15 UTC

I'm not sure what you mean by delaying the "unsure" state - if it's that you won't be able to create the unsure folder (or another name, if you prefer - e.g. the Outlook spambayes plugin uses "Junk" and "Junk Suspects") then I'd almost say don't do the work at all until then. The unsure category is completely fundamental to how spambayes works - see the discussion on http://spambayes.sourceforge.net/background.html

One of the major flaws in how Thunderbird used the SB algorithms was that they got rid of the unsure window and just made it a binary cutoff. This is bad (see the background page for more).

Comment 4 Andreas Gungl 2004-11-25 09:33:30 UTC

To make it clearer: I can make the wizard create the needed filters. However, the GUI handling for setting up that unsure folder or showing a new "unsure" state is a change that is not allowed in bugfix branches.
So the next chance to add this is KDE 3.4. Nevertheless, you could use spambayes in KDE 3.3.2 to classify your messages and let it explicitly learn from messages known to be spam or ham.

Comment 5 Anthony Baxter 2004-11-26 01:39:18 UTC

Fair enough. I guess we should leave this bug open until the unsure/junk suspects and spam/junk folders are created for 3.4. 

As an aside, the spambayes project doesn't recommend auto-trashing email that registers as spam - plenty of people do it, but we don't recommend it. While I can't remember the last false positive I got from spambayes, it could happen.

For 3.3.2, perhaps you could create 2 filters - one for messages that are spam, pointing at wherever you can, and another for unsure, that by default does nothing. This seems the best approach. Then people who want to create an unsure folder can do so. It might also be worth putting some text explaining this in the wizard.

Comment 6 richardjones 2004-11-26 01:49:37 UTC

It would be nice if the "mark unsure as ham" action could then submit the message for re-filtering so the message gets filtered into its appropriate folder (or whatever other action should be taken) ... not sure how the first "score with spambayes" action would handle this though. At the moment, it just gets dumped in the inbox. I've changed the action to mark it as "unread" (rather than "ham", which is useless AFAICT) so at least it's on my radar.

Comment 7 richardjones 2004-11-26 01:55:51 UTC

Duh, I can just add

 X-SpamBayes-Classification
 doesn't match RE
 unsure|spam|ham

to SB filter's conditions so we don't re-invoke the filter on an already handled message.
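
The condition above can be sketched in Python as follows. This is only an illustration: the function name needs_scoring is made up, and it assumes that "doesn't match RE" means the pattern is not found anywhere in the header value.

```python
import re
from email import message_from_string

# Header name and pattern copied from the filter condition above.
HEADER = "X-SpamBayes-Classification"
ALREADY_CLASSIFIED = re.compile(r"unsure|spam|ham")


def needs_scoring(raw_message: str) -> bool:
    """Return True when the message carries no classification yet."""
    msg = message_from_string(raw_message)
    # Header lookup in the email package is case-insensitive.
    value = msg.get(HEADER, "")
    return ALREADY_CLASSIFIED.search(value) is None


if __name__ == "__main__":
    fresh = "Subject: hello\n\nbody\n"
    scored = "X-SpamBayes-Classification: unsure; 0.50\nSubject: hello\n\nbody\n"
    print(needs_scoring(fresh), needs_scoring(scored))
```

A message that already carries the header is skipped, so re-running the filters does not pipe it through spambayes a second time.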

Comment 8 Andreas Gungl 2004-11-26 22:10:09 UTC

CVS commit by gungl: 

add support for detection of spam messages using the SpamBayes tool,
the anti-spam wizard is considering this tool now
(hint and configuration example by Anthony Baxter)

BUG: 93851


  M +16 -0     kmail.antispamrc   1.13


--- kdepim/kmail/kmail.antispamrc  #1.12:1.13
@@ -65,2 +65,18 @@
 UseRegExp=1
 SupportsBayes=0
+
+[Spamtool #5]
+Ident=spambayes
+Version=1
+VisibleName=S&pamBayes
+Executable=sb_filter.py
+URL=http://spambayes.sourceforge.net
+PipeFilterName=SpamBayes Check
+PipeCmdDetect=sb_filter.py
+ExecCmdSpam=sb_filter.py -s
+ExecCmdHam=sb_filter.py -g
+DetectionHeader=X-Spambayes-Classification
+DetectionPattern=spam
+DetectionOnly=0
+UseRegExp=0
+SupportsBayes=1
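
For readers who want to see what the detection fields in this entry amount to, here is a hedged Python sketch. How KMail itself reads kmail.antispamrc is not shown in this report; the helper names load_tool and is_spam, and the reading of UseRegExp=0 as a plain substring test, are assumptions. Only a subset of the keys from the commit is reproduced.

```python
import configparser
import re
from email import message_from_string

# Subset of the entry added by the commit above.
ENTRY = """\
[Spamtool #5]
Ident=spambayes
DetectionHeader=X-Spambayes-Classification
DetectionPattern=spam
UseRegExp=0
"""


def load_tool(text: str, section: str = "Spamtool #5") -> dict:
    """Parse one tool section into a plain dict, keeping key case."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # do not lowercase the keys
    parser.read_string(text)
    return dict(parser[section])


def is_spam(raw_message: str, tool: dict) -> bool:
    """Check the detection header of an already filtered message."""
    msg = message_from_string(raw_message)
    value = msg.get(tool["DetectionHeader"], "")
    pattern = tool["DetectionPattern"]
    if tool.get("UseRegExp", "0") == "1":
        return re.search(pattern, value) is not None
    # Assumed meaning of UseRegExp=0: plain substring test.
    return pattern in value


if __name__ == "__main__":
    tool = load_tool(ENTRY)
    print(is_spam("X-Spambayes-Classification: spam; 0.98\n\nbody\n", tool))
```

With this pattern, a header value of "unsure" or "ham" is not treated as spam, which matches the fact that the unsure state is not handled by this change.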

Comment 9 Andreas Gungl 2004-11-26 22:14:34 UTC

Just a comment on the commit:
String changes etc. are not allowed in KDE bugfix branches. So changing the config file as shown above is all I can do for 3.3.2 at the moment. Anthony, can you check after applying that change to your kmail.antispamrc file?

Handling of the Unsure state is covered in a more general wish, I've filed #94000 (nice number, BTW).

Comment 10 Martijn Pieters 2005-08-26 15:18:29 UTC

The detection of the spambayes filter script fails because the current incarnation of the Executable field starts sb_filter as a pipe. If you change it to testing for the help output of the command instead, the detection works fine:

  Executable=sb_filter.py -h
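
A small Python sketch of this kind of check, for illustration only. It assumes that detection boils down to running the command and looking at its exit status, which this report does not confirm; the function name tool_responds and the timeout are made up.

```python
import subprocess
import sys
from typing import Sequence


def tool_responds(command: Sequence[str], timeout: float = 5.0) -> bool:
    """Run command with no stdin and report whether it exited with status 0."""
    try:
        result = subprocess.run(
            list(command),
            stdin=subprocess.DEVNULL,   # never wait for a piped message
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable or a command that hangs: treat as not found.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # The Python interpreter stands in for the tool being probed.
    print(tool_responds([sys.executable, "-c", "pass"]))
```

In the same spirit one would call tool_responds(["sb_filter.py", "-h"]) rather than starting the bare filter, which expects a message on its standard input.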

Comment 11 Andreas Gungl 2005-08-26 20:57:16 UTC

Thanks for this hint. I've just fixed this for KDE 3.5.

A hint for the next report: please don't add comments about problems to closed wishes; file a new report instead. In a case like the one above, a report with severity "normal" would have been appropriate.