46826 – Bayesian spam filter feature

Status:	RESOLVED FIXED

Alias:	None

Product:	kmail
Classification:	Applications
Component:	filtering
Version:	1.4.1
Platform:	Compiled Sources Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	kdepim bugs

URL:
Keywords:

Duplicates (2):	34741 47138
Depends on:
Blocks:

Reported:	2002-08-23 09:03 UTC by Ferdinand Gassauer
Modified:	2007-09-14 12:17 UTC
CC List:	6 users

See Also:
Latest Commit:
Version Fixed In:

Description Ferdinand Gassauer 2002-08-23 08:53:18 UTC

(*** This bug was imported into bugs.kde.org ***)

Package:           kmail
Version:           1.4.1 (using KDE 3.0.3 )
Severity:          wishlist
Installed from:    compiled sources
Compiler:          gcc version 2.95.2 19991024 (release)
OS:                Linux (i686) release 2.2.19
OS/Compiler notes: 

Hi!
IMHO this filter extension would be a killer app
see
http://www.paulgraham.com/spam.html
cu
ferdinand

(Submitted via bugs.kde.org)
(Called from KBugReport dialog)

Comment 1 Ferdinand Gassauer 2002-08-28 17:02:55 UTC

Thinking out loud:
would it also be possible to set up "all" filter rules on such a system?
If no explicit rule is given, check all folders and classify the mail
according to the folders' content.
If this works with some 99+% reliability .....
-- 
Best regards
Ferdinand Gassauer
http://www.goesing.at

Comment 2 Ferdinand Gassauer 2002-09-09 21:43:19 UTC

Hi!
After using Outlook's and KMail's filtering rules for a long period, I am not
very enthusiastic about the way filters can be set up (not because of the
filters, but because of the limited analytical capability of the human brain).

1) I doubt that more than 10 filter rules can be set up and sorted easily.
2) I do not set up filter rules for folders with low traffic, but I would be
pleased if mails ended up there automagically.
3) I am perfect at setting up filters in a sequence such that messages end up in
the wrong folder.

Compared to that:

Filtering with the Bayes method:
Priority of filters
1) high-priority manual filters
2) Spam filtering
the mail is checked against the word list of all positive folders
3) low-priority manual filters
4) Bayes filtering for automatic folder allocation
the mail is checked against the word list of each folder, versus the word
list of all other good folders
5) folder filters

Filter rules
1) manually defined - as it is now
2) Bayesian rules -
2.1) Spam - probability of accepting a spam mail ~1%
2.2) other folders - probability of allocating a wrong mail to this folder ~1%
2.3) filter rules for each folder


How does it work?

If Bayes filtering is enabled, then the appropriate word list is filled when a
mail is moved from one folder to another (possibly using the MMB for this
special purpose).

Words in the word list are not deleted when a mail is deleted, to avoid loss
of "experience".

The word list for each folder can be edited manually.

IMHO after moving 10-20 mails to a folder, the probability of a mismatch
converges rapidly to 1% or less.

Messages which are not allocated according to the Bayes rules stay in the
default folder for the checked account.

If a folder is selected according to the Bayes rules, then the sub-rules (2.3)
specified for this case are examined.

Hope that this gives some input.

-- 
cu
ferdinand
http://www.goesing.at
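
The per-folder word lists proposed in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain multinomial naive Bayes with add-one smoothing (not necessarily the formula from the Paul Graham article linked in the description), and the class name, method names, tokenizer and the 0.99 threshold are all hypothetical, chosen to mirror the "~1%" error target above.

```python
import math
import re
from collections import Counter, defaultdict


class FolderClassifier:
    """Naive Bayes over per-folder word lists (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.mail_counts = Counter()             # folder -> mails trained

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, folder, text):
        """Call when a mail is moved into `folder`; counts are never removed."""
        self.word_counts[folder].update(self.tokenize(text))
        self.mail_counts[folder] += 1

    def probabilities(self, text):
        """Return {folder: posterior probability} for the given mail text."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        vocab_size = len(vocab) + 1  # +1 reserves a slot for unseen words
        total_mails = sum(self.mail_counts.values())
        log_scores = {}
        for folder, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.mail_counts[folder] / total_mails)
            for word in self.tokenize(text):
                # Add-one smoothing so an unseen word does not zero the product.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[folder] = score
        if not log_scores:
            return {}
        top = max(log_scores.values())
        weights = {f: math.exp(s - top) for f, s in log_scores.items()}
        norm = sum(weights.values())
        return {f: w / norm for f, w in weights.items()}

    def classify(self, text, default="inbox", threshold=0.99):
        """Pick a folder only if its posterior reaches `threshold`."""
        probs = self.probabilities(text)
        if not probs:
            return default
        folder, p = max(probs.items(), key=lambda item: item[1])
        return folder if p >= threshold else default
```

Mails that do not reach the threshold for any folder fall back to `default`, which corresponds to "stay in the default folder for the checked account" above.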

Comment 3 Richard Jones 2002-09-30 06:11:55 UTC

I'd love to see a Bayesian filter built into KMail. See
http://bugzilla.mozilla.org/show_bug.cgi?id=163188 for the discussion about
integrating it with the Mozilla mail client. At a minimum, there are ideas to
share - there may very well be code too.

Just a datapoint: there's significant research being done into Bayesian
filtering at the http://spambayes.sf.net/ project. It's being done in Python,
so it's practically pseudocode that could be ported to C++. The
research is very much still active, and if you're into statistics, it's the
place for you :)

I'll be using the results of their work whether it's integrated into KMail or
not - I currently use spamassassin, but it sucks in a number of ways. Better than
nothing though.

Comment 4 Ferdinand Gassauer 2002-11-16 22:18:31 UTC

see http://popfile.sourceforge.net/ 
BTW I did not get it to work until now - but i didn't try hard

Comment 5 Ismail Donmez 2003-03-18 14:06:12 UTC

Spamassassin 2.50 has builtin bayesian spam filtering now...

Comment 6 Marc Mutz 2003-07-20 10:41:39 UTC

You know that KMail can use _any_ spam filter that is at least a bit sane and allows mails to
be piped through it?
 
Just set up two filters: 
1. 
Name "Pipe through SPAM filter" 
If <size> <less than> <50000> 
Then <pipe through> </path/to/your/spamfilter> 
2. (immediately following) 
Name "Filter out SPAM" 
If <X-Spam-Flag> <contains> <yes> 
or <X-Spam-Level> <contains> <*****> 
Then <move to folder> <SPAM> 
 
In KMail from HEAD CVS, you can even define a filter 
Name "report as SPAM" 
If <nothing> 
Then <execute command> </usr/bin/spamassassin -r> 
Never apply, but 
[x] Create menu entry for this filter action 
 
This gives you the full flexibility of KMail filters, combined with the power of e.g. SpamAssassin.
You can e.g. make spam filtering more efficient by re-using the SpamAssassin results that
mailing lists have already added to a message, by adding
  And <X-Spam-Flag> <doesn't match regexp> <.>
to the first filter.
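
For readers unfamiliar with piping, the logic of the first two filters can be sketched outside KMail like this. It is a minimal sketch, not KMail's implementation: the function names are hypothetical, and it assumes the external filter reads a message on stdin and writes it back on stdout with added headers. The header test is done case-insensitively here, which is an assumption about how "contains" behaves.

```python
import subprocess
from email import message_from_bytes


def pipe_through(command, raw_message, max_size=50000):
    """Filter 1: pipe small messages through an external spam filter."""
    if len(raw_message) >= max_size:
        return raw_message  # rule only applies if <size> <less than> <50000>
    result = subprocess.run(command, input=raw_message,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


def looks_like_spam(raw_message):
    """Filter 2: check the headers the external filter added."""
    msg = message_from_bytes(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    level = msg.get("X-Spam-Level", "")
    return "yes" in flag.lower() or "*****" in level
```

A caller would run `pipe_through(["/path/to/your/spamfilter"], raw)` and move the result to the SPAM folder when `looks_like_spam` returns true.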

Comment 7 Marc Mutz 2003-07-20 10:50:10 UTC

*** Bug 47138 has been marked as a duplicate of this bug. ***

Comment 8 tnagy 2003-07-20 12:49:33 UTC

More flexibility ? 
 
I do not want a spamassassin server running on my box. I do not want to
install extra things just to make kmail work properly. I just want a default
spam filter that can be run easily. 

Your solution is far too complicated. 
 
Like many end-users, I will stick to Mozilla until KMail has a similar feature.

Comment 9 Anders Lund 2003-07-20 12:53:07 UTC

Mh, I am now using spamassassin with bayesian filtering enabled, and i am testing qsf as 
well. I tried removing my votes (bugzilla wouldn't allow me to change it to "0"), since i 
consider this solved, esp with the option to apply filter actions.

Comment 10 Anders Lund 2003-07-20 12:58:08 UTC

Subject: Re:  Bayesian spam filter feature

On Sunday 20 July 2003 12:49, tnagy wrote:
> I do not want a spamassassin server running on my box. I do not want to
> install extra things just to make kmail work properly. I just want a
> default spam filter that can be run easily.
>
> Your solution is far too complicated.

Comment 11 lakeland 2003-07-20 22:53:04 UTC

Subject: Re:  Bayesian spam filter feature

> I do not want a spamassassin server running on my box. I do not want to
> install extra things just to make kmail work properly. I just want a
> default spam filter that can be run easily.

Is installing kmail and bogofilter really more complex than installing mozilla 
and its filters?

> Like many end-users, I will stick to Mozilla until KMail has a similar
> feature.

Your call.  But the difference in complexity between calling bogofilter from 
kmail versus running the bayesian filter inside moz seems pretty academic to 
me.  Perhaps it could do with being more transparent, but then that reduces 
flexibility (do users want bogofilter, or spamprobe, or spamassassin, or 
crm114, or a mail accountant?).  

From my perspective, a few features could be added, such as sending tickets if 
the message is non-spam but not for spam, and filters on moving to certain 
folders.  Also a few things made somewhat simpler (perhaps a spam setup 
wizard), but by and large it is complete now.  Of course, Marc's opinion 
counts for more than mine :-)

Corrin

Comment 12 lakeland 2003-07-20 22:58:56 UTC

Subject: Re:  Bayesian spam filter feature

On Sunday 20 July 2003 22:53, you wrote:

> I tried removing my votes (bugzilla
> wouldn't allow me to change it to "0"), since i consider this solved, esp
> with the option to apply filter actions.

Agreed.  Marc, what do you think of closing the bug?

Corrin

Comment 13 tnagy 2003-07-21 01:07:20 UTC

> ------- Additional Comments From lakeland@acm.org  2003-07-20 22:53 -------
> Subject: Re:  Bayesian spam filter feature
> 
> > I do not want a spamassassin server running on my box. I do not want to
> > install extra things just to make kmail work properly. I just want a
> > default spam filter that can be run easily.
> 
> Is installing kmail and bogofilter really more complex than installing mozilla 
> and its filters?
> 

Yes it is. You have to know that bogofilter is something that can
plug into kmail (I've just learned something in fact), you have to search for a
bogofilter package for your distro/system and then install bogofilter. Yeah,
very simple.

My problem is that I don't want to install another program just to make
kmail work, I don't have the root password on my workstation, and I
simply don't have the time: searching for the documentation, reading
the right documentation, and setting up everything: half a day lost.


> > Like many end-users, I will stick to Mozilla until KMail has a similar
> > feature.
> 
> Your call.  But the difference in complexity between calling bogofilter from 
> kmail versus running the bayesian filter inside moz seems pretty academic to 
> me.  Perhaps it could do with being more transparent, but then that reduces 
> flexibility (do users want bogofilter, or spamprobe, or spamassassin, or 
> crm114, or a mail accountant?).  
> 


It depends on who you think your users are. Basic users do not want to
know anything about bogofilter, spamprobe, spamassassin or crm114. I am sure my
girlfriend will be happy to know more about all these anti-spam tools and test
them ;-) 
Seriously, what is probably wanted by common users is just a simple spam
filter that comes by default and moves the most annoying messages like "VIAGRA"
"PENIS ENLARGEMENT" "COME TO NIGERIA" into a directory entitled "spam_is_here".


Isn't it possible to have a very basic built-in Bayesian filter by
default to put all that penis stuff in such a subdirectory (I mean a
default behaviour, not an option set up by a complicated wizard), and
support for the external spam filters at the same time (for advanced
users)?


--
TN

Comment 14 automailer 2003-07-21 08:17:55 UTC

I second TN. While it is very nice for power users to tweak everything by
themselves, this is unacceptable if KMail wants to be taken seriously as an
end-user package.

Reading the above discussion, and from experience, the most that end users want to
know about SPAM protection is: How do I toggle the SPAM status (read: I want
to click on a button to change it)? Where does all this filthy stuff go before
it goes to digital nirvana?

However, duplicating all the detection code is not really in the open source spirit.
So what about choosing one SPAM filter, making it an optional package and building
a simple UI in KMail to control the external SPAM filter? This should be
easy for power users to tweak in order to build in different solutions.
 
The SPAM button would have the following functionality: 
 
Toggled Checked: Report as false-negative 
Toggled Unchecked: Report as false-positive 
 
After any action on this button the filters need to be re-run to take 
appropriate action for the new message status. 
 
What else is required? Maybe a wizard for end users in order to set the
folder where SPAM should go.
 
Cheers, 
Thorsten
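
The button behaviour described in this comment could be sketched as below. Everything here is hypothetical: the `Message` class and the `report` and `refilter` callbacks stand in for whatever KMail and the external filter would actually provide.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    is_spam: bool = False   # state shown by the SPAM toggle button
    folder: str = "inbox"


def toggle_spam(msg, report, refilter):
    """Flip the SPAM status, tell the external filter, then re-run filters."""
    msg.is_spam = not msg.is_spam
    # Checked: the filter missed this mail (false negative).
    # Unchecked: the filter wrongly flagged it (false positive).
    report(msg, "false-negative" if msg.is_spam else "false-positive")
    refilter(msg)
```

Keeping the reporting and the re-filtering as separate callbacks leaves the choice of external spam filter open, as suggested above.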

Comment 15 Ingo Klöcker 2003-07-21 11:23:49 UTC

My opinion about this is the following: 
- I agree that we need to improve documentation about how to use a spam filter with KMail. 
- I agree that we need to make using spam filters with KMail easier, e. g. by adding predefined 
profiles for various spam filters and by detecting certain spam filters that are installed. 
 
But: 
- We won't distribute a particular spam filter with KMail. End users pay distributors for doing 
this. And anyone who installs KDE from RPMs will also be able to install an additional spam 
filter RPM. 
- Frontends for configuring spam filters should be written as KDE Control Center modules and 
could then easily be plugged into KMail's configuration dialog. We won't include them directly 
into KMail's code because this would make updating the configuration modules pretty much 
impossible in case a newer version of a spam filter needs a newer version of the configuration 
module.

Comment 16 Brad Hards 2003-07-21 14:19:54 UTC

I run spamassassin (the perl script, not the standalone server). I love it. Works 
fantastic. Ties into kmail really well. 
 
But maybe it needs to be even easier. Unless you implicitly understand the concept 
of unix style "piping", the setup is not obvious. 
 
Perhaps one solution would be to have the filter rules built into kmail. I don't think I'd 
want the filter built in (because on a stable system, I'd like to upgrade the spam filter 
a lot, but keep kmail stable; while on my devel system, I'd like to run CVS HEAD 
kmail, but keep my well trained spamassassin setup constant). 
So when you go to Settings->Configure Filters..., you get a wizard (or whatever) that 
identifies that spamassassin is installed and provides "one click" setup of the filter 
rules.
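
Detecting an installed filter, as such a wizard would need to do, can be as simple as a PATH lookup. A minimal sketch: the list of executable names is an assumption (other tools may install under different names), and `detect_spam_filters` is a hypothetical helper, not part of KMail.

```python
import shutil

# Executable names assumed for illustration; extend as needed.
KNOWN_FILTERS = ("spamassassin", "spamc", "bogofilter", "spamprobe")


def detect_spam_filters(candidates=KNOWN_FILTERS, which=shutil.which):
    """Return {name: full path} for every candidate found on PATH."""
    found = {}
    for name in candidates:
        path = which(name)
        if path:
            found[name] = path
    return found
```

If the result is empty, the wizard could point the user at the documentation instead of offering the "one click" setup.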

Comment 17 Casey Allen Shobe 2003-07-23 23:50:37 UTC

Reply to comment #2, in regards to filter organization. 
 
I was just thinking about this as I've got quite some number of filters myself (around 50).  It's 
very helpful to be able to reorder the filters, but why not have filter groups?  I commonly rename 
my filters to things like 'Mailing Lists - Rdesktop - Users', 'Livejournal - Administrative', 
'Livejournal - Notifications', 'Spam - From Subjects 1 - Targeted', 'Spam - From Subjects 2 - Save 
Money', etc. and then lump up a whole bunch of things into one filter with a 'match any of the 
following'. 
 
It would be *very* nice if I could organize these into groups, or folders, or directories, or 
whatever name you happen to like.  So maybe a second listbox on the far left (by the way, the 
listbox(es) need(s) to be resizable without resizing the whole window...) would contain a nested 
folders view...i.e. the following: 
 
Filters 
+-Spam 
| +- by Subject 
| | +- Crap to Buy 
| | +- Save Money 
| | +- Targeted 
| +- by Body content 
| | +- Crap to Buy 
+- Mailing Lists 
  +- KDE 
  | +- Users 
  + rdesktop 
    +- Users 
 
And then within each folder, I could put all of the individual things... 
 
It would also be very helpful to be able to copy existing filters. 
 
It would also be very helpful if I could use something more complex than 'all of the following' or 
'any of the following'...but I think that having folders and the ability to copy filters would 
eliminate that desire.

Comment 18 Casey Allen Shobe 2003-07-23 23:52:27 UTC

Bleah, bugzilla stripped out my leading spaces in the diagram, but you ought to be able to figure 
it out still.

Comment 19 Russell Miller 2003-07-24 00:40:58 UTC

Subject: Re:  Bayesian spam filter feature

On Wed, Jul 23, 2003 at 09:50:40PM -0000, Casey Allen Shobe wrote:
> 
> ------- Additional Comments From cshobe@somerandomdomain.com  2003-07-23 23:50 -------
> Reply to comment #2, in regards to filter organization. 
>  
I believe that the order of the filters serves some purpose.  IIRC, the ones
at the beginning (end) of the list get executed first, and it goes up the
chain until it finds one that stops processing there.

How would your idea of filter organization (which I find to be decent on its
face other than this rather large issue) deal with that?

--Russell
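
One possible answer to the ordering question, offered only as a sketch: if groups are treated as ordered containers, walking the tree depth-first and top to bottom yields a single flat execution order, so grouping would not change how filters are applied. The `Filter` and `Group` classes and the `stop_processing` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Filter:
    name: str
    stop_processing: bool = False  # hypothetical "stop here if matched" flag


@dataclass
class Group:
    name: str
    children: List[Union["Group", Filter]] = field(default_factory=list)


def execution_order(node, prefix=""):
    """Yield (path, filter) pairs in depth-first, top-to-bottom order."""
    path = prefix + node.name
    if isinstance(node, Filter):
        yield path, node
        return
    for child in node.children:
        yield from execution_order(child, prefix=path + " / ")
```

Reordering a group then moves all of its filters together, while the order inside the group is preserved.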

Comment 20 lakeland 2003-07-24 05:42:55 UTC

Subject: Re:  Bayesian spam filter feature

On Mon, 21 Jul 2003 11:07, you wrote:

> > > I do not want a spamassassin server running on my box. I do not want
> > > to install extra things just to make kmail work properly. I just want a
> > > default spam filter that can be run easily.
> >
> > Is installing kmail and bogofilter really more complex than installing
> > mozilla and its filters?
>
> Yes it is. You have to know that bogofilter is something that can
> plug into kmail (I've just learned something in fact), you have to
> search for a bogofilter package for your distro/system and then install
> bogofilter. Yeah, very simple.

% apt-cache search spam | grep -i filter
amavisd-new - Interface between MTA and virus scanner/content filters
blackhole-exim - Spam filter - exim version
bogofilter - a fast Bayesian spam filter
crm114 - The Controllable Regex Mutilator and Spam Filter
ifile - Mail filter capable of learning
mailfilter - A program that filters your incoming e-mail to help remove spam.
pyzor - spam-catcher using a collaborative filtering network
razor - spam-catcher using a collaborative filtering network
spamassassin - Perl-based spam filter using text analysis
spamc - Client for perl-based spam filtering daemon
spamfilter - Filter spam from incoming mail
spamoracle - A statistical analysis spam filter based on Bayes' formula
spamoracle-byte - A statistical analysis spam filter based on Bayes' formula
spamprobe - a C++ Bayesian spam filter
blackhole-qmail - Spam filter - qmail version
qmail-qfilter - qmail-queue filter front end

% apt-get install bogofilter

Perhaps two minutes' work?

> My problem is that I don't want to install another program just to make
> kmail work, I don't have the root password on my workstation, and I
> simply don't have the time: searching for the documentation, reading
> the right documentation, and setting up everything: half a day lost.

Then get your sysadmin to install a spam filter.  It just seems this problem 
is not related to kmail, but to the ease of installing software.

> It depends on who you think your users are. Basic users do not want to
> know anything about bogofilter, spamprobe, spamassassin or crm114. I am
> sure my girlfriend will be happy to know more about all these anti-spam
> tools and test them ;-)

*shrug*, then have a default (spamassassin I guess, since it doesn't require 
configuration, training, or correcting).

> Seriously, what is probably wanted by common users is just a simple
> spam filter that comes by default and moves the most annoying messages like
> "VIAGRA" "PENIS ENLARGEMENT" "COME TO NIGERIA" into a directory entitled
> "spam_is_here".

Simple spam filters do not work.  Seriously.  If they did then this _might_ be 
an acceptable solution.  Since even outlook has this, almost every spam is 
designed to avoid basic filtering.  If you don't have better spam filtering 
than this, you may as well not have spam filtering.

> Isn't it possible to have a very basic built-in Bayesian filter

Yes, but.

a) Bayesian filters need training.  A training file _could_ be sent with kmail 
but it would add another 20MB to kmail's distribution size which would be 
unacceptable. crm114 is a bit of a winner here since it would only add 1MB.

b) Bayesian filters need to be constantly updated/corrected.  Static bayesian 
filters are no better than spamassassin.  And in a year they'll be useless.

c) That means writing and maintaining a bayesian filter in kmail's code, which 
seems like unnecessary work duplication to me.

I think that most people checking email get their spam filtered by their mail 
provider.  The ISP runs spamc, and modifies the headers.  This makes 
filtering in kmail really trivial.  In my case I run my own mail server for 
myself and family.  I still do not do filtering in kmail because then I'd 
have to configure it for everyone.  Instead I have it run from exim (via 
procmail).  Procmail is also used to filter the spam into everyone's spam 
boxes.  So the only feature I really needed for kmail was correcting 
mistakes. 

Getting back to your girlfriend example:

If the mail is spamc'ed by your ISP then you don't have a problem.  Kmail has 
been able to filter this for years.

If your ISP doesn't run spamc but you do have root on localhost, then install 
bogofilter or similar, and copy the procmail example from the bogofilter 
manpage.  E.g. I have the following procmail recipe for my brother-in-law.  
As far as he is concerned, spam detection happens automatically and nothing 
is needed in kmail:
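	MAILDIR=$HOME/Maildir
	LOGFILE=$HOME/.procmail_log

	# Pipe every message through spamc (f = filter, w = wait for exit code)
	:0fw
	| /usr/bin/spamc

	# File flagged mail into the spam maildir; stock SpamAssassin
	# writes "X-Spam-Status: Yes, ..." on spam
	:0:
	* ^X-Spam-Status: Yes
	.spam/

	# Everything else goes to the main maildir
	:0:
	./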

If your ISP doesn't run spamc, and you don't have root, and no spam tools are 
installed, then you have some hassle.  But downloading and installing 
bogofilter is less than an hour's work, and spamassassin is similar.

So, my personal opinion is that there are very few people who need to have spam 
detection built into kmail instead of as an add-on.  Maintaining a good 
spam filter in kmail would be a lot of work, and the only people to benefit 
would be people without root who don't know how to install software, and 
people with root who don't know what software to install.  Most linux 
distributions install spamassassin as part of a 'mail server' service, so 
most people don't need to install anything.

It could be better documented.  It could be made much easier to set up 
(automatic creation of spam folder, autodetection and integration of 
installed spam tools).  It could be better integrated (right click to mark as 
spam, or mark as non spam). But I don't think a new filter is needed.

Corrin

Comment 21 Peets 2003-08-01 09:07:31 UTC

Maybe this will help the debate: good spam filtering comes in two halves.   
 
Part 1, MTA (server) side.  That really ought to deal with the envelope.  Real time blacklisting, 
rejecting mail from non-FQDN hosts, immediate rejection etc etc.  The stuff that is identified 
as crap before you've even looked at content.  You could safely define this as not KMAIL's 
problem - and not under control of the end user. 
 
Part 2, client side. This is where you look INSIDE the message, at the content because (and 
this is crucial !) only the end user can really classify something as spam or not.  This is IMO 
where server based SpamAssassin falls short - you can only do a very generic cross cut of 
content filtering.  What if user A hates penis spam and sends everything to the 'spam' 
address, and user B is working on an anatomical dissertation? 
 
From KMail's point of view I fear that you *will* have to look at a degree of integration, but it'll 
be mainly in the 'incoming mail' stream and AFAIK (not looked enough so pardon me if I'm 
wrong) most of the filtering is post-arrival.  If there are hooks to stick something like 
SpamBayes (www.spambayes.org) or POPFile (at sourceforge) in it you will have solved the 
problem.  Actually, POPFile is IMO the smarter solution as that can be used for more than 
just 'spam spotting' - it's more flexible.  Having said that, I use SpamBayes on my corporate 
Outlook (bleagh, but bear with me) and that is end-user proof in the way it has just a couple 
of toolbar buttons.  'Delete as spam' => learn that this message is spam too, and in the 
'possible spam' folder an extra button that says 'Recover from spam' => add to the whitelist 
analysis.  Even my sister can run that - and that's end user focus for you.  So maybe the 
default POPFile approach should be a bit more 'clicky simply buttony' based, with some 
background power to change it for power users. 
 
It would, of course, be cool if you could integrate with mass rejection systems at the same 
time (label as spam and bat a checksum over to a server which creates an envelope hitlist) 
but that's a bastard to get going well - there will always be the risk of false positives and it's 
hard as you're back in MTA land then.. 
 
I hope this helps matters a bit, or has at least partly unmuddied the waters ;-). 
 
Kind regards - and thanks for the software! 
 
/// P ///

Comment 22 Anders Lund 2003-08-01 10:32:56 UTC

Subject: Re:  Bayesian spam filter feature

On Friday 01 August 2003 09:07, Peets wrote:
> Maybe this will help the debate: good spam filtering comes in two halves.

Comment 23 automailer 2003-08-01 18:30:59 UTC

Regarding Peets' point 1:
A logical consequence of this feature has to be online IMAP filters. I am in
the lucky position of having an ISP that runs all my mail through SpamAssassin
as well as some other filters; however, as long as a message is not infected
with a known virus it will go to my mailbox (which is a good thing, by the
way). Now with KMail I am condemned to at least look at the mail, since I can't
filter for the "X-Spam-Status:" header (no automatic filters on IMAP folders).

Of course this shouldn't be done on the client side; however, as long as the
majority of ISPs will not let you install your own Sieve filters, it is the
only way to do it.
 
I know there is another bug report open for that feature, however in this 
regard these go hand in hand. 
 
Cheers, 
Thorsten

Comment 24 tnagy 2003-08-09 06:28:40 UTC

I agree with Corrin (#20): for practical reasons it would be better to improve
kmail integration with the existing spam filters than to write a new filter.

However, the filter settings should be done in a user-friendly way: perhaps a
wizard which detects the spam filters installed and asks the user to install one
(with a link to an up-to-date kmail help page in the kde help center) if no
suitable filter is found.

Comment 25 henrik 2003-08-24 09:59:23 UTC

Since spam filters (client-side) have only one purpose: To feed the e-mail 
client, it makes sense to me that the spam filtering be built in rather than 
having to configure an external program. External programs are fine for us 
powerusers and experimentalists, but neophyte users get the advantage only if 
it's in the program already - which also means there'll be no setting up of 
collecting mail at 127.0.0.1 with funny account names, which is a cause for much 
confusion.

I vote for integrated spam filtering, it's a Good Thing by itself. Too much 
modularity makes things too hard to use for the non-techie.

Comment 26 David P James 2003-09-28 07:46:38 UTC

From comment #20: 
 
> % apt-get install bogofilter  
>  
> Perhaps two minutes' work? 
 
That's just the beginning though. Next you have to sort spam from non-spam. 
Ok, so you've probably got a Trash can full of it and so maybe that isn't too 
bad. In my case I cheated and used Mozilla's Junk folder. Then you have to 
build the database(s). In the case of bogofilter, this isn't documented in the 
man page (seriously - it isn't. Sure, the command is listed but not the fact 
that you have to use '<' to specify the spam and non-spam dirs. I only found 
that out from the online bogofilter FAQ after spending 45 minutes reading and 
re-reading the man page). That done, it's back to KMail to create at least 2, 
probably 3 and maybe 4 filters to get all this to work. The first is to pipe 
each message through bogofilter/<spam-killer-of-choice>. The next is to check 
whether the first resulted in a positive spam identification. Ideally, you 
want to correct any mistakes the spam killer makes. The bogofilter+kmail 
mini-howto suggests hosing and rebuilding your database every day. Not smart 
imho. To get around this I created two manual-only filters that I run on false 
identifications to correct the database. 
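
The training and correction steps described above can be sketched as follows. It rests on my reading of the bogofilter man page, so check your installed version: `-s` and `-n` register a message as spam or non-spam, and `-Ns` / `-Sn` undo a wrong registration and register the opposite. The message is fed on stdin, which is what the shell's `<` redirection does. The `train` helper and its `kind` names are hypothetical.

```python
import subprocess

# Flags as I understand them from the bogofilter man page (verify locally).
TRAIN_FLAGS = {
    "spam": "-s",          # register as spam
    "ham": "-n",           # register as non-spam
    "ham_to_spam": "-Ns",  # was wrongly registered as non-spam
    "spam_to_ham": "-Sn",  # was wrongly registered as spam
}


def train(path, kind, run=subprocess.run):
    """Feed one message file to bogofilter on stdin, like `bogofilter -s < file`."""
    with open(path, "rb") as message:
        return run(["bogofilter", TRAIN_FLAGS[kind]], stdin=message, check=True)
```

The two manual-only correction filters mentioned above would correspond to the `ham_to_spam` and `spam_to_ham` cases.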
 
Let's compare to Mozilla. Go to Tools | Junk Mail Controls. Activate JMC from 
the second tab. Run the JMC on all your folders (Inboxes, Trash cans). Correct 
any mistakes it makes by clicking on the Junk icon. When done, go back to 
Tools | Junk Mail Controls and enable JMC for all accounts, and check which 
options you want (whitelisting, moving and deleting). You're pretty well set 
to hit incoming spam, and correcting mistakes is fairly easy. 
 
If built in spam killing tools aren't going to be included by default, then 
here is my suggestion on how to make it easier to integrate: 
 
Under Settings, there should be another filter set - Junk/Spam Filters. 
It would have a few settings/look something like the following: 
 
 
[X] Enable Junk Mail Controls {unchecked by default, greys out rest} 
 
( ) Run spam filters before regular filters 
(*) Run spam filters after regular filters 
[X] Exclude messages from senders in address book from spam filters 
Pipe incoming messages through: _______ [Choose] 
 
Spam Criteria: 
[{Common Spam Headers} \|/] [Contains \|/] _______ 
Action: 
[Move to folder \|/] [Junk \|/] 
[Mark message as \|/] [Junk \|/] 
[Mark message as \|/] [Unread \|/] 
 
When manually marking a message as Junk: 
[Pipe through \|/] ____________ [Choose] 
[Move to folder \|/] [Junk \|/] 
[Mark message as \|/] [Read \|/] 
 
When unmarking a message marked as Junk: 
[Pipe through \|/] ____________ [Choose] 
[Mark message as \|/] [Unread \|/] 
[Move to folder \|/] [inbox \|/] 
[Process message through regular filters \|/] 
 
 
For all this to work, Kmail would also need an additional marking state - 
Junk, as well as a toolbar button for the same. I also think a Junk folder 
should be included by default, or at least created automagically when the 
enabling pref is checked. It would be nice to include a way to build the spam 
database from within Kmail, but I haven't thought it fully through yet. The 
spam side for something like bogofilter is easy enough - just marking all 
collected spam as Junk will suffice, much as in Mozilla at present. It's 
creating the ham side of the database that is more problematic. The best I can 
come up with is some sort of command that can be run on all messages in a 
folder that doesn't contain any junk. This assumes the user has already 
checked for junk in that folder and marked and removed them, as well as 
compacted the folder. Continuing from above, we might have something 
resembling: 
 
Non-junk database creation: 
 
The messages of the selected folder will be used to identify 'good' email. 
Before running, ensure that it is free of junk mail and compacted. 
Non-junk command: __________ [Choose] 
Folder: [inbox \|/] 
[ ] Run in terminal window 
[Run now] 
 
 
I'm not sure I like having the blurb, yet it seems necessary to avoid screw 
ups. Perhaps putting it into its own button-launched dialog is preferable. Or 
maybe something else...?

Comment 27 chris 2003-10-20 21:01:43 UTC

The suggested workaround -- filters and folders -- mentioned here is a poor substitute for true integrated Bayesian spam filtering.

Firstly, filtering will not work with IMAP (see bug 50997). That is, no filter will work on any incoming IMAP email. You have to manually invoke filters -- and if you have to do that you might as well hit "delete" on your spam.

Secondly, Bayesian spam filtering requires a great deal of interactivity. You need to constantly train the filter to recognize new spam and resurrect false positives. Mozilla's junk mail handling is the ideal model here: marking/unmarking spam is effortless. You don't have to set up learning jobs, mailbox processing scripts, move individual emails between spam/ham/unsure folders or whatnot: just check/uncheck the little "junk mail" dot next to each message. It really is a big deal if you have 150 unread emails in your inbox at the beginning of the day. Without an explicit, integrated front-end for Bayesian filtering engines, kmail will never come close to Mozilla's effortless spam handling.

Comment 28 Ingo Klöcker 2003-10-20 21:54:08 UTC

Subject: Re:  Bayesian spam filter feature

Just for your information:

a) Most likely in KMail 1.6 client side filtering of IMAP will be 
possible. At least we are working very hard to get it in.

b) With the new custom message actions marking/unmarking spam _is_ 
effortless.

c) KMail is independent of the spam filter program that is used. This is 
a very big advantage over mail clients with built-in spam filters 
because the user is free to use the best spam filter programs that are 
available. He can even use several spam filters with KMail.

Comment 29 Christo 2003-10-20 23:30:30 UTC

Subject: Re:  Bayesian spam filter feature

Yeah .. that's all great work .. will kmail 1.6 be part of the 3.2 release?




Comment 30 Konrad Dąbrowski 2003-10-21 22:43:55 UTC

Subject: Re:  Bayesian spam filter feature

> c) KMail is independent of the spam filter program that is used. This is
> a very big advantage over mail clients with built-in spam filters
> because the user is free to use the best spam filter programs that are
> available. He can even use several spam filters with KMail.

Please take into account users who are new to computers when you do this. I am 
not one of these, but my father is. Things that seem more than obvious to 
normal users are not so to people like these. It has to be at least as easy 
to use as in Mozilla, and some sort of filtering should be set up by default. 
Otherwise many people who would otherwise use this feature will not.

Comment 31 Andreas Gungl 2003-10-22 18:35:58 UTC

To give you another update:

I'm working on a wizard to set up filter rules for (currently) spamassassin and bogofilter. Menu entries for "mark as spam" / "mark as not spam" will get created. Moving messages to a special folder will be supported.
More polish is certainly possible (toolbar buttons, spam icon etc.). But it definitely will let fathers and girlfriends set up these filters easily.

The bad news is, as we're in feature freeze for 3.2, you'll get this tool in a later version. In KDE 3.2 you'll still have to create the rules manually, but an FAQ item to support this might be possible.

Comment 32 _ 2003-10-31 16:53:39 UTC

You're talking about the spam filters you "train", like the one in Mozilla? That's the best!
Especially if there is a feature "mark spam as read" in it ;-).

Comment 33 Christo 2003-10-31 17:49:23 UTC

I have read about a lot of such projects .. but none fully integrated into KDE (and therefore configurable directly there) ... let's hope for 3.2

Comment 34 _ 2003-10-31 18:05:13 UTC

Yes.
Maybe a suggestion: a shared-spam-filter feature: the "trained" filter is uploaded to some server and can then be downloaded by other users, so the spam filter will filter out lots of spam by itself.

Comment 35 Sander Devrieze 2003-10-31 18:33:11 UTC

Subject: Re:  Bayesian spam filter feature

On Friday 31 October 2003 18:05, Mark Janssen wrote:
<snip>
> Maybe a suggestion: a shared-spam-filter feature: the "trained"
> filter is uploaded to some server and can then be downloaded by other
> users, so the spam filter will filter out lots of spam by itself.

That already exists: see for example razor and pyzor.

Comment 36 _ 2003-11-01 21:01:53 UTC

So KDE developers or anyone: is this going to be implemented?

Comment 37 Andreas Gungl 2003-11-01 21:38:21 UTC

Subject: Re:  Bayesian spam filter feature

On Saturday 01 November 2003 21:01, Mark Janssen wrote:
> So KDE developers or anyone: is this going to be implemented?

Did you read the latest comments of the KMail developers on this item? The 
answers are (see comments 28 and 31 for details):
1) You can use anti-spam tools together with KMail 1.6 in much the way 
Mozilla works (mark as spam / ham, filter messages, detect and move spam 
messages).
2) In KDE 3.2 you will most likely have to set up the filter rules 
manually. Work is under way to improve this situation, and the wizard 
will most likely be included in an intermediate kdepim release.
3) KMail will not include its own spam filter implementation but will rely 
on existing implementations like SpamAssassin or Bogofilter.


 

Comment 38 _ 2003-11-01 22:45:36 UTC

That's too bad...could you give me an approx. date?

Comment 39 Thiago Macieira 2003-11-02 02:54:35 UTC

When KDE 3.3 or 4.0 is released. As said, KMail 1.6 (to be released with KDE 3.2) will allow you to do what you want, but you will get no wizard to do so for you.

Comment 40 _ 2003-11-02 12:22:18 UTC

OK cool.

Comment 41 phobos 2003-12-09 00:08:13 UTC

Er, just a small extra request: after all that training on the client side I found 90% accuracy good enough for me (if it's really really essential there are other ways to get hold of me ;-), but there's no DELETE filter function.  I don't really want to go and shell an 'rm' of sorts on the trash directory, but sometimes I have to leave a box running just to keep up with the crap (yes - working on getting the ISP front end to get off its backside too).  Having a trash directory fill up like that kinda gets irritating..  Is a hard delete function possible?

Comment 42 Gioele Barabucci 2004-02-09 20:43:44 UTC

*** Bug 34741 has been marked as a duplicate of this bug. ***

Comment 43 Andreas Gungl 2004-03-14 21:40:05 UTC

Hi,
whoever contributed to this report and is using KDE CVS or CVS snapshots, I would like to get to know your filter rules for your favourite anti-spam tool. The infrastructure for a wizard to help new users set up the filter rules is available. What is needed are templates to detect the tools and build the correct rules.
So far, spamassassin and bogofilter are considered. It would be good to widen the range of supported tools before the KDE PIM 3.3 release.

Comment 44 Patrick Audley 2004-03-14 23:25:15 UTC

I'll be setting up DSPAM with it tomorrow - if it works well I'll send my rules and how to detect it.

Comment 45 Aaron Williams 2004-03-27 01:54:56 UTC

I just got dspam working on my mail server and must say it's much better than Spam Assassin.  It's much faster and much more accurate from my experience.  I was getting false positives with SA and it was only catching 20-30% of my spam, and that's with Razor running as well.  After only a few days DSPAM is catching over 90% of the spam.  It's working better in some cases than Mozilla.

I think DSpam would be ideal with KMail since it is also available as a shared library.

Comment 46 lakeland 2004-03-27 02:31:15 UTC

On Sat, 27 Mar 2004 12:54, aaronw@net.com wrote:
> I just got dspam working on my mail server and must say it's much better
> than Spam Assassin.  It's much faster and muc

And probably bogofilter does better, and probably spambayes does better than 
bogofilter, and probably crm114 does better than spambayes, and say <program 
not invented yet> does better still.  That's why kmail uses an external 
filter rather than a built in spam checker.  The idea is that kmail's druid 
can guide you through setting up a spam filter.  SpamAssassin is currently 
the easiest to tie into.  


Comment 47 Philipp Lehman 2004-04-07 17:22:44 UTC

To add another implementation of a bayesian filter to the mix, consider:

http://www.nuclearelephant.com/projects/dspam/

Very accurate and comes with a library (libdspam) kmail could link to.

As mentioned several times in the discussion, yes, you can use pretty much any backend app via filters, but I don't think that's the way Bayesian filters are meant to be used. I'd rather drag&drop messages in and out of a 'Spam' folder and have kmail reclassify them automatically. You just can't do that with filters.

Comment 48 Datschge 2004-04-07 17:39:39 UTC

http://docs.kde.org/en/HEAD/kdepim/kmail/the-anti-spam-wizard.html

"If you have checked the 'Classify messages manually as spam / not 
spam' option the wizard will create toolbar buttons for marking 
messages as spam or as ham; keep in mind that classifying messages as 
spam will also move those messages to the folder you had specified 
for spam messages."

Isn't that what you want in your case?

Comment 49 Philipp Lehman 2004-04-07 19:58:01 UTC

On Wednesday, 7 April 2004 17:41, Datschge wrote:

> http://docs.kde.org/en/HEAD/kdepim/kmail/the-anti-spam-wizard.html

This is CVS only at this time, right? So I can only judge it by the 
description.

> Isn't that what you want in your case?

Sure sounds interesting, but there's still the interface problem I 
mentioned: moving messages by drag&drop doesn't trigger any filters. 
Actually, that might be a useful feature...

OTOH, it might be more productive in the long run to think about spam 
as a general mail classification issue and approach it like that. 
Suppose you had a bayesian message classifier linked into kmail. When 
setting up filters, you wouldn't specify some regex that some header 
or whatever needs to match. You'd pick a category instead and the 
category would be defined by the statistical properties of all 
messages in a (physical or virtual) mail folder. This way, your 
filter adjusts dynamically to changes in your incoming email and in 
the way you sort it into folders.

That's why I disagree with comment #46. I don't think a bayesian 
filter is yet another spam filter. It's a new concept that could do 
more than just tag spam. But to really work seamlessly, it needs 
proper support inside kmail.
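
Editor's sketch of the folder-as-category idea above: each folder is a category whose statistics come from the messages filed in it, and a new message goes to the category with the highest score. This uses a plain multinomial naive Bayes with add-one smoothing, which is a textbook stand-in rather than Paul Graham's exact formula or any KMail code. The class name `FolderClassifier`, the tokenizer, and the folder names in the usage note are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class FolderClassifier:
    """Naive Bayes over folders: each folder's messages define one category."""

    def __init__(self):
        self.word_counts = {}            # folder name -> Counter of tokens
        self.message_counts = Counter()  # folder name -> number of messages seen

    def train(self, folder, text):
        """Record one message as belonging to `folder`."""
        self.word_counts.setdefault(folder, Counter()).update(tokenize(text))
        self.message_counts[folder] += 1

    def classify(self, text):
        """Return the most probable folder for `text`, or None if untrained."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)
        best_folder, best_score = None, -math.inf
        for folder, counts in self.word_counts.items():
            # Log prior: share of all training messages filed in this folder.
            score = math.log(self.message_counts[folder] / total_messages)
            total_words = sum(counts.values())
            for token in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder
```

Filing a message into a folder would call `train("kde-devel", body)`, and an incoming message would be routed with `classify(body)`. Because the counts change with every filed message, the "filter" follows the user's sorting habits without any hand-written rule.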

Comment 50 Andreas Gungl 2004-04-07 22:01:40 UTC

On Wednesday, 7 April 2004 19:58, Philipp Lehman wrote:
> On Wednesday, 7 April 2004 17:41, Datschge wrote:
> > http://docs.kde.org/en/HEAD/kdepim/kmail/the-anti-spam-wizard.html
>
> This is CVS only at this time, right? So I can only judge it by the
> description.

Well, the wizard is in CVS only. But all the functionality that is needed for 
the filtering etc. is already in KMail 1.6. The wizard only simplifies the 
setup; it doesn't add functionality.

> Sure sounds interesting, but there's still the interface problem I
> mentioned: moving messages by drag&drop doesn't trigger any filters.
> Actually, that might be a useful feature...

Drag & Drop is not supported, that's right. But nobody said that you can't 
run e.g. sa-learn over your spam folder from time to time.

> That's why I disagree with comment #46. I don't think a bayesian
> filter is yet another spam filter. It's a new concept that could do
> more than just tag spam. But to really work seamlessly, it needs
> proper support inside kmail.

As you can read in several comments, it's currently not intended to include 
any tool into KMail. But nobody knows what will happen when someone is 
coming along with a patch. ;-)

BTW, I had asked for help to complete the configuration information for the 
wizard. Two tools are already supported. Would be nice if anybody could 
contribute to extend the config file for using more tools (see 
http://webcvs.kde.org/cgi-bin/cvsweb.cgi/kdepim/kmail/kmail.antispamrc?sortby=date 
for details).

Comment 51 Patrick Audley 2004-04-11 19:34:52 UTC

Ah, I can see a potentially nice solution.  How about having filters that get executed when mail enters a folder?  That way we could have the following:

   - spam arrives in inbox
   - user gnashes teeth and drags it to the spam folder
   - the per-folder filter runs and executes sa-learn or bogofilter or whatever
 
Since the properties dialog box is already large, we could instead add "Added to Folder" in the "headers" drop down in the filter screen.  It might make a nice generic mechanism for other things too..
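
Editor's sketch of the "filter runs when mail enters a folder" proposal above: a lookup table maps a folder name to a training command, and a hook feeds the newly arrived message to that command on stdin. The hook name `on_message_added`, the rule table, and the folder names are hypothetical and do not correspond to any KMail API. The bogofilter flags `-s` (register as spam) and `-n` (register as non-spam) are real, but any tool that learns from stdin could be substituted.

```python
import subprocess

# Hypothetical rule table: folder name -> command that learns from a message on stdin.
FOLDER_RULES = {
    "spam": ["bogofilter", "-s"],   # register the message as spam
    "inbox": ["bogofilter", "-n"],  # register the message as non-spam
}


def on_message_added(folder, raw_message, rules=FOLDER_RULES, run=subprocess.run):
    """Run the folder's rule on a message that just entered `folder`.

    Returns the command that was run, or None if the folder has no rule.
    """
    command = rules.get(folder)
    if command is None:
        return None
    # check=True raises CalledProcessError if the trainer reports a failure.
    run(command, input=raw_message, check=True)
    return command
```

The `run` parameter exists only so the hook can be exercised without the external tool installed. In this sketch, dragging a message into "spam" would retrain the filter, while folders without a rule are left alone.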

Comment 52 Philipp Lehman 2004-04-16 21:03:38 UTC

On Sunday, 11 April 2004 19:35, Patrick Audley wrote:

> Ah, I can see a potentially nice solution.  How about
> having filters that get executed when mail enters a folder?

> Since the properties dialog box is already large, we could instead
> add "Added to Folder" in the "headers" drop down in the filter
> screen.  It might make a nice generic mechanism for other things
> too..

Yes, I agree. That would be a sensible new feature.

Comment 53 Fred Emmott 2004-05-23 16:56:14 UTC

This feature exists in KMail CVS.

Comment 54 lakeland 2004-05-24 01:58:30 UTC

On Mon, 24 May 2004 02:56, Fred Emmott wrote:

>This feature exists in KMail CVS.

Agreed.  I think the bug should be closed.  Perhaps a more ambitious one can 
be opened describing wizard integration or whatever.

Corrin

Comment 55 Andreas Gungl 2004-05-30 21:08:18 UTC

Closing as requested.
Thanks to everybody who contributed to this report. Please try the upcoming KDE PIM 3.3; general spam filter support for KMail is included in that release. If you think more can be done, you might want to file another wishlist report.