Version: (using KDE Devel)
Installed from: Compiled sources

The anti spam wizard gets the bogofilter rules completely wrong for recent versions of bogofilter (they changed the commandline options, breaking everything, a while back).

For the classify rule, it says "bogofilter -p -e" but "bogofilter -p -e -u" is better.

For the mark as not spam rule, it says "bogofilter -N" which unmarks the message as non-spam. It should say "bogofilter -S -n" which unmarks the message as spam and marks it as non-spam.

Similarly, "bogofilter -S" should be "bogofilter -N -s".

It really also needs two new rules for marking messages received prior to setting up the filter, which would do "bogofilter -s" (mark unmarked message as spam) and "bogofilter -n" (mark unmarked message as non-spam).

The commandline options changed as of version 0.11.0, released over 11 months ago. This can be detected via the output of bogofilter -V.
Subject: Re: New: anti spam wizard gets bogofilter rules completely wrong for newer bogofilters

Sorry, I don't understand your problem and I also can't see any parameter changes when I compare a 0.10.x bogofilter against the current manpage at http://bogofilter.sourceforge.net (incl. change log).

So, instead of telling us to change A to B, please explain in detail why you think the changes are necessary, what the intention behind those changes is, and what the benefit is compared to the old configuration.

On Sunday, 8 February 2004 16:16, Richard Smith wrote:
> The anti spam wizard gets the bogofilter rules completely wrong for
> recent versions of bogofilter (they changed the commandline options,
> breaking everything, a while back).
>
> For the classify rule, it says "bogofilter -p -e" but "bogofilter -p -e
> -u" is better.
>
> For the mark as not spam rule, it says "bogofilter -N" which unmarks the
> message as non-spam. It should say "bogofilter -S -n" which unmarks the
> message as spam and marks it as non-spam.
>
> Similarly, "bogofilter -S" should be "bogofilter -N -s".
>
> It really also needs two new rules for marking messages received prior to
> setting up the filter, which would do "bogofilter -s" (mark unmarked
> message as spam) and "bogofilter -n" (mark unmarked message as non-spam).
>
> The commandline options changed as of version 0.11.0, released over 11
> months ago. This can be detected via the output of bogofilter -V.
Subject: Re: anti spam wizard gets bogofilter rules completely wrong for newer bogofilters

On Sunday 08 February 2004 7:01 pm, Andreas Gungl wrote:
> Sorry, I don't understand your problem and I also can't see any parameter
> changes when I compare a 0.10.x bogofilter against the current manpage at
> http://bogofilter.sourceforge.net (incl. change log).

From CHANGES-0.11:

2003-02-27
  * Separate message registration options from unregistration options. '-s' and '-n' register messages and '-S' and '-N' unregister them. '-S' and '-n' may be used together, as can '-N' and '-s'.

From the NEWS file, in the section for revision 0.11.0:

  * Separated message registration options from unregistration options. '-S' and '-N' have been changed and now just do unregistration. To move a message from one wordlist to the other, use '-S -n' or '-N -s' (as appropriate)

> So, instead of telling us to change A to B please explain in detail why you
> think the changes are necessary and what is the intention behind that
> changes and the benefit compared to the old configuration.

I wasn't telling you to change anything, merely telling you your existing configuration is broken, and one way to fix it. From the fact it's broken, the intention behind my changes and the benefit compared to the old configuration should be obvious.

Here's a reiteration of what I said before:

Your 'mark as spam' option is "bogofilter -S". According to the bogofilter manpage (the one at the website you quoted):

  The -S option tells bogofilter to undo a prior registration of the same message as spam. If a message was incorrectly entered in the spam wordfile by '-n' or '-u' and you want to remove it from the spam wordfile and enter it in the non-spam wordfile, use options '-Sn'. If '-S' is used for a message that wasn't registered as spam, the counts will still be decremented.[1]

IOW, -S marks a previously spam-marked message as unknown.
Similarly, -N (your mark as non-spam option) marks a previously not-spam-marked message as unknown. Since these only decrease the hit counts for words, using them will never build up any knowledge in bogofilter at all (in fact, what they do is to unteach it the wrong thing). I hope you can now see why I called them 'completely wrong'.

Now, regarding my proposed solution:

Adding the -u option to the options for the classify command causes bogofilter to automatically add the messages it filters into the category it decides they're in (spam or non-spam), so you only have to manually teach it if it makes mistakes.

Now, suppose you have a misclassified message.
If bogofilter said it's spam, and it's not, the correct commandline is "bogofilter -S -n" (taken from the manpage section above); this is the "mark as non-spam" option I suggested.
If bogofilter said it's not spam, and it is, the correct commandline is "bogofilter -N -s".

If, on the other hand, you have messages you want to teach bogofilter with that it hasn't classified, you need to call it with different arguments: If a message is unclassified and it's spam, you want "bogofilter -s". If a message is unclassified and it's not spam, you want "bogofilter -n".

There may be some clever way to have just a single mark-as-{not-,}spam action which works whether or not bogofilter's already classified a mail, but I can't think of a way to do that using KMail's filters alone.

Anyway, I hope this answers your questions.

[1] There's actually a typo in this item; where it says "by -n or -u" it means "by -s or -u", as is readily apparent from reading what -s and -n do.
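The four cases described above can be written down as a small lookup. This is only an illustrative sketch: the function name `training_flags` and the state labels "spam", "ham" and None are hypothetical, and nothing here is taken from KMail's code. The flag combinations themselves are the ones quoted above for bogofilter 0.11.0 and later.

```python
from typing import List, Optional


def training_flags(registered_as: Optional[str], actually: str) -> List[str]:
    """Pick bogofilter (>= 0.11.0) training flags for one message.

    registered_as: "spam", "ham", or None if the message was never registered.
    actually:      "spam" or "ham", as decided by the user.
    """
    if actually not in ("spam", "ham"):
        raise ValueError("actually must be 'spam' or 'ham'")
    if registered_as not in (None, "spam", "ham"):
        raise ValueError("registered_as must be None, 'spam' or 'ham'")

    # -s / -n register a message as spam / non-spam.
    register = "-s" if actually == "spam" else "-n"

    if registered_as is None:
        # Never registered: only register, do not unregister anything.
        return [register]
    if registered_as == actually:
        # Already registered correctly: nothing to do.
        return []
    # Registered in the wrong list: undo that (-S / -N), then register.
    unregister = "-S" if registered_as == "spam" else "-N"
    return [unregister, register]


print(training_flags("spam", "ham"))   # misclassified as spam
print(training_flags(None, "spam"))    # never seen by bogofilter
```

The catch noted above still applies: a mail client has to know the `registered_as` state reliably before it can pick the right branch.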
On Monday 09 February 2004 01:22, Richard Smith wrote:
> [...]

Richard, I've read the manpage again and now I understand the differences. Perhaps I missed them because English is not my native language and I read over the pages too quickly.

> Here's a reiteration of what I said before:
>
> Your 'mark as spam' option is "bogofilter -S". According to the
> bogofilter manpage (the one at the website you quoted):
>
> The -S option tells bogofilter to undo a prior registration of the
> same message as spam. If a message was incorrectly entered in the spam
> wordfile by '-n' or '-u' and you want to remove it from the spam wordfile
> and enter it in the non-spam wordfile, use options '-Sn'. If '-S' is used
> for a message that wasn't registered as spam, the counts will still be
> decremented.[1]
>
> IOW, -S marks a previously spam-marked message as unknown.
> Similarly, -N (your mark as non-spam option) marks a previously
> not-spam-marked message as unknown. Since these only decrease the hit
> counts for words, using them will never build up any knowledge in
> bogofilter at all (in fact, what they do is to unteach it the wrong
> thing). I hope you can now see why I called them 'completely wrong'.

You're right. I think that this parameter change is very unfortunate. It's not enough for the wizard to detect the programs; it seems we have to take care of the proper version too. One could argue that the old version is already history, but e.g. the stable SuSE 8.2 distribution ships such an old version. I guess there are a lot of people using outdated Bogofilter versions.

> Now, regarding my proposed solution:
>
> Adding the -u option to the options for the classify command causes
> bogofilter to automatically add the messages it filters into the category
> it decides they're in (spam or non-spam), so you only have to manually
> teach it if it makes mistakes.
>
> Now, suppose you have a misclassified message.
> If bogofilter said it's spam, and it's not, the correct commandline is
> "bogofilter -S -n" (taken from the manpage section above); this is the
> "mark as non-spam" option I suggested.
> If bogofilter said it's not spam, and it is, the correct commandline is
> "bogofilter -N -s".
>
> If, on the other hand, you have messages you want to teach bogofilter
> with that it hasn't classified, you need to call it with different
> arguments: If a message is unclassified and it's spam, you want
> "bogofilter -s". If a message is unclassified and it's not spam, you want
> "bogofilter -n".
>
> There may be some clever way to have just a single mark-as-{not-,}spam
> action which works whether or not bogofilter's already classified a mail,
> but I can't think of a way to do that using KMail's filters alone.
>
> Anyway, I hope this answers your questions.

As you've stated yourself, it's a problem to know for sure whether a message was classified as spam or ham. You can't be really sure (in KMail either), even if you use the status icons for spam / ham and the -u option. This makes the filtering pretty difficult compared to e.g. SpamAssassin.

I personally don't like to mark non-spam messages with an icon. More than 90% of messages would carry it, making that information nearly useless. One argument for doing it could be that I can see whether a message has been classified. But again you can't be sure, because the flag could also be set manually.

My approach was to use Bogofilter for classification based on reliable training (by using the classification actions in KMail or any external process). That's why I didn't use -u. The classification in KMail would have been made explicitly by the user, which keeps the statistics clean, but I do realize that the counting isn't perfect in this case either.

I tend to agree with you that "-p -e -u" together with "-S -n" and "-N -s" is the best possible solution we can achieve.
I'm going to find a way to differentiate versions of a given tool to be able to handle such changes in the meaning of the parameters.
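One possible shape for such a version check, given that the original report says the change can be detected from the output of bogofilter -V, is sketched below. The function names are hypothetical, and the sample strings only assume that the output contains a dotted version number somewhere; they are not captured bogofilter output. The 0.11.0 threshold is the one given earlier in this report.

```python
import re
from typing import Optional, Tuple

# First release with the separated register/unregister options (per this report).
NEW_OPTIONS_SINCE = (0, 11, 0)


def parse_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number, e.g. '0.10.3.1' -> (0, 10, 3, 1)."""
    match = re.search(r"\d+(?:\.\d+)+", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def uses_new_options(version_output: str) -> Optional[bool]:
    """True for >= 0.11.0, False for older, None if no version was found."""
    version = parse_version(version_output)
    if version is None:
        return None
    # Pad short versions such as (0, 11) with zeros so they compare as 0.11.0.
    padded = version + (0,) * max(0, len(NEW_OPTIONS_SINCE) - len(version))
    return padded >= NEW_OPTIONS_SINCE


# Assumed sample strings, for illustration only.
print(uses_new_options("bogofilter version 0.10.3.1"))
print(uses_new_options("bogofilter version 0.11.0"))
```

Returning None instead of guessing leaves the caller free to fall back to a default command set when the output cannot be parsed.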
I've modified the configuration file for the wizard, so the new options are now configured. I've verified that they don't cause an error with at least bogofilter 0.10.3.1, although the counting might be affected. Dealing with different versions is pretty difficult, as we already know from the PGP support in KMail, so I'm treating this parameter change as an exception for now. If we encounter more such problems, we can still implement an appropriate mechanism.
I've just started using kmail for the first time in years, having migrated from evolution. I tried setting up the kmail spam filters and ran across a similar problem relating to the updates.

As Richard pointed out above, -N -s and -S -n only work for messages that have already been classified. This meant that when I built up my spam database using the default kde filters of -Ns and -Sn, I wasn't actually building anything worthwhile. I first needed to classify my emails using -s or -n alone and THEN change it with -Ns or -Sn. This is what Richard mentioned above.

In other words, kmail needs to handle two different filters:

1. Marking as spam or ham of emails that have not already been marked - using -n and -s
2. Marking/correcting spam or ham emails that have already been marked - using -Ns and -Sn

Even if this is not possible, the current behaviour will cause a great deal of confusion to people trying to set up spam filtering in kmail for the first time, and there should at least be a warning or something.

Hope that helps.
Additional note, I should mention more clearly that the version of kmail I'm using set the rules as -Ns and -Sn, so not the same as what Richard had.
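A toy model can make the effect described above concrete. Everything in it is an assumption made for illustration: `ToyWordlists` is a hypothetical stand-in, not bogofilter's real storage or scoring, and it lets counts go negative only to make the damage visible. It follows the manpage statement quoted earlier that unregistering a message that was never registered still decrements the counts.

```python
from collections import Counter
from typing import Iterable


class ToyWordlists:
    """Toy stand-in for the spam and non-spam wordlists (illustration only)."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()

    def apply(self, flags: str, words: Iterable[str]) -> None:
        """Apply 0.11+ style flags: s/n register, S/N unregister."""
        words = list(words)
        for flag in flags:
            if flag == "s":
                self.spam.update(words)      # register as spam
            elif flag == "n":
                self.ham.update(words)       # register as non-spam
            elif flag == "S":
                self.spam.subtract(words)    # undo a spam registration
            elif flag == "N":
                self.ham.subtract(words)     # undo a non-spam registration
            else:
                raise ValueError(f"unknown flag: {flag}")


db = ToyWordlists()
db.apply("n", ["meeting", "tomorrow"])   # a genuine non-spam message
db.apply("Ns", ["cheap", "meeting"])     # never-registered spam, marked with -N -s

print(db.ham["meeting"])   # 0: evidence from the genuine message is cancelled
print(db.ham["cheap"])     # -1: a count for something that was never registered
print(db.spam["cheap"])    # 1: the spam registration itself did happen
```

With plain "s" instead of "Ns" for the second message, the non-spam counts in this toy model stay untouched, which is the separation between first-time marking and correction asked for above.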
SVN commit 695738 by tmcguire:

Change the filter commands for bogofilter.
The old behavior corrupted the bogofilter database because KMail unregistered messages which were not registered with bogofilter in the first place.

With the new behavior, messages which are classified automatically are no longer added to the bogofilter database.

For more details and a better explanation, see the bugreport and especially the bogofilter mail archives (linked to from the bugreport).

BUG: 148211
CCBUG: 74577

 M  +3 -3      kmail.antispamrc

--- trunk/KDE/kdepim/kmail/kmail.antispamrc #695737:695738
@@ -34,10 +34,10 @@
 Executable=bogofilter -V
 URL=http://bogofilter.sourceforge.net
 PipeFilterName=Bogofilter Check
-PipeCmdDetect=bogofilter -p -e -u
+PipeCmdDetect=bogofilter -p -e
 PipeCmdNoSpam=
-ExecCmdSpam=bogofilter -N -s
-ExecCmdHam=bogofilter -S -n
+ExecCmdSpam=bogofilter -s
+ExecCmdHam=bogofilter -n
 DetectionHeader=X-Bogosity
 DetectionPattern=(yes)|(spam\\b)
 DetectionPattern2=