Bug 93040 - Bogofilter keywords have changed
Status: RESOLVED FIXED
Alias: None
Product: kmail
Classification: Applications
Component: filtering
Version: 1.7
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-11-10 18:14 UTC by Degand Nicolas
Modified: 2007-09-14 12:17 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Degand Nicolas 2004-11-10 18:14:48 UTC
Version:           1.7 (using KDE 3.3.1,  (3.1))
Compiler:          gcc version 3.3.5 (Debian 1:3.3.5-2)
OS:                Linux (i686) release 2.6.9-1-k7

Since version 0.93 of Bogofilter, the header keywords to filter on have changed, so the KMail + Bogofilter wizard needs to be updated. Here is an extract from the Bogofilter documentation:

"Bogofilter's defaults have been changed.  It now operates in tri-state
mode and will classify messages as Spam, Ham, or Unsure.

If you're checking messages for "X-Bogosity: Yes" or "X-Bogosity: No",
you _need_ to change your checks.  Use "X-Bogosity: Spam" and
"X-Bogosity: Ham" instead of the old forms.  Also, checking for
"X-Bogosity: Unsure" and putting those messages in a separate folder
(or mailbox) will give you an excellent set of messages for training,
as "Unsure" messages are messages that bogofilter has too little
information to classify (with certainty) as spam or ham."
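
The tri-state scheme described in that extract can be sketched as a small lookup. This is only an illustration, not Bogofilter or KMail code: the function name `classify_bogosity` is hypothetical, and the sketch assumes that any trailing fields in the header value follow a comma and that matching should ignore case.

```python
# Labels from the Bogofilter notes quoted above, plus the older Yes/No
# forms they replace. The mapping itself is an illustrative assumption.
_LABELS = {
    "spam": "spam", "yes": "spam",
    "ham": "ham", "no": "ham",
    "unsure": "unsure",
}


def classify_bogosity(header_value):
    """Map an X-Bogosity header value to 'spam', 'ham' or 'unsure'.

    Only the first token (everything before the first comma) is inspected,
    so trailing fields cannot influence the result. Unknown values give None.
    """
    first = header_value.split(",", 1)[0].strip().lower()
    return _LABELS.get(first)


if __name__ == "__main__":
    for value in ("Spam", "Ham", "Unsure", "Yes", "No"):
        print(value, "->", classify_bogosity(value))
```

Messages classified as 'unsure' are the ones the extract suggests collecting in a separate folder for training.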
Comment 1 Andreas Gungl 2004-11-21 20:08:49 UTC
CVS commit by gungl: 

Make the generated filters aware of changes between bogofilter 
versions 0.92 and 0.93 (new keywords in X-Bogosity)

BUG: 93040


  M +3 -3      kmail.antispamrc   1.12


--- kdepim/kmail/kmail.antispamrc  #1.11:1.12
@@ -20,5 +20,5 @@
 [Spamtool #2]
 Ident=bogofilter
-Version=1
+Version=2
 VisibleName=&Bogofilter
 Executable=bogofilter -V
@@ -29,7 +29,7 @@
 ExecCmdHam=bogofilter -S -n
 DetectionHeader=X-Bogosity
-DetectionPattern=yes
+DetectionPattern=(yes)|(spam)
 DetectionOnly=0
-UseRegExp=0
+UseRegExp=1
 SupportsBayes=1
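
For readers unfamiliar with these keys, here is a rough sketch of how a `DetectionHeader` / `DetectionPattern` / `UseRegExp` triple might be interpreted. It is not KMail's implementation: the functions `load_tool` and `is_spam` are hypothetical, Python's `re` module stands in for whatever regexp engine KMail uses, and both the case-insensitive matching and the "substring match when UseRegExp=0" behaviour are assumptions.

```python
import configparser
import re

# Settings before and after the commit, reduced to the keys used here.
OLD_CONFIG = """
[Spamtool #2]
DetectionHeader=X-Bogosity
DetectionPattern=yes
UseRegExp=0
"""

NEW_CONFIG = """
[Spamtool #2]
DetectionHeader=X-Bogosity
DetectionPattern=(yes)|(spam)
UseRegExp=1
"""


def load_tool(text, section="Spamtool #2"):
    """Parse one tool section into a plain dict, keeping key case."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # do not lowercase option names
    parser.read_string(text)
    return dict(parser[section])


def is_spam(tool, headers):
    """Apply the tool's detection rule to a dict of message headers."""
    value = headers.get(tool["DetectionHeader"])
    if value is None:
        return False
    pattern = tool["DetectionPattern"]
    if tool.get("UseRegExp") == "1":
        return re.search(pattern, value, re.IGNORECASE) is not None
    # Assumed fallback: plain case-insensitive substring test.
    return pattern.lower() in value.lower()


old_tool = load_tool(OLD_CONFIG)
new_tool = load_tool(NEW_CONFIG)
```

Under these assumptions the old settings recognise only the pre-0.93 keyword `Yes`, while the new settings recognise both `Yes` and `Spam`.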
 


Comment 2 David P James 2004-11-27 18:28:26 UTC
Andreas:

Correct me if I'm wrong but your change would appear to make things worse, not better. Take a look at the standard headers produced by Bogofilter:

X-Bogosity: Spam, tests=bogofilter, spamicity=0.999888, version=0.93.1
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.496371, version=0.93.1
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=0.93.1

Would not your change
> +DetectionPattern=(yes)|(spam)
> +UseRegExp=1
move all the above into the junk folder since the expression 'spam' appears in all 3 in the word 'spamicity'? The messages are still marked correctly but are moved to the wrong place by KMail - that was my experience when I made those changes to my existing filters. Instead, you should probably use:

\byes\b|\bspam\b 
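
The difference between the two patterns can be checked against the three headers quoted above. This is a quick sketch using Python's `re` module as a stand-in for KMail's regexp engine; the case-insensitive flag is an assumption, and the helper `matching` is only for illustration.

```python
import re

headers = [
    "X-Bogosity: Spam, tests=bogofilter, spamicity=0.999888, version=0.93.1",
    "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.496371, version=0.93.1",
    "X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=0.93.1",
]

# Pattern from the commit: "spam" also matches inside "spamicity".
loose = re.compile(r"(yes)|(spam)", re.IGNORECASE)
# Suggested replacement: \b requires a word boundary on both sides.
strict = re.compile(r"\byes\b|\bspam\b", re.IGNORECASE)


def matching(pattern, lines):
    """Return the 'X-Bogosity: <label>' part of every line the pattern hits."""
    return [line.split(",")[0] for line in lines if pattern.search(line)]


loose_hits = matching(loose, headers)
strict_hits = matching(strict, headers)

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

With these inputs the loose pattern hits all three headers, while the word-boundary version hits only the `Spam` line.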
Comment 3 Andreas Gungl 2004-11-27 21:47:33 UTC
On Saturday 27 November 2004 18:28, David P James wrote:
> Andreas:
>
> Correct me if I'm wrong but your change would appear to make things
> worse, not better. Take a look at the standard headers produced by
> Bogofilter:
>
> X-Bogosity: Spam, tests=bogofilter, spamicity=0.999888, version=0.93.1
> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.496371, version=0.93.1
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=0.93.1
>
> Would not your change
>
> > +DetectionPattern=(yes)|(spam)
> > +UseRegExp=1
>
> move all the above into the junk folder since the expression 'spam'
> appears in all 3 in the word 'spamicity'? The messages are still marked
> correctly but are moved to the wrong place by KMail - that was my
> experience when I made those changes to my existing filters. Instead, you
> should probably use:
>
> \byes\b|\bspam\b

David,

thanks for pointing this out. I still have Bogofilter 0.92 installed, so I 
have only tested the regexp against faked headers.
However, I wonder whether "spam" is a good choice of criterion, as users are 
forced to use a regexp-based search even if they only use the newer 
version. "yes" and "no" were much simpler to handle.

I've changed it to (yes)|(\bspam\b) in CVS, which is easier to read IMO and 
should work for both versions.
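
A quick way to check the revised pattern against both keyword styles, again with faked headers. This is a sketch only: Python's `re` stands in for KMail's regexp engine, case-insensitive matching is assumed, and the old-style `Yes` / `No` lines assume that 0.92 used the same trailing fields as 0.93.

```python
import re

# Pattern quoted above.
pattern = re.compile(r"(yes)|(\bspam\b)", re.IGNORECASE)

# Faked headers mapped to the expected "treat as spam" decision.
cases = {
    "X-Bogosity: Yes, tests=bogofilter, spamicity=0.999888": True,
    "X-Bogosity: No, tests=bogofilter, spamicity=0.000000": False,
    "X-Bogosity: Spam, tests=bogofilter, spamicity=0.999888": True,
    "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.496371": False,
    "X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000": False,
}

results = {header: bool(pattern.search(header)) for header in cases}

for header, expected in cases.items():
    status = "ok" if results[header] == expected else "MISMATCH"
    print(f"{status:8} {header}")
```

Note that `(yes)` is still unanchored, so it would also match a longer word containing "yes"; none of the fields shown in this report contain one.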

Regards,
Andreas

Comment 4 David P James 2004-11-27 22:40:30 UTC
On Sat 27 November 2004 15:47, Andreas Gungl wrote:
>
> David,
>
> thanks for pointing me to this. I still have Bogofilter 0.92
> installed, so I have tested the regexp only by faked headers.
> However I wonder if using "spam" as criterion is a good choice as
> users are forced to use a regexp based search even if they use the
> newer version only. "yes" and "no" were much simpler to handle.

I would tend to agree, but then I had nothing to do with the change 
either (though X-Bogosity: Spam is a little clearer to understand when 
read than X-Bogosity: Yes). Either that or the 'spamicity' part of the 
header should have been changed to something like 'bogosity' to avoid 
this problem. I've just filed a bug on Bogofilter requesting such a 
change:
https://sourceforge.net/tracker/index.php?func=detail&aid=1074330&group_id=62265&atid=499997

(although that wouldn't help us much since it would still catch older 
instances of Bogofilter...)

>
> I've changed it to (yes)|(\bspam\b) in CVS which is easier to read
> IMO and should work for both versions.
>

Good idea

Comment 5 Degand Nicolas 2004-11-28 11:44:12 UTC
An upgrade script for the filters already created by KMail would be nice.
Comment 6 Andreas Gungl 2004-11-28 14:15:23 UTC
On Sunday 28 November 2004 11:44, Degand Nicolas wrote:
> An upgrade script of the filters already created by Kmail
> would be nice.

You should consider that detecting those filters is anything but easy. It 
gets even harder if the user has fine-tuned the filters that were 
initially created by the wizard. The best workaround is to delete the 
existing filters and create a new set via the wizard.

If somebody wants to work on that task, I can help explain how the current 
implementation works.

Comment 7 Degand Nicolas 2005-01-25 15:37:18 UTC
A script could be provided for people who use the filters generated by the wizard without tuning them.
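
The logic of such a script might look roughly like the sketch below. Everything about the data layout is hypothetical: filters are modelled as plain dicts with `name` and `rules` keys, and the "untouched wizard filter" is assumed to consist of exactly one rule, `X-Bogosity contains yes`. KMail's real filter storage format is not shown in this report, so reading and writing the actual configuration is left out.

```python
# Hypothetical shape of the single rule an untuned wizard filter contains.
OLD_RULE = {"field": "X-Bogosity", "func": "contains", "contents": "yes"}
# Replacement rule using the pattern from Comment 3.
NEW_RULE = {"field": "X-Bogosity", "func": "regexp",
            "contents": r"(yes)|(\bspam\b)"}


def upgrade_filters(filters):
    """Return (upgraded_filters, skipped_names).

    A filter is rewritten only if its rule list is exactly [OLD_RULE].
    Filters that mention X-Bogosity but were tuned by the user are left
    untouched and reported, so they can be reviewed by hand.
    """
    upgraded, skipped = [], []
    for flt in filters:
        rules = flt["rules"]
        if rules == [OLD_RULE]:
            upgraded.append({**flt, "rules": [dict(NEW_RULE)]})
            continue
        if any(rule.get("field") == "X-Bogosity" for rule in rules):
            skipped.append(flt["name"])
        upgraded.append(flt)
    return upgraded, skipped


example = [
    {"name": "Bogofilter spam", "rules": [dict(OLD_RULE)]},
    {"name": "Tuned spam", "rules": [dict(OLD_RULE),
                                     {"field": "Subject", "func": "contains",
                                      "contents": "offer"}]},
    {"name": "Mailing list", "rules": [{"field": "List-Id", "func": "contains",
                                        "contents": "kde"}]},
]

new_filters, needs_review = upgrade_filters(example)
print(needs_review)
```

Restricting the rewrite to exact matches sidesteps the detection problem raised in Comment 6, at the cost of leaving tuned filters for manual review.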