Bug 148211 - Bogofilter commands problematic
Summary: Bogofilter commands problematic
Status: RESOLVED FIXED
Alias: None
Product: kmail
Classification: Applications
Component: filtering
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
Reported: 2007-07-25 22:53 UTC by Ingomar Wesp
Modified: 2007-08-02 22:34 UTC

Description Ingomar Wesp 2007-07-25 22:53:07 UTC
Version:            (using KDE 3.5.7)
Installed from:    Ubuntu Packages
OS:                Linux

The filter setup created by KMail's "Anti-Spam Wizard" for bogofilter might 
lead to severe problems with bogofilter's wordlist when the user applies 
either "Classify as SPAM" or "Classify as NOT SPAM" on messages that have not 
been (automatically) registered before.

Bogofilter's auto-register option "-u" only registers messages that it can
automatically classify as spam or ham. Using "bogofilter -N -s" to manually
register a message as spam, or "bogofilter -n -S" to manually register it as
ham, also unregisters the message from the opposite class: it decrements the
ham or spam counts of every token contained in the message, as well as the
ham or spam count in the special token ".MSG_COUNT".

If this is done to messages that have never been registered, the spam (or
ham) count of a token can end up exceeding the spam (or ham) message count,
which in turn produces odd results for the individual spam or ham
probabilities of the affected tokens. In extreme cases (the spam value of
".MSG_COUNT" is 0), bogofilter reports a spam probability of "nan" because
of a floating-point division by zero.
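
The effect can be reproduced with a toy model of the wordlist. This is only
a sketch under stated assumptions, not bogofilter's actual implementation:
the Wordlist class, its method names, the clamping of counts at zero and the
simple count / message-count ratio are all hypothetical simplifications
(bogofilter's real scoring is more involved).

```python
from collections import Counter


class Wordlist:
    """Toy stand-in for a wordlist: per-token counts plus message counts."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.msg_count = {"spam": 0, "ham": 0}  # plays the role of .MSG_COUNT

    def register(self, tokens, kind):
        for token in set(tokens):
            self.tokens[kind][token] += 1
        self.msg_count[kind] += 1

    def unregister(self, tokens, kind):
        # Assumption: counts are clamped at zero instead of going negative.
        for token in set(tokens):
            self.tokens[kind][token] = max(0, self.tokens[kind][token] - 1)
        self.msg_count[kind] = max(0, self.msg_count[kind] - 1)

    def ratio(self, token, kind):
        """Fraction of 'kind' messages containing 'token' (simplified)."""
        n = self.msg_count[kind]
        return self.tokens[kind][token] / n if n else float("nan")


wl = Wordlist()
wl.register(["cheap", "pills"], "spam")
wl.register(["pills", "offer"], "spam")

# "-S -n" on a message that was never registered as spam:
wl.unregister(["lunch", "today"], "spam")
wl.register(["lunch", "today"], "ham")
print(wl.ratio("pills", "spam"))  # 2.0: token count exceeds message count

# The same again drives the spam message count to zero:
wl.unregister(["meeting", "agenda"], "spam")
wl.register(["meeting", "agenda"], "ham")
print(wl.ratio("pills", "spam"))  # nan
```

In this model, a ratio above 1 appears as soon as one unregistered message
is "unregistered", because the message count drops while the counts of
unrelated tokens stay where they were.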

Since it is generally not a good idea to unregister messages that have not
been registered before, I suggest changing the generated filters to a setup
that refrains from both auto-registration and manual de-registration.

As suggested by Matthias Andree on the bogofilter mailing list, I would like 
to propose replacing the current filter setup …

+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through    "bogofilter -p -e -u"  | Yes   |
| Classify as SPAM     | Execute command "bogofilter -N -s"     | No    |
| Classify as NOT SPAM | Execute command "bogofilter -S -n"     | No    |
+----------------------+----------------------------------------+-------+

… with something like this:

+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through    "bogofilter -p -e"     | Yes   |
| Classify as SPAM     | Execute command "bogofilter -s"        | No    |
| Classify as NOT SPAM | Execute command "bogofilter -n"        | No    |
+----------------------+----------------------------------------+-------+

Although SPAM and HAM messages that are correctly classified by bogofilter
are not automatically added to the wordlist, this filter setup works pretty 
well on my system, relying only on the occasional manual classifications.

Not only does it avoid the problems mentioned above, but it also results in a 
massive performance increase when checking messages, since no write access to 
the wordlist is required.
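
For trying the proposed commands outside KMail, a small wrapper along these
lines might help. It is only a sketch: the COMMANDS table and the
run_bogofilter helper are hypothetical names, and the code assumes a
"bogofilter" binary on the PATH that reads one raw message from standard
input. The flags themselves are the ones from the second table above.

```python
import subprocess

# Flags as proposed in this report: no "-u" auto-registration and no
# "-N" / "-S" de-registration.
COMMANDS = {
    "check": ["bogofilter", "-p", "-e"],  # pass the message through, tagged
    "spam": ["bogofilter", "-s"],         # register as spam
    "ham": ["bogofilter", "-n"],          # register as non-spam
}


def run_bogofilter(action, message):
    """Feed one raw message (bytes) to bogofilter on stdin.

    Returns (exit status, stdout bytes). The exit status is returned
    unchanged so the caller can decide how to interpret it.
    """
    result = subprocess.run(
        COMMANDS[action],
        input=message,
        stdout=subprocess.PIPE,
    )
    return result.returncode, result.stdout
```

For example, run_bogofilter("check", raw_message) would return the message
as rewritten by bogofilter in passthrough mode, where raw_message is a
placeholder for the bytes of one mail.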
Comment 1 Thomas McGuire 2007-07-29 02:51:39 UTC
>As suggested by Matthias Andree on the bogofilter mailing list
Can you add a link to the archives please?

Other than that, this sounds sensible and can be easily achieved by modifying the kmail.antispamrc file. Maybe I'll have a look at this later.
Comment 2 Ingomar Wesp 2007-07-30 17:16:07 UTC
>> As suggested by Matthias Andree on the bogofilter mailing list 
> Can you add a link to the archives please? 

Sure. The message I was referring to can be found here:
<http://www.bogofilter.org/pipermail/bogofilter/2007-July/009252.html>

> Other than that, this sounds sensible and can be easily achieved by modifying
> the kmail.antispamrc file. Maybe I'll have a look at this later. 

Yep. Changing "PipeCmdDetect", "ExecCmdSpam" and "ExecCmdHam" should do.
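
For an already installed copy of the file, the three keys could be patched
with a short script. This is a sketch, not a tested migration tool: it
assumes an INI-like layout in which every tool has its own "[...]" section
and the bogofilter section can be recognised by its "Executable=bogofilter"
line. The function name patch_bogofilter_section is made up for this
example.

```python
import re

# Replacement values taken from the proposal in this report.
NEW_VALUES = {
    "PipeCmdDetect": "bogofilter -p -e",
    "ExecCmdSpam": "bogofilter -s",
    "ExecCmdHam": "bogofilter -n",
}


def patch_bogofilter_section(text):
    """Return 'text' with the three bogofilter commands replaced.

    Only the section containing an "Executable=bogofilter" line is
    touched; sections for other tools are passed through unchanged.
    """
    # Split in front of every "[" that starts a line (Python 3.7+),
    # so that each chunk holds exactly one section.
    sections = re.split(r"(?m)^(?=\[)", text)
    patched = []
    for section in sections:
        if re.search(r"(?m)^Executable=bogofilter\b", section):
            for key, value in NEW_VALUES.items():
                section = re.sub(
                    rf"(?m)^{key}=.*$", f"{key}={value}", section
                )
        patched.append(section)
    return "".join(patched)
```

Reading the file, passing its contents through patch_bogofilter_section and
writing the result back is left to the caller, so that a backup can be made
first.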
Comment 3 Thomas McGuire 2007-08-02 22:34:39 UTC
SVN commit 695738 by tmcguire:

Change the filter commands for bogofilter.
The old behavior corrupted the bogofilter database because KMail unregistered
messages which were not registered with bogofilter in the first place.

With the new behavior, messages which are classified automatically are no
longer added to the bogofilter database.

For more details and a better explanation, see the bug report and especially
the bogofilter mail archives (linked to from the bug report).

BUG: 148211
CCBUG: 74577


 M  +3 -3      kmail.antispamrc  


--- trunk/KDE/kdepim/kmail/kmail.antispamrc #695737:695738
@@ -34,10 +34,10 @@
 Executable=bogofilter -V
 URL=http://bogofilter.sourceforge.net
 PipeFilterName=Bogofilter Check
-PipeCmdDetect=bogofilter -p -e -u
+PipeCmdDetect=bogofilter -p -e
 PipeCmdNoSpam=
-ExecCmdSpam=bogofilter -N -s
-ExecCmdHam=bogofilter -S -n
+ExecCmdSpam=bogofilter -s
+ExecCmdHam=bogofilter -n
 DetectionHeader=X-Bogosity
 DetectionPattern=(yes)|(spam\\b)
 DetectionPattern2=