Bug 136261

Summary: crm114 support in KMail antispam wizard
Product: [Unmaintained] kmail Reporter: Martin Steigerwald <Martin>
Component: filtering Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED WORKSFORME    
Severity: wishlist CC: heri+kde, luigi.toscano, peter.ruskin
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: /etc/kde3/kmail.antispamrc with CRM114 support
patch for kmail.antispamrc to the version from KMail 3.5.5
updated patch for kmail.antispamrc

Description Martin Steigerwald 2006-10-24 22:02:50 UTC
Version:            (using KDE KDE 3.5.5)
Installed from:    Debian testing/unstable Packages
OS:                Linux

Hello,

I set up KMail to work with crm114 (http://crm114.sourceforge.net). crm114 is awesome at discriminating between mails I want and mails I don't want. It is way faster than bogofilter and highly accurate after just two days of training.

I set up some filters manually first, but have now also started on a new entry for kmail.antispamrc, and I thought it's about time to give something back to the great KDE project ;-).

It's basically working, but there are some rough edges I need help with. I am attaching my complete kmail.antispamrc to this bug. I paste the relevant entry here so that I can note the issues I found along the way. So here it is:

[Spamtool #11]
Ident=crm114
Version=1
Priority=20
VisibleName=CRM114
Executeable=/usr/bin/crm
PipeFilterName=CRM114 Check
PipeCmdDetect=crm -u $HOME/.crm114 mailreaver.crm

The directory can basically be anywhere, but I think ~/.crm114 in the home directory is a reasonable default. crm is a language. The command just tells crm to switch to $HOME/.crm114 first and then load the mailreaver.crm script file.

ExecCmdSpam=crm -u $HOME/.crm114 mailreaver.crm --spam
ExecCmdHam=crm -u $HOME/.crm114 mailreaver.crm --good

These are similar. Unfortunately, KMail doesn't set those commands for the "Classify as Spam" and "Classify as non Spam" filters. Each filter just marks the mail as spam or ham but does not execute the command I gave above. (I have working filters anyway, the manually created ones ;-)

URL=http://crm114.sourceforge.net
DetectionHeader=X-CRM114-Status
DetectionPattern=SPAM
DetectionPattern2=UNSURE

crm114 supports and needs unsure, because ideally you only train it on errors and on mails where it asks you to.

DetectionOnly=0

I am not sure about this one. But I guess that when the tool is able not only to detect but also to learn spam, this should be set to zero. Unfortunately I can no longer find the (incomplete) documentation I read a few hours ago.

UseRegExp=0
ScoreName=CRM114
SupportsUnsure=1

crm114 highly depends on this. I am a bit unsure how best to handle it. I may have missed quite a few unsure mails, but now I do it like the anti-spam assistant suggests: I filter them to a certain folder. But then I would like KMail to refilter any mail I classify as ham so that I do not have to move it to its proper destination manually. Well, I can do it manually for now, and this would probably be another bug report.

ScoreHeader=X-CRM114-Status

crm114 places the score in the same header as where it places the spam status. It looks like this for good mails:

X-CRM114-Status: GOOD (  11.51  )

Like this for spam mails:

X-CRM114-Status: SPAM  ( -61.96  )

And like this for mails that CRM114 wants to learn:

X-CRM114-Status: UNSURE (  -8.61  )


ScoreType=Decimal
ScoreValueRegexp=([\d\.-]+)\s*

I filled in a regexp that should catch the above scores, but I have the feeling it is not yet working as expected. How do I tell KMail which score means spam, which means unsure, and which means ham, so that KMail can choose the right coloring? I do not yet know how crm114 scores mails anyway.
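
The three sample headers above share one shape: a status word followed by a signed decimal in parentheses. The following is a minimal shell sketch of pulling both parts out of such a line. It is not KMail's parser, the function names are made up for illustration, and the sed patterns are a POSIX rewrite of the idea rather than the Qt regexp syntax used in kmail.antispamrc.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of KMail or CRM114.

# Print the status word (GOOD, SPAM or UNSURE) of an X-CRM114-Status line.
crm114_status() {
    printf '%s\n' "$1" |
        sed -n 's/^X-CRM114-Status:[[:space:]]*\([A-Z][A-Z]*\).*/\1/p'
}

# Print the signed decimal score found between the parentheses.
crm114_score() {
    printf '%s\n' "$1" |
        sed -n 's/^X-CRM114-Status:.*([[:space:]]*\(-\{0,1\}[0-9][0-9.]*\)[[:space:]]*).*/\1/p'
}

# Try both helpers on the spam sample from this report.
crm114_status 'X-CRM114-Status: SPAM  ( -61.96  )'
crm114_score  'X-CRM114-Status: SPAM  ( -61.96  )'
```

Going by the samples alone, positive scores go with GOOD and strongly negative ones with SPAM; where exactly crm114 draws the UNSURE band is not stated in this report.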

I am also unsure about "SupportsBayes". A quick grep through the crm114 docs seems to suggest that it does Bayes, but not full Bayes.

Finally, a few hints on how to test crm114 with KMail using my kmail.antispamrc entry (for Debian only, adapt as necessary):

1) aptitude install crm114 (preferably 20060704a-3 or later as it fixes a bug in mailfilter.cf)

2) mkdir ~/.crm114 ; cd ~/.crm114

3) cp -a /usr/share/crm114/* .

4) cssutil -rb spam.css ; cssutil -rb nonspam.css

5) touch blacklist.mfp  priolist.mfp  rewrites.mfp  whitelist.mfp

6) edit mailfilter.cf as necessary (you probably need to fix the path to mailtrainer.crm in it, which is missing a slash, see http://bugs.debian.org/394476; that is at least the case if you use a Debian package older than the version stated above)

I also changed all subject_strings to empty strings, as I do not want mailreaver to insert subject prefixes into mails.

Then just run the antispam wizard and start training crm114. It's not necessary to pretrain crm114. If you want to pretrain, use mailtrainer.crm. See the crm114 docs for further details.
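
Steps 2 to 5 above can be collected into one shell function. This is only a sketch: setup_crm114 is a made-up name, the commands are the ones listed in the steps, and skipping the cssutil step with a warning when cssutil is not installed is an assumption added here so that the rest still runs.

```shell
#!/bin/sh
# Sketch of steps 2 to 5. setup_crm114 is a hypothetical helper name.
# Usage: setup_crm114 TEMPLATE_DIR TARGET_DIR
setup_crm114() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "template directory $src not found" >&2
        return 1
    fi
    # Step 2: create the per-user directory.
    mkdir -p "$dest" || return 1
    # Step 3: copy the stock scripts and configuration files.
    cp -a "$src"/. "$dest"/ || return 1
    # Step 4: create empty statistics files, if cssutil is installed.
    if command -v cssutil >/dev/null 2>&1; then
        ( cd "$dest" && cssutil -rb spam.css && cssutil -rb nonspam.css ) || return 1
    else
        echo "cssutil not found; spam.css and nonspam.css were not created" >&2
    fi
    # Step 5: create the initially empty list files.
    ( cd "$dest" && touch blacklist.mfp priolist.mfp rewrites.mfp whitelist.mfp )
}
```

On Debian the call would be setup_crm114 /usr/share/crm114 "$HOME/.crm114". Step 6, editing mailfilter.cf, is deliberately left manual.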

Ok, enough for now.

Regards,
Martin
Comment 1 Martin Steigerwald 2006-10-24 22:04:19 UTC
Created attachment 18252 [details]
/etc/kde3/kmail.antispamrc with CRM114 support
Comment 2 Martin Steigerwald 2006-10-24 22:12:37 UTC
Created attachment 18253 [details]
patch for kmail.antispamrc to the version from KMail 3.5.5

My changes as a patch. It should be reviewed thoroughly, because quite a bit of it
was pure guesswork.
Comment 3 Martin Steigerwald 2006-10-24 22:18:34 UTC
Of course you need to adapt the "Classify as spam" and "Classify as ham" rules manually to run the correct CRM114 mailreaver invocation, as my kmail.antispamrc does not yet seem to set that up for you.
Comment 4 Andreas Gungl 2006-10-25 22:16:35 UTC
On Tuesday, 24 October 2006 22:02, Martin Steigerwald wrote:
> I set up some filters manually first, but now also started on an new entry
> for kmail.antispamrc and I thought its about time to give something back to
> the great KDE project ;-).


Hello Martin,
This is great, KMail really needs contributors.

> Its basically working but there are some rough edges I need help with. I
> attach my complete kmail.antispamrc to this bug. I paste the relevant entry
> here for writing down the issues I found. So here it is:
>
> [Spamtool #11]
> Ident=crm114
> Version=1
> Priority=20


If crm114 is really about as fast as Bogofilter, you should use a value between
50 and 60 for the priority. The value was introduced to place fast
filters before the slower ones in the selection list. If the user chooses the
top item, he gets the fastest filter. Provider-side "filters" (which produce
headers, though) like the GMX filter have a priority of about 70; they are very
fast as they don't consume time on the client side. ;-)

> VisibleName=CRM114
> Executeable=/usr/bin/crm


You should provide something which can be run on the command line and returns
[ $? -eq 0 ], i.e. it doesn't wait for any input etc. It usually makes sense
to assume the program is in the $PATH, so you had better avoid /usr/bin.
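
A candidate command can be tried from a terminal before it goes into the config file. The sketch below is not how KMail itself runs the check; probe_ok is a hypothetical helper that runs a command line with stdin redirected from /dev/null, so a command that waits for input sees end-of-file at once, and reports only the exit status.

```shell
#!/bin/sh
# Hypothetical helper: run a candidate detection command without any input
# and succeed only if the command exits with status 0.
probe_ok() {
    sh -c "$1" </dev/null >/dev/null 2>&1
}

# Try the command line used later in this report.
if probe_ok 'crm -v | grep "CRM114"'; then
    echo "crm114 detected"
else
    echo "crm114 not detected"
fi
```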

> PipeFilterName=CRM114 Check
> PipeCmdDetect=crm -u $HOME/.crm114 mailreaver.crm


KMail expects the started process to read the message from stdin and to write
the modified message (with added spam classification headers) to stdout. At least
this looks good.
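
The stdin/stdout contract can be illustrated with a stand-in filter that does nothing but add one header. This is a sketch for testing filter rules without a real spam tool: fake_filter is a made-up name, and it assumes the message uses plain LF line endings.

```shell
#!/bin/sh
# Hypothetical stand-in for a spam tool: copies the message from stdin to
# stdout and adds one header line just before the blank line that ends the
# header block (or at the very end if there is no blank line).
fake_filter() {
    awk -v hdr="$1" '
        !added && /^$/ { print hdr; added = 1 }
        { print }
        END { if (!added) print hdr }
    '
}

# Example: tag a tiny message the way an UNSURE verdict would look.
printf 'Subject: test\n\nbody\n' |
    fake_filter 'X-CRM114-Status: UNSURE (  -8.61  )'
```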

> This directory in the home directory can basically be anywhere, but I think
> ~/.crm114 is a reasonable default value. crm is a language. The command
> just tells crm to load the mailreaver.crm script file and switch to
> $HOME/.crm114 before.
>
> ExecCmdSpam=crm -u $HOME/.crm114 mailreaver.crm --spam
> ExecCmdHam=crm -u $HOME/.crm114 mailreaver.crm --good


The started processes should read the message from stdin. Is this the case?

> These are similar. But unfortunately KMail doesn't set those commands for
> the "Classify as Spam" and "Classify as non Spam" filter. Each filter just
> makes the mail as spam or ham but does not execute the command I gave
> above. (I have working filters anyway, the manually created ones;-)


You need to set SupportsBayes=1 to make KMail use the ExecCmds in the
classification filters. I don't see that setting in the attached files, so I
guess this is the cause of that behavior.

> URL=http://crm114.sourceforge.net
> DetectionHeader=X-CRM114-Status
> DetectionPattern=SPAM
> DetectionPattern2=UNSURE


All fine.

> crm114 supports and needs unsure. Cause ideally you only train it errors
> and mails where it asks you too.
>
> DetectionOnly=0
>
> I am not sure about this one. But I guess when it is not only able to
> detect, but also learn spam this should be set to zero. Unfortunately I do
> not find the (incomplete) documentation I read a few hours ago anymore.


No. If you want to use headers from outside (e.g. your provider does a spam 
check for you), you can skip the filter part in KMail and use only the 
provided headers for the classification. (Have a look at the GMX example.)
"0" is correct here.

> UseRegExp=0
> ScoreName=CRM114
> SupportsUnsure=1


UseRegExp=0 and SupportsUnsure=1 are fine.
The "score" fields are for the nice visualization when using the fancy headers 
style. I'm not that familiar with that part, though.

> crm114 highly depends on this. I am a bit unsure on how to best handle. I
> may have missed quite some unsure mails, but now I do it like the anti spam
> assistant suggests, I filter it do a certain folder. But then I would like
> KMail to refilter any mail I select as Ham so that I do not have to
> manually filter them to their proper destinations. Well I can do it
> manually for now and this would probably be another bug report.
>
> ScoreHeader=X-CRM114-Status
>
> crm114 places the score in the same header as where it places the spam
> status. It looks like this for good mails:
>
> X-CRM114-Status: GOOD (  11.51  )
>
> As this for spam mails:
>
> X-CRM114-Status: SPAM  ( -61.96  )
>
> And as this for mails that CRM wants to learn:
>
> X-CRM114-Status: UNSURE (  -8.61  )
>
>
> ScoreType=Decimal
> ScoreValueRegexp=([\d\.-]+)\s*
>
> I filled a regexp in there that should catch the above scores, but I have
> the feeling it is not yet working as expected. How do I tell KMail what
> score is for a spam, what score is for unsure, and what score is for a ham,
> so that KMail can choose the right coloring? I do not yet know how crm114
> scores mails anyway.


Hm, perhaps Patrick Audley (paudley at blackcat dot ca) can help you.

> I am also unsure about "SupportsBayes". The crm114 docs seem to indicate
> something that it does bayes, but not full bayes, as a quick grep seems to
> suggest.


You have to set it to "1", see above.


Okay, I hope I could help you a bit. A minor nitpick at the end, could you 
please keep the order of the entries similar to those of the other entries (a 
good orientation is the [Spamtool #1] entry)?

Best Regards,
Andreas
Comment 5 Martin Steigerwald 2006-10-26 15:37:06 UTC
Hello again,

I have changed my CRM114 kmail.antispamrc entry as you suggested, and now the "Classify as spam" and "Classify as ham" filters are created as expected. I also changed the Executeable entry to crm -v | grep "CRM" and reordered the entries to match entry #1.

I also found an interim solution for handling unsure mails with DIMAP accounts. I just mark them as spam but do not move them to the spam folder; they are filtered as normal. Disadvantage: you have to look for mails marked as spam in all of your folders. Advantage: you won't get the same UNSURE mails in your unsure folder on every resync.

I also found that the score display only works when I change "/etc/kde3/kmail.antispamrc". When I move the config to "~/.kde/share/config" (with appropriate permissions) I can use the anti-spam wizard, but the score display won't work. Could it be that the score support only looks in /etc/kde3/kmail.antispamrc? Well, I will open another bug report for that one.

Regards,
Martin
Comment 6 Martin Steigerwald 2006-10-26 15:38:30 UTC
Created attachment 18277 [details]
updated patch for kmail.antispamrc
Comment 7 Martin Steigerwald 2006-10-26 16:05:51 UTC
I filed a separate bug report about the spam scoring engine apparently not using the kmail.antispamrc from ~/.kde/share/config (see bug #136339).
Comment 8 Martin Steigerwald 2006-10-30 20:49:16 UTC
SVN commit 600501 by steigerwald:

New entry to kmail.antispamrc in order to support CRM114 in the antispam
wizard of KMail. Doesn't support the score display yet, as I did not
find the right configuration statements to set it up properly.

CCBUG: 136261


 M  +21 -1     kmail.antispamrc  


--- branches/KDE/3.5/kdepim/kmail/kmail.antispamrc #600500:600501
@@ -1,5 +1,5 @@
 [General]
-tools=10
+tools=11
 
 [Spamtool #1]
 Ident=spamassassin
@@ -210,3 +210,23 @@
 ScoreType=Decimal
 ScoreValueRegexp=([\d\.]+)\s*
 ScoreConfidenceRegexp=([\d\.]+)\s*
+
+[Spamtool #11]
+Ident=crm114
+Version=1
+Priority=65
+VisibleName=CRM114
+Executeable=crm -v | grep "CRM114"
+URL=http://crm114.sourceforge.net
+PipeFilterName=CRM114 Check
+PipeCmdDetect=crm -u $HOME/.crm114 mailreaver.crm
+ExecCmdSpam=crm -u $HOME/.crm114 mailreaver.crm --spam
+ExecCmdHam=crm -u $HOME/.crm114 mailreaver.crm --good
+DetectionHeader=X-CRM114-Status
+DetectionPattern=SPAM
+DetectionPattern2=UNSURE
+DetectionOnly=0
+UseRegExp=0
+SupportsBayes=1
+SupportsUnsure=1
+
Comment 9 Robert Jessop 2006-11-01 01:15:46 UTC
I use KMail and was considering creating this myself. I might be able to give a little time to help get it working well. So far there have been two problems:

If $HOME/.crm114 doesn't exist, it will fail silently instead of creating it.

Lots of files need copying to $HOME/.crm114 and you have to create the css files (or it will fail silently again). Grabbing these automatically from /usr/share/crm114 would depend on the fact that Debian has added this directory. The default makefile in the CRM source doesn't install all this stuff at all.

At the moment it's too complicated for a normal user to set up, and they'll have no idea why it isn't working.

Note that since CRM114 is a scripting language, there is the option to create a custom script specially for KMail.
Comment 10 Andreas Gungl 2006-11-01 05:02:48 UTC
On Wednesday, 1 November 2006 01:15, Robert Jessop wrote:
> At the moment it's too complicated for a normal user set-up and they'll
> have no idea why it isn't working.
>
> Note that since CRM114 is a scripting language so there is the option to
> create a custom script specially for Kmail.


You could write a wrapper script which checks for $HOME/.crm114 and creates it 
in a proper way if necessary. Then you pass the message to the actual spam 
filter tool.
Once the script is ready, all you have to do is to replace the call of crm114 
with a call of that wrapper script in the kmail.antispamrc file.

KMail already makes use of such scripts for virus checks. See kmail/avscripts/ 
in SVN (http://websvn.kde.org/branches/KDE/3.5/kdepim/kmail/avscripts).
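
Such a wrapper could start with a check that reports what is missing instead of failing silently. The following is a sketch under assumptions, not one of the existing avscripts: the function names are hypothetical, the four file names are the ones mentioned in this report, and passing the message through unchanged when the setup is incomplete is a design choice made here so that no mail is lost.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; file names are the ones named in this report.

# Succeed only if DIR contains the files mailreaver needs; list what is missing.
crm114_precheck() {
    dir=$1
    missing=0
    for f in mailreaver.crm mailfilter.cf spam.css nonspam.css; do
        if [ ! -f "$dir/$f" ]; then
            echo "crm114 setup incomplete: $dir/$f is missing" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Filter stdin through mailreaver only if the setup looks complete;
# otherwise copy the message through unchanged.
crm114_wrapper() {
    dir=${1:-$HOME/.crm114}
    if crm114_precheck "$dir"; then
        crm -u "$dir" mailreaver.crm
    else
        cat
    fi
}
```

In kmail.antispamrc the PipeCmdDetect line would then call a script built around crm114_wrapper instead of calling crm directly.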

Andreas
Comment 11 Allen Winter 2006-11-01 16:44:43 UTC
Sounds like a job for helios42!  :)
Comment 12 Martin Steigerwald 2006-11-11 11:28:12 UTC
Hello Robert, I fully agree that it's a bit complicated to set up. It's just a first start. I at least want to do the following:

1) Proper score display. I think I have to use the KMail source to find out how it works.

2) Documentation on how to setup CRM114.

And probably the following, but I think it's better to document the manual setup first...

3) An assistant script which helps with CRM114 setup.

Yes, ideally there would be some kind of automatic setup, but there should be documentation as well in case the automatic setup does not work or the user wants to do it manually.

A script could work as follows:

1) Check whether ~/.crm114/mailreaver.crm exists. If not, present a kdialog that informs the user that crm114 needs a user-specific installation, points to the documentation and offers to set up the files for the user.

2) If the user chooses to set up the files, it should look for them in some standard locations (on Debian this would be /usr/share/crm114; it would be good to check some other distros as well). If it cannot find them, it should prompt the user for a directory that contains those files.

3) Copy the files over, adapt the configuration if necessary (the Debian CRM114 one had an error, see http://bugs.debian.org/394476) and set up the CSS files.

And all of this should be done with good sanity checking. And the script has to ship with the KMail package in order to be useful to every KMail user.
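
The lookup in step 2 could be sketched as follows. find_crm114_templates is a hypothetical helper name; it takes the candidate directories as arguments, so the list of standard locations stays outside the function, and it treats a directory as valid when it contains mailreaver.crm.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that contains
# mailreaver.crm, or return non-zero if none of them does.
find_crm114_templates() {
    for candidate in "$@"; do
        if [ -f "$candidate/mailreaver.crm" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A call such as find_crm114_templates /usr/share/crm114 /usr/local/share/crm114 would cover the Debian location named in this report; the second path is only a guess at where other systems might install the files. A non-zero return is the point at which the script would prompt the user for a directory.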

You mentioned adapting the CRM114 setup for KMail, Robert. Do you have any ideas? It would probably make sense to strip any headers KMail adds before processing the mails, as the CRM114 documentation suggests that it works best with emails as received via SMTP (with no additional headers).

I will be quite busy for at least one or two weeks. If you want to start with some work Robert, please go ahead. Although I would like to write the script in Ruby, shell or Perl code might be a better idea to avoid introducing another dependency ;-).
Comment 13 Martin Steigerwald 2007-03-07 20:48:12 UTC
I have written an article for the German Linux User magazine. It was translated for the English Linux Magazine:

English article, available online:
http://www.linux-magazine.com/issue/77

German article, not yet online (should be after a year):
http://www.linux-user.de/ausgabe/2007/02/
Comment 14 Martin Steigerwald 2007-07-08 17:20:51 UTC
SVN commit 685299 by steigerwald:

Updated CRM114 antispam wizard entry with at least a basic boolean
spam score display. Will forward port to KDE4 and sideport to
kmail-kde-3.5.5+

CCBUG: 136261


 M  +5 -0      kmail.antispamrc  


--- branches/KDE/3.5/kdepim/kmail/kmail.antispamrc #685298:685299
@@ -229,4 +229,9 @@
 UseRegExp=0
 SupportsBayes=1
 SupportsUnsure=1
+ScoreName=CRM114
+ScoreHeader=X-CRM114-Status
+ScoreType=Bool
+ScoreValueRegexp=SPAM
+ScoreThresholdRegexp=
 
Comment 15 Martin Steigerwald 2007-07-08 18:20:19 UTC
SVN commit 685323 by steigerwald:

HOWTO for setting up a new or changing an existing KMail antispam wizard
configuration entry. This should clear up things for people who want to
"mess" with that file ;-).

CCBUG: 136261


 A             kmail.antispamrc-HOWTO  
Comment 16 Dennis Schridde 2008-12-08 01:50:20 UTC
Is this still on your TODO?
I think it would be a nice feature to be integrated in KDE 4.2. *hint, hint*
Comment 17 Martin Steigerwald 2008-12-08 19:56:46 UTC
Hi Dennis... well, it's a "maybe, anytime" item on my list. I am so occupied with other stuff that I can't promise anything at the moment. Do you need anything to get this to work on KDE 4? I am still using KDE 3.5.9/10, but I plan to update soon. Ciao, Martin
Comment 18 Martin Steigerwald 2008-12-08 19:57:17 UTC
One addition: Would you like to help?
Comment 19 Dennis Schridde 2008-12-08 21:25:52 UTC
I have not yet tried to get it to work in KDE 4.1. (Just saw some blog post claiming it would be integrated in 3.5...6 or something, which turned out to be wrong.)
I'll try to work through the tutorials I found as soon as I find the time.
Help: Yes, in theory I would like to help. In practice this often gets a bit more complicated, since I have a lot of other stuff to do (work, rl, etc). So if you have some "job" to offer, go ahead, but I cannot promise anything yet.
Comment 20 Myriam Schweingruber 2012-08-18 08:31:08 UTC
Thank you for your feature request. Kmail1 is currently unmaintained so we are closing all wishes. Please feel free to reopen a feature request for Kmail2 if it has not already been implemented.
Thank you for your understanding.
Comment 21 Luigi Toscano 2012-08-19 00:30:46 UTC
Instead of creating a new feature request, please confirm here if the wishlist is still valid for kmail2.
Comment 22 Martin Steigerwald 2012-08-19 09:48:01 UTC
Well, while the CRM114 spam filter entry has worked for ages, some additional wishes have not been implemented:

1) Automatic setup.

2) Better color display for spam score.

But I will leave this closed and change it to "WORKSFORME", because basic support is there. Extensions are better covered by additional bug reports / wishes, I think.