Hi,
For some time now we have been running Popfile and Spamhalter in tandem, with the server archiving, and not delivering, any emails detected as spam by both filters. This has worked well for us, dramatically reducing the number of spams our users have had to screen. We also run a limited content filter to detect spams with obvious spam words in them, and the output of that filter is used to train Spamhalter. Another measure is a pre-dated filter: any message with an age of less than -2 days is treated as spam and again used to train Spamhalter. Actually, I wonder why a date filter of this type is not more straightforward in Mercury, as I can see no legitimate reason to pre-date messages and it is a very common spam practice. We had to implement the pre-date filter in our Pegasus Archive account.
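For anyone wanting to run the same kind of date check outside Mercury, here is a minimal Python sketch. It assumes that "an age of less than -2 days" means a Date header more than two days ahead of the current time. The function name and the choice to pass over missing or unparseable dates are illustrative assumptions, not anything Mercury or Pegasus does.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

MAX_FUTURE = timedelta(days=2)  # mirrors the "-2 days" threshold in the post


def is_future_dated(raw_message, now=None):
    """Return True if the Date header is more than MAX_FUTURE ahead of `now`."""
    now = now or datetime.now(timezone.utc)
    date_header = message_from_string(raw_message)["Date"]
    if date_header is None:
        return False  # no Date header: leave the decision to other filters
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable date: likewise not decided here
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # treat zone-less dates as UTC
    return sent - now > MAX_FUTURE
```

Messages flagged this way could then be fed to the Bayesian filter as spam training, as described above.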
Anyway, we have recently started to get a few legitimate emails detected as spam by both filters, which is clearly a cause for concern. We have always had a few false positives from one or other filter.
Up to last week we had Spamhalter set to 'Train On Errors'. Since almost all errors were false negatives, this meant that there were a lot more additions to the black database than the white one, so I have now set it to 'Train Always' in the hope that this will keep the white database more up to date. I have also just increased the spam probability setting from 70 to 75. The other classification settings are Probability for unknown Tokens = 40, Level of not Spam Preference = 3 and Count of classified tokens = 20. I am not sure whether to tweak these.
Does anyone else have a similar setup and practice? If so, what measures do you take to avoid false positives?
We use a fairly tight SMTP transaction filter followed by a Spamcop lookup then Spamhalter.
Train Always (for the same reason you stated)
Probability for unknown Tokens = 80 (with train always most good tokens are known if your mail isn't too diverse)
Level of not Spam Preference = 1 (good tokens are worth the same as bad rather than 3 times as much; catches a lot more spam but could increase your FPs, didn't for me but YMMV)
Count of classified tokens = 20
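To make the effect of these settings concrete, here is a small Python sketch of Paul Graham-style Bayesian scoring ("A Plan for Spam"). It is an assumption that Spamhalter combines its settings this way; the sketch only shows how an unknown-token probability, a weighting in favour of good tokens, and a cap on the number of tokens considered would interact in that scheme. The names `unknown`, `ham_weight` and `top_n` are hypothetical stand-ins for the three settings.

```python
from math import prod


def token_probability(token, good, bad, n_good, n_bad, unknown=0.40, ham_weight=3):
    """Graham-style spam probability for a single token."""
    g = ham_weight * good.get(token, 0)  # good occurrences count ham_weight times
    b = bad.get(token, 0)
    if g + b == 0:
        return unknown  # token never seen in training
    spam_freq = min(1.0, b / max(n_bad, 1))
    ham_freq = min(1.0, g / max(n_good, 1))
    # Clamp so that no single token can be absolutely decisive.
    return max(0.01, min(0.99, spam_freq / (ham_freq + spam_freq)))


def spam_score(tokens, good, bad, n_good, n_bad, unknown=0.40, ham_weight=3, top_n=20):
    """Combine the top_n most decisive token probabilities into one score (0..1)."""
    probs = [token_probability(t, good, bad, n_good, n_bad, unknown, ham_weight)
             for t in set(tokens)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)  # most decisive first
    probs = probs[:top_n]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)
```

In this model, a message made up entirely of unseen tokens scores below a 0.75 threshold with `unknown=0.40` but above it with `unknown=0.80`, which is why the higher value is only comfortable once 'Train Always' has made most legitimate tokens known.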
The transfilters & Spamcop reject > 60% of all SMTP connections.
Of the accepted mails, Spamhalter tags about 35% as spam and misses < 0.1% (just got my first one in 2 months).
Any tagged mails get moved for manual FP review and autodeleted by a batch job after 7 days.
My FP rate is about 0.2% (1-2 per month, usually newly signed-up newsletters etc.)
Thanks for that. I have increased the unknown token probability to 80% but left the non-spam preference at 3, as false positives, if they happen, are potentially more of a problem. The explanation of what each parameter does is particularly useful.
I will look into Spamcop; it's not something we have tried.
I tried a couple of other DNSBLs as well (can't remember which ones now) but found the FP rate way too high.
I was tagging and reviewing to start with, but after six months not one of the Spamcop-tagged mails was an FP, so I switched to rejecting.
I now prefer an 'SMTP reject' wherever possible (transflt.mer is your friend :)) to an 'accept then silently delete' policy (and bounces are just idiotic), as any legit mailer will get a failure message from their own server and, at least in our situation, I will hear from them presently (watch for typos in your transflt rules :)).
My single biggest reducer of spam is this transflt rule:
D, "*.*", B-N, "554 unresolvable host name"
(see transflt.mer for a translation)
A secondary advantage of the 'D'eferred HELO processing is that it captures the MAIL FROM address, even for rejected connections, in the logs.
I too went through the same process with blacklisting. I ran for over a year just tagging with the SpamCop and SpamHaus-Zen blacklists with zero false positives, so I also went to rejecting. I do not reject on invalid host names in the EHLO string, since quite a few people here have bad strings; in many cases new Mercury users do not understand the requirements. FWIW, almost all the spam that I receive does use a proper, if false, EHLO string.
I tried adding the blacklist entries in the Mercury SMTP server setup dialogue, more or less copying what you did, but so far, in 24 hours, the blacklists do not seem to have trapped a single message. I have done a quick check on the Spamcop site and we do seem to have received at least one message from a blacklisted sender. I cannot see a problem with what I have set up; here is my MS_SPAM.MER file:
# Mercury/32 SMTP server block query definitions data file.
# Mercury/32 Mail Transport System, Copyright 1993-2006, David Harris.
Begin
Name: Spamcop
Enabled: Y
QueryType: Blacklist
QueryForm: Address
Hostname: bl.spamcop.net
Strictness: Normal
Action: Tag
Parameter: X-Blocked see: http://spamcop.net/bl.shtml?
End
Begin
Name: PSBL
Enabled: Y
QueryType: Blacklist
QueryForm: Address
Hostname: psbl.surriel.com
Strictness: Normal
Action: Tag
Parameter: X-Blocked: by PSBL See http://psbl.surriel.com for removal instructions
End
Begin
Name: SpamHaus-Zen
Enabled: Y
QueryType: Blacklist
QueryForm: Address
Hostname: zen.spamhaus.org
Strictness: Range 127.0.0.2 - 127.0.0.8
Action: Tag
Parameter: X-Blocked by SpamHaus.org See http://spamhaus.org for removal instructions
End
Begin
Name: Spamhaus Zem PBL
Enabled: Y
QueryType: Blacklist
QueryForm: Address
Hostname: zen.spamhaus.org
Strictness: Range 127.0.0.10 - 127.0.0.11
Action: Tag
Parameter: X-Blocked: by SpamHaus.org PBL See http://spamhaus.org for removal instructions
End
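One thing that can be checked mechanically is whether each Parameter line is shaped like a mail header ("Name: value", with no space inside the name, as RFC 5322 requires). The Python sketch below parses the Begin/End blocks and lists the Tag definitions whose Parameter does not have that shape. It assumes that Mercury inserts the Parameter text verbatim as a header line, which is not confirmed here, and it does not by itself establish why nothing was tagged. The function names are hypothetical.

```python
import re

# An RFC 5322 field name (printable ASCII, no space or colon), then a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def parse_definitions(text):
    """Split Begin/End blocks into dicts of 'Key: value' settings."""
    blocks, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "Begin":
            current = {}
        elif line == "End":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only
            current[key.strip()] = value.strip()
    return blocks


def malformed_tag_headers(blocks):
    """Names of Tag definitions whose Parameter is not shaped like 'Name: value'."""
    return [b.get("Name", "?") for b in blocks
            if b.get("Action") == "Tag"
            and not HEADER_LINE.match(b.get("Parameter", ""))]
```

Run against the file above, this would flag the Spamcop and SpamHaus-Zen entries ("X-Blocked see:" and "X-Blocked by"), while the PSBL and PBL entries ("X-Blocked:") pass. If the archive filter matches on a header name, that difference may be worth a look.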
Any ideas?
NB: Initially I did not have the ranges set for SpamHaus; I changed that this morning.
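To check a sender by hand, independently of Mercury, it may help to see what an address-based blacklist query looks like. In the usual DNSBL convention the IPv4 octets are reversed, the list's zone is appended, and an A-record lookup is made: a 127.0.0.x answer means "listed" and no answer means "not listed". The Python sketch below follows that convention. The function names are hypothetical, and the range check is only an assumption about what a "Strictness: Range" line compares against.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def in_range(code, low, high):
    """True if return code `code` lies within [low, high], e.g. 127.0.0.2 - 127.0.0.8."""
    addr = ipaddress.IPv4Address
    return addr(low) <= addr(code) <= addr(high)


def dnsbl_lookup(ip, zone, resolve=socket.gethostbyname):
    """Return the list's return code for `ip`, or None if it is not listed."""
    try:
        return resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None  # no such name (or a resolver failure): treated as not listed
```

For example, `dnsbl_lookup("127.0.0.2", "bl.spamcop.net")` queries `2.0.0.127.bl.spamcop.net`; 127.0.0.2 is the conventional test address that DNSBLs are expected to list, so getting no answer for it from the mail server's own resolver would point at a DNS problem rather than at the Mercury definitions.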
I noted your setup for blacklists from another thread and used the Mercury SMTP Server setup dialogue to define Spamcop, Spamhaus and PSBL, then later copied the ranges you used for Spamhaus. I decided to tag the messages, at least initially, to see how effective the blacklists were. I also set a filter to file any blacklisted emails in my archive account. So far, after just over 24 hours, I am surprised to note that apparently no emails have been tagged with the defined text. I did a quick double check in our spam trap folder and discovered that some of our spam was indeed from senders blacklisted on Spamcop. I presume the fact that none of our spam appears to have been blocked means that the blacklisting is not working at all. Is there something else we need to turn on to make this work, or have I made a mistake? Our MS_SPAM.MER file is the one posted above.
Thanks
Chris