Thank you, Brian. I installed the provided version of Spamhalter, following the PDF instructions exactly.
After that, I did some training as follows:
I exported about 500 spam mails from my existing junk folder in raw text format and trained them as spam using the SpamhalterTools.exe program. I also exported about 5000 mails from some user accounts in raw text format and trained them as nospam. My words4 database is now 2.6 MB in size and contains about 50,000 words.
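To illustrate how training counts of this size might turn into per-word probabilities, here is a minimal sketch of a simplified version of the per-token formula from Paul Graham's "A Plan for Spam". It is an assumption that Spamhalter works this way; the function name and the example counts are hypothetical. Each count is divided by the size of its own corpus, so the 500 spam vs. 5000 nospam imbalance is normalized out rather than counted directly.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Hypothetical per-token spam probability, simplified from Graham's formula
    (no doubling of nospam counts, no clamping, no minimum occurrence count).

    spam_hits / ham_hits: number of trained spam / nospam messages containing the token.
    n_spam / n_ham: total number of trained spam / nospam messages.
    """
    spam_freq = min(1.0, spam_hits / n_spam)
    ham_freq = min(1.0, ham_hits / n_ham)
    if spam_freq + ham_freq == 0.0:
        return 0.5  # token never seen in training: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


# Example with the corpus sizes from this post (500 spam, 5000 nospam).
# A hypothetical word found in 50 spam mails and 50 nospam mails:
p = token_spam_probability(50, 50, 500, 5000)
print(f"{p:.3f}")  # 0.909

# A hypothetical word found only in nospam mails ends up at 0.0:
print(token_spam_probability(0, 200, 500, 5000))  # 0.0
```

Under this assumption, a word that appears mostly in the nospam export is pushed toward 0 no matter how often it also shows up in spam.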
Spamhalter is now active, running in parallel with Mercury's spam filter. As I understand it, Spamhalter does no filtering at all: it only calculates a spam probability and, if this value exceeds the threshold (80% by default), adds the "Spam detected!" flag to the headers. So I would have to create a new Mercury rule that checks for this flag. In the meantime, I am watching Spamhalter at work...
However, the spam probability is always 0.0% for every mail that Mercury filtered out using its existing ruleset, even though these are the same kind of messages I used for training.
For example, I am getting the following debug output:
X-MERCURY-SPAMHALTER: Passed through antiSPAM test by Spamhalter 4.6.1.433 on xxxxxx.de (282)
X-MERCURY-SPAMHALTER: probability - 0.0%
X-MERCURY-SPAMHALTER: Debug - Arial 0.0008802816901
X-MERCURY-SPAMHALTER: Debug - FONT 0.0015822784810
X-MERCURY-SPAMHALTER: Debug - charset 0.9980079681275
X-MERCURY-SPAMHALTER: Debug - Sales 0.0021449833654
X-MERCURY-SPAMHALTER: Debug - regards 0.0021625915791
X-MERCURY-SPAMHALTER: Debug - Best 0.0022276066906
X-MERCURY-SPAMHALTER: Debug - delivery 0.0029112081514
X-MERCURY-SPAMHALTER: Debug - information 0.0038537549407
X-MERCURY-SPAMHALTER: Debug - send 0.0042447824549
X-MERCURY-SPAMHALTER: Debug - mail 0.0044592113316
X-MERCURY-SPAMHALTER: Debug - Hello 0.0058968058968
X-MERCURY-SPAMHALTER: Debug - This 0.0059199904323
X-MERCURY-SPAMHALTER: Debug - payment 0.0060741687980
X-MERCURY-SPAMHALTER: Debug - DIV 0.0068493150685
X-MERCURY-SPAMHALTER: Debug - price 0.0075819672131
X-MERCURY-SPAMHALTER: Debug - please 0.0084932715641
X-MERCURY-SPAMHALTER: Debug - www 0.0086485542063
X-MERCURY-SPAMHALTER: Debug - utf-8 0.9909090909091
X-MERCURY-SPAMHALTER: Debug - http-equiv 0.9903846153846
X-MERCURY-SPAMHALTER: Debug - the 0.0097377577470
X-MERCURY-SPAMHALTER: Debug - ... 0.0000000000000
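The debug lines above can be checked by hand. Below is a minimal sketch, assuming Spamhalter combines the per-word values the way Paul Graham's naive-Bayes combining rule does, P = ∏p / (∏p + ∏(1 − p)); Spamhalter's actual formula may differ. It uses the twenty values listed above (the elided "..." line is left out) and the 80% default threshold. Because seventeen of the twenty words sit below 0.01, the combined value collapses to practically zero, matching the 0.0% in the header.

```python
import math


def combine(probs):
    """Combine per-token spam probabilities with Graham's rule, in log space
    so that products of many small numbers do not underflow."""
    probs = list(probs)
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    diff = log_ham - log_spam
    # Numerically stable form of 1 / (1 + exp(diff)).
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


# Values copied from the X-MERCURY-SPAMHALTER debug lines above.
DEBUG_TOKENS = {
    "Arial": 0.0008802816901, "FONT": 0.0015822784810,
    "charset": 0.9980079681275, "Sales": 0.0021449833654,
    "regards": 0.0021625915791, "Best": 0.0022276066906,
    "delivery": 0.0029112081514, "information": 0.0038537549407,
    "send": 0.0042447824549, "mail": 0.0044592113316,
    "Hello": 0.0058968058968, "This": 0.0059199904323,
    "payment": 0.0060741687980, "DIV": 0.0068493150685,
    "price": 0.0075819672131, "please": 0.0084932715641,
    "www": 0.0086485542063, "utf-8": 0.9909090909091,
    "http-equiv": 0.9903846153846, "the": 0.0097377577470,
}
THRESHOLD = 0.80  # default threshold mentioned in this post

score = combine(DEBUG_TOKENS.values())
print(f"combined probability: {score:.1%}")  # combined probability: 0.0%
print("Spam detected!" if score > THRESHOLD else "not tagged")  # not tagged
```

Under this assumption, the 0.0% is not caused by the combining step: words such as "Sales", "payment" and "price" already carry very low spam values in the current database, and the three high values (charset, utf-8, http-equiv) cannot outweigh seventeen low ones.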
What am I doing wrong? Today only one or two out of 100 messages reached about 10%. Should I switch to the Train Always method? And yes, I am already making corrections and getting "C" lines in my Spamhalter log.
Thank you,
Bernward