Michael -- IERenderer's Homepage PGP Key ID (RSA 2048): 0xC45D831B S/MIME Fingerprint: 94C6B471 0C623088 A5B27701 742B8666 3B7E657C
[quote user="Reed_D"]E.g., if I see "su*Order" does the "su*" possibly signify the word being in the subject?[/quote]
I don't know the details of Spamhalter, but I'm pretty sure that su* and fr* signify the Subject and From headers.
In the "A guide to Pegasus Mail filenames and file-extensions" article, the file WI_SPH.INI is described as "This file holds the information used by Pegasus Mail's Spamhalter add in and can be edited using Notepad. More information is in the file."
But there is no such information in the file, and the Spamhalter configuration dialog for the plugin only exposes a few of the settings.
And the definitions in the WI_SPH.INI file don't match the ones in the SpamHalter PDF documentation.
Has anyone worked out the mapping between these?
Lukas is the real authority on this, but this is my guess:
bayspamprob is the same as defined in Spamhalter.pdf, but only the first digit is saved, so 8=80%
baynospamboost, trainalways, enabled, and debug are all the same.
ForcedWrites/bayForcedWrites, WALmode/bayWALmode, MaxTokens/bayClasifyMaxTokens, SizeLimit/bayMaxLength are equivalent parameters.
I've no idea what CustomHeaders does.
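To make the guess above easier to check against your own WI_SPH.INI, here is a minimal Python sketch. It encodes only what is guessed in this post: the four name pairs treated as equivalent, and a single stored digit for bayspamprob read as tens of percent. The helper names (`canonical_name`, `spam_threshold_percent`) are made up for illustration, and nothing here comes from Spamhalter's documentation.

```python
# Hypothetical helpers based only on the guesses in this thread;
# none of this is taken from Spamhalter's documentation.

# Pairs of parameter names that the post above treats as equivalent.
EQUIVALENT_NAMES = {
    "ForcedWrites": "bayForcedWrites",
    "WALmode": "bayWALmode",
    "MaxTokens": "bayClasifyMaxTokens",
    "SizeLimit": "bayMaxLength",
}


def canonical_name(name: str) -> str:
    """Map the short spelling of a parameter to its 'bay...' spelling.

    Names without a known counterpart are returned unchanged.
    """
    return EQUIVALENT_NAMES.get(name, name)


def spam_threshold_percent(stored_value: str) -> int:
    """Read a single stored digit as tens of percent (assumed: 8 means 80%)."""
    digit = int(stored_value.strip())
    if not 0 <= digit <= 9:
        raise ValueError("expected a single digit, got %r" % stored_value)
    return digit * 10


print(canonical_name("MaxTokens"))
print(spam_threshold_percent("8"))
```

If Lukas confirms or corrects the mapping, only the dictionary needs to change.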
I wonder if anybody has tried tinkering with the settings to improve the detection accuracy?
E.g., I'm fairly sure that yesterday and today, I classified messages with "ADT" in the subject line as spam, but I don't find "ADT" in the word database.
[quote user="Reed_D"] E.g., I'm fairly sure that yesterday and today, I classified messages with "ADT" in the subject line as spam, but I don't find "ADT" in the word database.[/quote]
This certainly depends on the encoding applied to the respective subject, and since spam usually tries to hide from filters, it may use one even if it isn't really required. If you take a look at the raw text of the respective messages, you may not actually find ADT literally in them, but rather some kind of encoded word (or the whole subject encoded) replacing it.
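To illustrate the point about encoded subjects, here is a small Python sketch using the standard library's `email.header` module. The subject text "ADT Security" is invented for the example, and this says nothing about how Spamhalter itself tokenizes or decodes headers; it only shows that a base64 encoded-word (RFC 2047) hides the literal string in the raw message.

```python
import base64
from email.header import decode_header, make_header

# Made-up subject text, used only for illustration.
plain_subject = "ADT Security"

# Build the raw header value the way a sender could: an RFC 2047
# encoded-word using UTF-8 and base64 ("B") encoding.
encoded_text = base64.b64encode(plain_subject.encode("utf-8")).decode("ascii")
raw_subject = "=?UTF-8?B?" + encoded_text + "?="


def decode_subject(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a raw header value."""
    return str(make_header(decode_header(raw)))


print(raw_subject)                  # what the raw message source contains
print("ADT" in raw_subject)         # is the literal text visible in the raw form?
print(decode_subject(raw_subject))  # what the mail client displays
```

So searching the raw message (or a word list built from it) for "ADT" can come up empty even though the displayed subject contains it.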
[quote user="idw"]This certainly depends on the encoding applied to the respective subject, and since spam usually tries to hide from filters, it may use one even if it isn't really required. If you take a look at the raw text of the respective messages, you may not actually find ADT literally in them, but rather some kind of encoded word (or the whole subject encoded) replacing it.[/quote]
Right. I haven't studied the details of spam-ology for many years, but I realize that a message like that might be packed with "good" type words, too, that throw off the balance.
I don't have time to get too much into the details now of trying to analyze what or how Spamhalter is doing.
But I will switch back to Train on Errors from Train Always, to see if my exerting some control is better.
What I would be hoping to do by tinkering with the parameters is to make it remember more, for longer. The manual was written long ago, when they were trying to conserve disk space and processing power. Now we don't have those concerns.
[quote user="Reed_D"]but I don't find "ADT" in the word database.[/quote]
Speaking of the word database, is there published info on interpreting the words there?
E.g., if I see "su*Order" does the "su*" possibly signify the word being in the subject?
[quote user="Reed_D"]I classified messages with "ADT" in the subject line as spam, but I don't find "ADT" in the word database.[/quote]
OTOH, if I'm that particular, should I start digging into the Content Control method, and learn to edit the Content Control editor's "Message Tests" held in spambust.dat?
Do people already share more modernized versions of that file, or discuss what rules they've added to it?
I see over on "Mercury Community Support » Content Control arithmetics" something relatively recent, from 2016:
http://community.pmail.com/forums/thread/46534.aspx
Much older and more detailed is 2008-9's "Pegasus Mail Community Support » Using SpamHalter and Content Control in concert" at:
http://community.pmail.com/forums/thread/6667.aspx
[quote user="Reed_D"]But I will switch back to Train on Errors from Train Always, to see if my exerting some control is better.[/quote]
A practical issue with that, though, is that if I do want to remind it of non-spam messages, I run into the self-imposed error message:
"The message has been whitelisted - the address of its sender appears in your whitelist.
As a result, it has not been submitted to either the Content Control or Spamhalter for classification."
So I'd need to do the temporary step of removing the address from the whitelist, re-train on it, then put the address back in the whitelist.
But in the meanwhile, I re-train on the spam messages.
[quote user="Reed_D"]Speaking of the word database, is there published info on interpreting the words there?
E.g., if I see "su*Order" does the "su*" possibly signify the word being in the subject?[/quote]
I uploaded a screenshot of a message (before reclassifying) that has "See Better Today Without Glasses of Surgery!" in it, and I can see that those words have "su*" prepended in Spamhalter's screen, and they are found in the same form in the word database.
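For anyone who wants to sort through an exported word list, here is a rough Python sketch of splitting such entries. It assumes, as guessed in this thread and not confirmed by any documentation, that "su*" marks a word from the Subject header and "fr*" one from the From header. The function name `split_token` is made up, and entries without a known prefix are left unlabeled rather than guessed at.

```python
# Assumed prefixes, taken from the guesses in this thread,
# not from Spamhalter's documentation.
ASSUMED_PREFIXES = {
    "su*": "subject",
    "fr*": "from",
}


def split_token(token: str):
    """Split a word-database entry into (assumed header, word).

    Returns (None, token) when the entry has no recognized prefix.
    """
    for prefix, header in ASSUMED_PREFIXES.items():
        if token.startswith(prefix):
            return header, token[len(prefix):]
    return None, token


for entry in ["su*Order", "su*Glasses", "fr*example", "Order"]:
    print(entry, "->", split_token(entry))
```

With that, "su*Order" and a plain "Order" come out as separate entries, which would match what the screenshot shows if the guess about the prefixes is right.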