Q: I need a way to easily filter spam at the mail host rather than only at the mail client. Does Mercury/32 provide such a feature?
A: Yes, Mercury/32 includes a Content Control feature that can help you easily filter out spam and other undesirable content (e.g. viruses, urban legends, and virus warning hoaxes). This support document will help you better understand how Mercury/32's content control feature works and how to properly use it.
The Mercury/32 message processing flowchart (http://kbase.pmail.gen.nz/viewfull.cfm?ObjectID=5E94847C-CEA4-44A1-824645C44185FE40) shows that Content Control filtering takes place early in message processing, at step 5, before Global Filtering. Only Mercury/32 Daemons and pre-filter Policies are processed beforehand.
When the Content Control step happens, Mercury/32 processes each Content Control Set in order from top to bottom. You can have many different sets defined for different purposes. For instance, you may have a top-level set as a virus and hoax filter while having a second set that checks for spam. The top set would simply search for exact matches of known subject and body content used by popular viruses, such as the Klez variants, as well as well-known virus hoaxes, such as the jdbgmgr.exe hoax. The second set would search for words or phrases or e-mail addresses known to be indicative of spam.
When each content control set gets processed in turn, the following events happen:
If a content control set is marked as disabled, then it is bypassed and processing continues with the next content control set.
If the current content control set is not disabled, then the message is checked to determine whether it falls under this content control set's message origination setting. A content control set can be configured to apply only to mail originating locally, only to mail originating non-locally, or to both. For purposes of defining local origination, Mercury checks whether or not the sender's address can be considered local (meaning that it has a local mailbox associated with it on the Mercury system or is defined within a Mercury alias or synonym). If the message is not considered applicable for this content control set, the next content control set takes over.
Note: At the time of this writing, the latest release of Mercury/32 (v4.01) incorrectly treats all incoming mail retrieved via the MercuryD POP3 client as originating from a local address. Thus, if you obtain your incoming e-mail via MercuryD's POP3 client module, you will need to configure your content control sets to apply to all incoming mail, not only to mail from non-local addresses.
The content control set's whitelist (if specified) is searched first to see if the sender of this message is in the whitelist. If specified, be sure to type in a full directory path and filename for the whitelist text file.
The whitelist can contain wildcards (e.g. *@mydomain.com, *@*.mydomain.com, *support*@*). If the sender matches a whitelist entry, no further content control processing is performed against this message for this specific content control set (the next content control set becomes active at this point).
If no match was found above, then the content control set's blacklist (if specified) is searched to see if the sender of this message is on the blacklist. If specified, be sure to type in a full directory path and filename for the blacklist text file. The blacklist can contain wildcards, just like with the whitelist above. If the sender matches a blacklist entry, the message automatically gets treated as undesirable (the filter rules are skipped) and any specified action for this content control set is performed before the next content control set becomes active.
If no match was found above, then the content control set's filters (if specified) are processed to see if the message triggers any of the filters. If specified, be sure to type in a full directory path and filename for the content filter text file. If any filters are triggered, the message's weight score is increased by each matching filter's weight parameter. For example, if three filters with weights of 10, 30, and 40 all match, the message's weight score is 10 + 30 + 40 = 80.
The filter rules are very finicky: a syntax error in one filter statement can end up disabling many of the rules that follow it in the filter rule file. You can use the Check Syntax button within Mercury/32's built-in rule editor dialog to quickly have Mercury/32 scan the filter rule file for any errors.
Note: At the time of this writing, the built-in rule editor dialog in the latest release of Mercury/32 (v4.01) has a 32KB text buffer limit. If your filter rule file grows beyond that, you can edit it with an external plain-text editor, such as WordPad (saving the file as plain text), but you will no longer be able to use the Check Syntax function.
Note: Versions of Mercury/32 older than v4.0 do not support decoding of BASE64-encoded content within e-mail messages. Thus, in older versions of Mercury/32, it is possible for obvious spam messages to sneak past the message body filters if the spam message is BASE64-encoded. Additionally, a filter rule that searches for a single short word may find a hit within a large, harmless BASE64-encoded attachment to a valid e-mail message and thus cause a false positive against the message. Adding a leading and/or trailing space to such short words in your filters will help reduce the chance of such false positives occurring.
Once the filter rules have finished processing against the message, the message's resulting weight score is compared to the weight threshold defined in this content control set. If the message's weighted score is greater than or equal to the set's defined weight threshold, then any specified action for this content control set is performed before the next content control set becomes active.
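The order of the steps above can be sketched in Python. This is only an illustrative model, not Mercury/32's implementation: the ContentControlSet class, the sender_in and evaluate functions, the sample addresses, and the sample weights are all hypothetical, and Python's fnmatch module stands in for Mercury's own wildcard matcher (treating the comparison as case-insensitive is an assumption).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, List, Tuple


@dataclass
class ContentControlSet:
    """Hypothetical model of one set; not Mercury/32's internal structure."""
    name: str
    enabled: bool = True
    applies_to: str = "both"  # "local", "non-local", or "both"
    whitelist: List[str] = field(default_factory=list)
    blacklist: List[str] = field(default_factory=list)
    # Each filter is a (test, weight) pair; the test receives the message text.
    filters: List[Tuple[Callable[[str], bool], int]] = field(default_factory=list)
    threshold: int = 50


def sender_in(sender: str, patterns: List[str]) -> bool:
    """Wildcard test using fnmatch as a stand-in for Mercury's own matcher.
    Case-insensitive comparison is an assumption of this sketch."""
    return any(fnmatchcase(sender.lower(), p.lower()) for p in patterns)


def evaluate(cc: ContentControlSet, sender: str, is_local: bool, text: str) -> str:
    """Return what happens to one message in one set, following the steps in order."""
    if not cc.enabled:
        return "skipped: set disabled"
    origin = "local" if is_local else "non-local"
    if cc.applies_to not in ("both", origin):
        return "skipped: origin not applicable"
    if sender_in(sender, cc.whitelist):
        return "skipped: sender whitelisted"
    if sender_in(sender, cc.blacklist):
        return "action: sender blacklisted"  # filter rules are not consulted
    score = sum(weight for test, weight in cc.filters if test(text))
    if score >= cc.threshold:
        return f"action: score {score}"
    return f"no action: score {score}"


# Illustrative set: the patterns, phrases, and weights are made up for this sketch.
spam_set = ContentControlSet(
    name="spam",
    applies_to="non-local",
    whitelist=["*@mydomain.com", "*support*@*"],
    blacklist=["*@spammer.example"],
    filters=[
        (lambda t: "free" in t.lower(), 10),
        (lambda t: "winner" in t.lower(), 30),
        (lambda t: "act now" in t.lower(), 40),
    ],
    threshold=50,
)

# All three filters match, so the score is 10 + 30 + 40 = 80.
print(evaluate(spam_set, "bob@other.example", False, "FREE prize, winner! Act now"))
# prints: action: score 80
```

Note that the whitelist is consulted before the blacklist, so a sender matching both is skipped rather than acted upon.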
Tip: By using different weights in your filters effectively, you can reduce the number of false positives produced by your filters. For instance, some words or phrases by themselves do not necessarily indicate spam unless other similar words or phrases are also found in the message. Additionally, you can use negative weights in filter rules to lower a message's weight score when certain words or phrases are found that most likely indicate a valid message and not a spam message. For example, you can use this technique to allow humor messages to score lower due to a negative weight by looking for "humor" or "joke" or a similar word within the subject header or in one of the standard sender fields (From, Reply-to, and Return-path).
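To make the effect of negative weights concrete, here is a small Python sketch of the arithmetic. The phrases, weights, and threshold are invented for illustration, and the simple substring test is an assumption standing in for real filter rules.

```python
# Hypothetical (phrase, weight) pairs; the phrases and numbers are illustrative only.
RULES = [
    ("free", 20),
    ("act now", 30),
    ("guarantee", 20),
    ("joke", -40),  # negative weight: the message is probably legitimate humor
]
THRESHOLD = 50


def score(text: str) -> int:
    """Sum the weights of every phrase found in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(weight for phrase, weight in RULES if phrase in lowered)


spam = "Act now for a FREE trial, satisfaction guarantee!"
humor = "Joke of the day: act now for a free laugh, no guarantee"

for message in (spam, humor):
    verdict = "action taken" if score(message) >= THRESHOLD else "passes"
    print(score(message), verdict)
# prints: 70 action taken
# prints: 30 passes
```

Both messages contain the same three suspicious phrases, but the negative weight on "joke" pulls the second message back under the threshold. A message containing only one of the phrases never reaches the threshold at all.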
The action specified in the content control set is performed against the message only if either the message's sender was found in the blacklist or the message's weighted score is greater than or equal to the content control set's weight threshold. Otherwise, processing continues with the next content control set. A content control set can have only one action from the following list:
Take no further action: Nothing is done to the message, but the corresponding System Statistics counters are incremented.
Add an identifying header: Adds the specified header and value to the message (e.g. X-SPAM: Yes); if no header is specified, Mercury will add an "X-UC-Weight" header to the message, and will assign the weight of the message as the value of this added header. We recommend that each content control set use a different header so that you know which content control set(s) triggered an action for a message. You can use filters in Mercury/32 (or Regular Expression filters in Pegasus Mail) to search for these special headers and process the messages accordingly. Using a combination of identifying headers and filter rules gives you the most control over the possible actions that can be applied to a message, even allowing you to perform multiple actions against the message.
Copy the message to another address: A copy of the message is sent to the e-mail address you specify in the parameter field. The original message will not be altered in any way.
Forward the message then delete it: The message is redirected to the e-mail address you specify in the parameter field. This action will cause all content control processing to terminate for the original message because it will be deleted from the message queue.
Move the message to a directory as a file: The message is saved as a file (using a random filename with no filename extension) in the directory specified in the parameter field. This action will cause all content control processing to terminate for the message because it will be deleted from the message queue.
Delete the message: This action will delete the message permanently and all content control processing for this message will be terminated.
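Because three of these actions remove the message from the queue, the order of your content control sets matters. The following Python sketch models that consequence; the Action enum and the sets_evaluated function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import List, Optional


class Action(Enum):
    NONE = "Take no further action"
    ADD_HEADER = "Add an identifying header"
    COPY = "Copy the message to another address"
    FORWARD = "Forward the message then delete it"
    MOVE = "Move the message to a directory as a file"
    DELETE = "Delete the message"


# Actions that remove the message from the queue and so end all further
# content control processing for it.
TERMINATING = {Action.FORWARD, Action.MOVE, Action.DELETE}


def sets_evaluated(triggered: List[Optional[Action]]) -> int:
    """Given the action each set would trigger, in order from top to bottom
    (None means the set was not triggered), return how many sets get to
    examine the message before processing stops."""
    count = 0
    for action in triggered:
        count += 1
        if action in TERMINATING:
            break
    return count


# A deleting set placed second prevents the third set from ever seeing the message.
print(sets_evaluated([Action.ADD_HEADER, Action.DELETE, Action.COPY]))  # prints: 2
```

In practice this suggests placing sets that only tag or copy a message above sets that forward, move, or delete it.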
The following points supplement the Help file in Mercury/32 and re-emphasize some of its major points concerning the syntax and usage of content control rules:
Regular expressions (regex) are only available in rules that use the "MATCHES" clause, not the "CONTAINS" clause. All regex metacharacters are treated as ordinary text characters within a rule that uses the "CONTAINS" clause.
Rules using the "MATCHES" clause must match the whole text of the subject, header, sender, or body of the message (depending upon what is actually being checked); otherwise, use "*" characters before and after the string being searched in order to search for only a subset of the whole text. e.g. "*day job*"
If this search text should only be found at the front of the text, leave off the "*" at the front of the search text. e.g. "day job*"
If this search text should only be found at the end of the text, leave off the "*" at the end of the search text. e.g. "*day job"
If this search text should be an exact match of the whole text being checked (for example, the entire subject line), leave off the "*" at both ends of the search text. e.g. "day job"
Since "*" and "?" are both treated as special regex metacharacters in rules containing a "MATCHES" clause, you must preface either character with a "/" character if you want it to be treated literally. e.g.
"COPY /*.AB/? TESTDIR"
Since "/" and "[" are treated as special regex characters (the "/" as the qualifier shown above) in rules containing a "MATCHES" clause, you must enclose a "/" or "[" in square brackets if you want it to be treated literally. e.g. "his[/]her" "[[]surrounded by brackets]"
This requirement also applies to "+". e.g. "A[+]B=C"
Since the double-quote character (") encloses the text strings used in all content control rules, you must preface a double-quote character with a "\" character if you want it to be treated as part of the text string. e.g. "charset=\"\""
Since "\" is treated as a special text string processing character (as shown above) in all content control rules, you must use two "\" characters if you want one "\" character to be treated literally. e.g. "c:\\test.txt"
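If you generate rule files with a script, the two string-quoting rules above can be applied mechanically. The following Python helper is a minimal sketch; quote_rule_string is a hypothetical name, and it handles only the "\" and double-quote escaping described above, not the regex metacharacters.

```python
def quote_rule_string(text: str) -> str:
    """Wrap text in double quotes for use in a content control rule.

    Backslashes are doubled first so that the backslashes added in front
    of double quotes are not themselves doubled afterwards.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_rule_string('charset=""'))    # prints: "charset=\"\""
print(quote_rule_string(r"c:\test.txt"))  # prints: "c:\\test.txt"
```

The order of the two replacements matters: escaping the quotes first and the backslashes second would turn each \" into \\", which leaves the quote unescaped.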
Note: Proper handling of the /w and /W regex metacharacters has been fixed as of Mercury/32 v4.01a.
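To experiment with the "MATCHES" rules described above outside of Mercury/32, the following Python sketch translates a small subset of the pattern syntax into a Python regular expression. It rests on several assumptions: "*" is taken to match any run of characters and "?" any single character, a bracketed group is treated as a set of literal characters, matching is case-insensitive, and everything else (including "/w", "/W", and an unbracketed "+") is rejected rather than guessed at. The function names are hypothetical, and Mercury/32's real matcher may differ.

```python
import re


def matches_to_regex(pattern: str) -> str:
    """Translate a subset of MATCHES syntax into Python regex source."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "/":
            nxt = pattern[i + 1:i + 2]
            # "/" followed by a letter or digit (such as /w) is a metacharacter
            # this sketch does not model, so refuse it instead of guessing.
            if not nxt or nxt.isalnum():
                raise ValueError(f"unsupported '/' sequence at position {i}")
            parts.append(re.escape(nxt))  # e.g. /* becomes a literal *
            i += 2
        elif ch == "[":
            # Look for the closing bracket after at least one set member,
            # so that "[[]" is read as a set containing "[".
            end = pattern.find("]", i + 2)
            if end == -1:
                raise ValueError(f"unterminated '[' at position {i}")
            parts.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end + 1
        elif ch == "*":
            parts.append(".*")
            i += 1
        elif ch == "?":
            parts.append(".")
            i += 1
        elif ch == "+":
            raise ValueError(f"unbracketed '+' at position {i} is not modeled")
        else:
            parts.append(re.escape(ch))
            i += 1
    return "".join(parts)


def matches(pattern: str, text: str) -> bool:
    """Whole-text match, as MATCHES requires; case-insensitivity is assumed."""
    regex = matches_to_regex(pattern)
    return re.fullmatch(regex, text, re.IGNORECASE | re.DOTALL) is not None


print(matches("*day job*", "Quit your day job today"))        # prints: True
print(matches("day job", "Quit your day job today"))          # prints: False
print(matches("COPY /*.AB/? TESTDIR", "COPY *.AB? TESTDIR"))  # prints: True
```

Using re.fullmatch rather than re.search mirrors the requirement that a "MATCHES" pattern cover the whole text unless it begins or ends with "*".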