Mercury Suggestions
Wishlist: Personal Bayes filters on Mercury

Have you had a look at PopfileD? It's a server-side daemon for Mercury that uses Popfile to classify mail.

I have 5 separate buckets set up. You could create a bucket for 'houses' that adds an X-Text-Classification header to the mails; then your boss could filter 'houses' mail as spam, and you could filter it into your 'todo' folder, for example.
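As a rough illustration of the idea above, here is a minimal Python sketch of per-user routing on the X-Text-Classification header. It is not how PopfileD or Mercury's filtering rules are implemented; the `USER_RULES` table, the `route` function, the user names, and the folder names are all hypothetical.

```python
from email import message_from_string

# Hypothetical per-user rules: classification bucket -> destination folder.
USER_RULES = {
    "boss": {"houses": "spam"},
    "loek": {"houses": "todo"},
}


def route(raw_message: str, user: str, default: str = "inbox") -> str:
    """Pick a folder for `user` from the X-Text-Classification header."""
    msg = message_from_string(raw_message)
    bucket = (msg.get("X-Text-Classification") or "").strip().lower()
    return USER_RULES.get(user, {}).get(bucket, default)


raw = (
    "From: agent@example.com\n"
    "Subject: New listing\n"
    "X-Text-Classification: houses\n"
    "\n"
    "Three bedrooms, close to the station.\n"
)
print(route(raw, "boss"), route(raw, "loek"))
```

The server classifies each message once, and each user decides what a given bucket means for them.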



L.S.

As we all probably know, Bayes filters are excellent at weeding out unwanted garbage from real e-mail. I found Spamhalter and Popfile even better at catching viruses than several antivirus products.

It would be ideal to have a Bayes filter working on the server side, like Spamhalter for Mercury and Popfile do now. Then the mail client does not have to bother with retrieving and analyzing all the mail. The designated person may weed through the spam box to catch false positives ;-)

There is just one thing with server-side Bayes filters: they work on all the mail for that company/group. And the strong point of Bayes filters is that they are trained on personal preferences. Right now I'm in the process of buying a house, so email about mortgages is fine for me, but not for my boss. And he is very active with stocks and bonds (and I think he needs Viagra :) so he might get mail about subjects that go to the spam folder for me.

The bottom line is that Bayes filters working on groups cannot be as effective as personal Bayes filters.
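To make the group-versus-personal point concrete, here is a minimal sketch of a naive Bayes classifier that keeps a separate token corpus per user. It is a toy, not Spamhalter's or Popfile's algorithm: the `PersonalBayes` class, the whitespace tokenizer, and the add-one smoothing are assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


class PersonalBayes:
    """Toy naive Bayes filter with one spam/ham corpus per user."""

    def __init__(self):
        # user -> label -> token counts
        self.tokens = defaultdict(lambda: {"spam": Counter(), "ham": Counter()})
        # user -> label -> number of training messages
        self.docs = defaultdict(lambda: {"spam": 0, "ham": 0})

    def train(self, user, text, label):
        self.tokens[user][label].update(text.lower().split())
        self.docs[user][label] += 1

    def classify(self, user, text):
        counts, docs = self.tokens[user], self.docs[user]
        vocab = len(set(counts["spam"]) | set(counts["ham"])) or 1
        total_docs = docs["spam"] + docs["ham"]
        scores = {}
        for label in ("spam", "ham"):
            total = sum(counts[label].values())
            # Log prior plus log likelihoods, with add-one smoothing.
            score = math.log((docs[label] + 1) / (total_docs + 2))
            for tok in text.lower().split():
                score += math.log((counts[label][tok] + 1) / (total + vocab))
            scores[label] = score
        return "spam" if scores["spam"] > scores["ham"] else "ham"


f = PersonalBayes()
f.train("loek", "mortgage rates offer", "ham")
f.train("loek", "cheap pills offer", "spam")
f.train("boss", "mortgage rates offer", "spam")
f.train("boss", "stock bond report", "ham")
print(f.classify("loek", "mortgage rates"), f.classify("boss", "mortgage rates"))
```

The same message gets a different verdict for each user because each verdict is computed from that user's own training data, which a single shared corpus cannot do.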

The solution would be server-side personal Bayes filtering. Popfile has promised this feature for more than 3 years now (for version 0.23, see http://sourceforge.net/docman/display_doc.php?docid=17906&group_id=63137 ; I'm not confident it will ever deliver).
My wish list item would give Mercury/32 (yet another) unique selling proposition.

Cheers!

Loek 


Having run a large Mercury/32 system once, I can see arguments for not doing this too.

In a small system the server could keep up with the load of personal filters, but in a large system you would need one server for delivery and filters and another server for the rest of the functions to keep up with the load.

I may be totally off base, but it's really better performance-wise to keep personal filtering at the client level and do more broad-stroke filtering at the server level. It's more practical to manage, too. That being said, I think you will love the way the graywall feature works in Mercury 5.01. Just graywalling knocked spam down by 90% on my Mercury/32 system.

 


 


Hi Larry,

With all due respect: your arguments are not valid, except for a single point.

  • As Spamhalter is already in place, it does not change much performance-wise whether one evaluates against a single corpus of spam/ham tokens or against a personalized corpus.
  • As with nearly all options in Mercury/32, Bayes content filtering is optional; if you can't afford the computational power, then just don't use it. Any personal Bayes filtering should be optional too.

Now there is one point I cannot overlook, and that is the performance hit of loading and unloading a personal corpus in and out of memory. With a big system (hundreds or thousands of *active* users) it might be infeasible to keep these corpora in memory, as opposed to keeping a single corporate corpus in memory.
Maybe DH or Lukas can shed some light on this.
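One common way to bound that memory cost is to keep only the most recently used personal corpora loaded and evict the rest. The sketch below shows the idea with a small least-recently-used cache. The `CorpusCache` class, its `loader` callback, and the `max_users` limit are hypothetical and say nothing about how Spamhalter or Mercury/32 actually manage memory.

```python
from collections import OrderedDict


class CorpusCache:
    """Keep at most `max_users` personal corpora in memory (LRU eviction)."""

    def __init__(self, loader, max_users=2):
        self.loader = loader        # hypothetical: reads a user's corpus from disk
        self.max_users = max_users
        self._cache = OrderedDict()
        self.loads = 0              # how many times we had to hit the disk

    def get(self, user):
        if user in self._cache:
            self._cache.move_to_end(user)    # mark as most recently used
            return self._cache[user]
        corpus = self.loader(user)
        self.loads += 1
        self._cache[user] = corpus
        if len(self._cache) > self.max_users:
            self._cache.popitem(last=False)  # evict the least recently used
        return corpus


cache = CorpusCache(lambda user: {"owner": user}, max_users=2)
for user in ["a", "b", "a", "c", "b"]:
    cache.get(user)
print(cache.loads)
```

Users who receive mail often stay in memory, while rarely active users cost one load per delivery. Whether that trade-off holds up on a large site depends on corpus size and mail volume.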

Cheers!

Loek Gijben 


 


 


[quote user="Loek Gijben"]

The solution would be server-side personal Bayes filtering. Popfile has promised this feature for more than 3 years now (for version 0.23, see http://sourceforge.net/docman/display_doc.php?docid=17906&group_id=63137 ; I'm not confident it will ever deliver).
My wish list item would give Mercury/32 (yet another) unique selling proposition.

[/quote]

Just putting in my 10c worth (New Zealand ditched its 1c and 2c pieces nearly a decade ago, and its 5c pieces last year, so even the most worthless thought now can't cost less than 10c).

I think this is a good idea in principle, although I also agree with Larry that it's potentially very load-inducing (regular loading and unloading of the sqlite databases for each delivery). Nonetheless, as a configurable option, I believe it would definitely have a place.
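One possible way to sidestep per-delivery loading and unloading is to keep every user's token counts in a single sqlite database, keyed by user, and fetch only the tokens present in the message being delivered. This is purely a sketch of that alternative: the `tokens` table layout and the `open_store`, `train`, and `counts` helpers are assumptions, not the schema any existing Mercury daemon uses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open one shared database holding token counts for all users."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        " user TEXT NOT NULL, token TEXT NOT NULL,"
        " spam INTEGER NOT NULL DEFAULT 0, ham INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (user, token))"
    )
    return db


def train(db, user, words, label):
    """Increment the spam or ham count of each word for one user."""
    column = "spam" if label == "spam" else "ham"  # fixed names, never user input
    for word in words:
        db.execute(
            "INSERT OR IGNORE INTO tokens (user, token) VALUES (?, ?)", (user, word)
        )
        db.execute(
            f"UPDATE tokens SET {column} = {column} + 1 WHERE user = ? AND token = ?",
            (user, word),
        )
    db.commit()


def counts(db, user, words):
    """Fetch (spam, ham) counts for just the words in one message."""
    if not words:
        return {}
    marks = ",".join("?" * len(words))
    rows = db.execute(
        f"SELECT token, spam, ham FROM tokens WHERE user = ? AND token IN ({marks})",
        (user, *words),
    )
    return {token: (spam, ham) for token, spam, ham in rows}


db = open_store()
train(db, "loek", ["mortgage"], "ham")
train(db, "boss", ["mortgage"], "spam")
print(counts(db, "loek", ["mortgage", "viagra"]))
```

Each delivery then costs one indexed query per recipient rather than a whole-database load, at the price of a larger shared file.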

Unfortunately, it's not something I can do. I don't understand the maths and stats behind Bayesian analysis enough to code something like this, nor would I ever want to, given how good Lukas is at it...  It wouldn't be difficult to do as a Daemon, though. I don't know if Lukas is taking part here yet or not, but he's the most likely source of anything that might fit this bill.

Cheers!

-- David --
