Mercury retries when sending to a Grey Listing receiver

Rolf Lindby

posted Apr 28 '10 at 1:03 am

Mercury help recommends using 60 minutes between retries, or 15 minutes with progressive backoff. As Thomas already pointed out, it defaults to 30 minutes. With so many servers now using greywalling, I have to agree that this recommendation is starting to feel outdated.

A suitable solution should probably retry quickly a few times to get through greywalling, then lengthen the intervals to cope with offline or overloaded servers, and keep retrying for at least 48 hours.

/Rolf

subelman

posted Apr 27 '10 at 5:37 am

I’ve been looking at the way Mercury handles retries when sending mail. The current procedure involves setting a “Basic minimum period between queue retries”, and using a “progressive backoff algorithm” to calculate job retries.

Assuming you set the minimum period to N minutes, the algorithm will do ten tries, one every N minutes, then 10 tries, one every 1.5N minutes, and continue to increase the retry interval to 2N, 2.5N, 3N, 3.5N, 4N, 5N, 6N, every time doing ten retries at the current value. It then increases to 7N and uses that up to the maximum number of retries.
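The schedule described above can be sketched in a few lines of Python. This is only my reading of the description, not Mercury's actual code: the names `backoff_intervals`, `n_minutes` and `max_retries` are made up for illustration, and I am assuming that each "try" in a group of ten is a retry preceded by the stated wait.

```python
# Multipliers as described in the post: ten retries at each step, then 7N.
STEP_MULTIPLIERS = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6]
FINAL_MULTIPLIER = 7
RETRIES_PER_STEP = 10


def backoff_intervals(n_minutes, max_retries):
    """Return the wait (in minutes) before each retry, in order."""
    intervals = []
    for multiplier in STEP_MULTIPLIERS:
        for _ in range(RETRIES_PER_STEP):
            if len(intervals) == max_retries:
                return intervals
            intervals.append(n_minutes * multiplier)
    # Past the ninth step, every remaining retry waits 7N.
    while len(intervals) < max_retries:
        intervals.append(n_minutes * FINAL_MULTIPLIER)
    return intervals


# Hypothetical settings: N = 15 minutes, 99 retries.
schedule = backoff_intervals(n_minutes=15, max_retries=99)
print(schedule[0], schedule[10], schedule[20], schedule[-1])
```

The point to notice is that the first ten retries are all spaced exactly N apart, so N alone decides how quickly a greylisted message gets its second and third chance.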

This worked well in the days when you were retrying because of actual failures, but does not work quite as well when sending mail to a server that uses greylisting. If you set your N too low, you end up hammering a receiving server that is actually down. If you set your N too high, your email to recipients that use greylisting gets delayed, and you also run the risk of getting it rejected: your first delivery attempt gets refused, and by the time you make your second attempt, the receiving server has already deleted your first attempt from its grey list and you are rejected again. Don’t laugh – this happened with one of our recipients, and we ended up on their blacklist for repeated failed attempts.

Even with a reasonable N, your email can get delayed quite a bit: we have a recipient that has two servers in their MX record. Our first try hits their server #1 and gets rejected by their greylisting. Our second try hits their server #2 and gets rejected again (yes, I know their two servers should share data, but they don’t). The third try hits either server #1 or #2 and is accepted, for a total delay of 30 minutes (we use N=15) – all the while we are on the phone with them telling them their data will arrive “any minute now”.
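To make the two scenarios above concrete, here is a toy model in Python. Everything in it is an assumption for illustration: the class `GreylistingServer`, its `min_delay` and `record_lifetime` values, and the idea that delivery attempts strictly alternate between the MX hosts (real MX selection depends on preferences and DNS ordering). It uses a fixed retry interval, which matches the first ten retries of the progressive schedule.

```python
class GreylistingServer:
    """Toy greylisting receiver; both time limits are hypothetical."""

    def __init__(self, min_delay=1, record_lifetime=240):
        self.min_delay = min_delay              # minutes a sender must wait
        self.record_lifetime = record_lifetime  # minutes before a record expires
        self.first_seen = None

    def accepts(self, now):
        expired = (self.first_seen is not None
                   and now - self.first_seen > self.record_lifetime)
        if self.first_seen is None or expired:
            self.first_seen = now   # unknown sender: record it and defer
            return False
        return now - self.first_seen >= self.min_delay


def delivery_delay(retry_interval, servers, max_attempts=99):
    """Minutes until acceptance when attempts rotate through `servers`."""
    for attempt in range(max_attempts):
        now = attempt * retry_interval
        if servers[attempt % len(servers)].accepts(now):
            return now
    return None  # never accepted


for n in (1, 15, 30):
    mx = [GreylistingServer(), GreylistingServer()]  # no shared greylist data
    print(n, delivery_delay(n, mx))

# A receiver that forgets the first attempt before the retry arrives.
print(delivery_delay(30, [GreylistingServer(record_lifetime=20)]))
```

In this model the delay with two unsynchronised servers is always twice the retry interval, and a retry interval longer than the receiver's record lifetime never gets through at all.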

In an attempt to see how other servers handle this, I configured my Mercury server to reject all inbound e-mail containing a specific nonsense subject, and after sending myself email with that subject from a Gmail account and from a Microsoft Exchange account, I sat back and watched the Mercury logs.

Exchange has the simplest algorithm. It retried after 1 minute, then waited another 2 minutes to try a second time, then waited 6 minutes for a third retry, and another 20 minutes for a fourth one. After that, it kept trying once every 60 minutes and gave up after a total of about 72 hours and 77 retries. Notice that they start very aggressively, and then back off rather quickly to a reasonable 1 hour between retries.
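For comparison, the Exchange behaviour I observed can be written as "a short list of initial waits, then a steady interval until a deadline". The sketch below is just a model of what showed up in my logs, not anything taken from Exchange itself; `retry_times` and its parameters are names I made up.

```python
from itertools import chain, repeat


def retry_times(initial_waits, steady_wait, give_up_after):
    """Yield elapsed minutes (since the first rejection) of each retry."""
    elapsed = 0
    for wait in chain(initial_waits, repeat(steady_wait)):
        elapsed += wait
        if elapsed > give_up_after:
            return
        yield elapsed


# Observed waits: 1, 2, 6 and 20 minutes, then hourly; give up at about 72 hours.
times = list(retry_times([1, 2, 6, 20], steady_wait=60, give_up_after=72 * 60))
print(times[:6])
print(len(times))
```

With a hard 72-hour cutoff this model produces 75 retries rather than the 77 I counted, which is consistent with "about 72 hours" being an approximate figure. The first four retries all land within half an hour, which is what gets a message through greylisting quickly.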

Gmail sent the first retry 7 minutes after the initial rejection, the second after an additional 21 minutes, and the third one 27 minutes after that. It then started sending at what seem like random intervals selected with an ever increasing mean value. The times between tries are all over the place, but if you use exponential smoothing to look at them, you see a clearly increasing trend. The first couple of retry intervals were around one hour, while the final ones were in the 6 hour vicinity. It also gave up after 72 hours, but did only 27 retries in all. I have the data if anyone is interested.

Obviously, I think that the algorithm that Mercury uses needs some tweaking. I like Exchange's approach - simple and reasonable. Gmail's use of random intervals seems overly complicated, although it also starts with short intervals and quickly increases to longer intervals.

dilberts_left_nut

posted Apr 27 '10 at 12:54 pm

Mine is set to 1 min w/ progressive backoff.

Seems to work fine. I've never seen blacklisting for retrying greylisted mail.

I also don't see any problem with retrying a 'down' server.

If people with greylisting on complain to me about "where is the email", I point out that it is their mail system causing the delay and they should take it up with their provider.

subelman

posted Apr 27 '10 at 6:27 pm

The problem with one minute with progressive backoff is that if the receiving server is actually down, your Mercury will retry a maximum of 99 times, and will give up trying to deliver in less than 7 hours. If this works for you, fine. But you have no way of having your Mercury retry for several days (e.g. over a weekend).
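The arithmetic behind that figure, as a small Python sketch. It assumes the schedule from my first post (ten retries at each of nine multipliers, then 7N for the rest) and the 99-retry maximum; the function names and the 60-hour "weekend" are my own choices for illustration.

```python
import math

# Sum of all 99 waits, expressed in units of N:
# ten retries at each of nine multipliers, then nine more at 7N.
UNITS_OF_N = 10 * sum([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6]) + 9 * 7  # 348.0


def hours_before_giving_up(n_minutes):
    """Total hours spanned by all 99 retries for a given N."""
    return UNITS_OF_N * n_minutes / 60


def smallest_n_for(hours):
    """Smallest whole-minute N whose 99 retries span at least `hours`."""
    return math.ceil(hours * 60 / UNITS_OF_N)


print(hours_before_giving_up(1))   # N = 1 minute
print(smallest_n_for(60))          # N needed to cover a 60-hour weekend
```

Under these assumptions N = 1 gives 348 minutes (5.8 hours) in total, and you would need N of at least 11 minutes to keep retrying across 60 hours, which is far too slow for the greylisting case.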

Thomas R. Stephenson

posted Apr 27 '10 at 6:58 pm

The default retry on failure when sending is 30 minutes (pretty much the out-of-the-box standard for an SMTP host). A receiving system using grey walling should be set to retain the sender's initial attempt for at least a few hours, if not a few days. If you set this to use progressive delays and 99 retries, the receiving system has to be down for a long time before it falls over to failure. If you set this to not use progressive delays, you will still be trying every 30 minutes for some 50 hours, and this normally is enough for any server.

subelman

posted Apr 27 '10 at 8:15 pm

30 minutes is too long, for two reasons:

1) If the receiver has two servers in their MX record, your first try hits server #1 and mail is refused. Your second try hits server #2 and mail is refused again. You succeed on your third try, for a total delay of 1 hour. This is not acceptable when the customer is waiting on the phone and you are saying "I just emailed it to you". You cannot start explaining that it's THEIR server because they "never had this problem with any of your competitors" (sure, my competitors use Exchange, it retried at 1 and 2 minutes). The customer concludes we're incompetent, and goes to our competitors. I'm left with a lost sale and the satisfaction of knowing that my email server follows the standards and theirs doesn't. Is that worth losing a $100,000 project?

2) If the receiver greywalls you and you retry at 30 minutes and they have already deleted the first try from the database, you get rejected again (and possibly blacklisted after a few retries). This puts you in an even worse position than in case 1 because they actually never get your email. You can tell them it's THEIR server that is misconfigured, but they "never had this problem with other vendors before". We are back at the lost sale and the satisfaction of knowing that, after all, my email server is configured correctly.

Bottom line: I'm convinced that Mercury needs a more realistic retry schedule: something like retry at 1 minute, at 2 minutes, at 20 minutes, and then slow down. If only because there are a lot of improperly configured servers out there, and Mercury needs to compensate for them. The argument that the problem is with the receiving server, while absolutely true, does not work because the people receiving the email are never the ones that configured their server. All they see is that my Mercury is the only one that cannot send them email, and conclude I'm the incompetent one.
