[rabbitmq-discuss] AWS clustering
matthias at rabbitmq.com
Wed Sep 5 10:01:47 BST 2012
On 05/09/12 09:46, Francesco Mazzoli wrote:
> At Tue, 4 Sep 2012 20:01:38 -0700 (PDT),
> Glade wrote:
>> For a supposedly "just works" kind of service, that is just not good enough. I
>> can't have my ops people rolling out of bed to take action every time there's
>> a minor network glitch.
> Rabbit clustering is meant to be run on local networks and is not tolerant to
> "network glitches". If you expect those, then don't use it.
There is in fact some built-in tolerance to glitches. Specifically, TCP
should tolerate network glitches if appropriately configured (or,
rather, unless misconfigured).
You may also want to increase Erlang's kernel net_ticktime. See
http://www.erlang.org/doc/man/kernel_app.html. And make sure you are
running the most recent Erlang release.
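
To give a rough feel for what raising net_ticktime buys you: per the kernel
docs linked above, a node that stops responding is considered down after
somewhere between 0.75x and 1.25x net_ticktime (default 60 seconds). A small
sketch of that arithmetic follows; the function name is just for illustration
and is not part of Erlang or RabbitMQ.

```python
def detection_window(net_ticktime):
    """Return (min, max) seconds of silence before Erlang distribution
    considers a peer node down, using the TickTime +/- TickTime/4 rule
    from the kernel application docs."""
    if net_ticktime <= 0:
        raise ValueError("net_ticktime must be positive")
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


if __name__ == "__main__":
    # 60 is the documented default; 120 is an arbitrary example of a larger value.
    for ticktime in (60, 120):
        low, high = detection_window(ticktime)
        print(f"net_ticktime={ticktime}s: node declared down after "
              f"{low:.0f}s to {high:.0f}s of silence")
```

So doubling net_ticktime doubles how long a glitch can last before the nodes
give up on each other, at the cost of taking equally longer to notice a node
that really has gone away.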
If, however, the glitch is severe enough to exceed those tolerances,
then you'll end up with a proper network split, which, as Francesco
notes, Erlang's mnesia distributed db (on which rabbit's clustering is
based) cannot cope with, thus requiring manual intervention to recover.