[rabbitmq-discuss] HA behavior during a network split
Tim Watson
watsont at vmware.com
Wed Jul 11 10:24:54 BST 2012
Hi Elias,
As I'm fairly new around here, I'll try and share what I've learned so
far and allow the more experienced folks to chip in and fill in the
details (or correct me if I go astray).
On 07/11/2012 12:11 AM, Elias Levy wrote:
> I am curious as to what the behavior of HA queues during a network
> split is.
>
> The documentation states that when a master fails a slave will be
> promoted to master, but it's silent about the conditions under which a
> slave will consider a master to have failed. Is there some timeout after which
> slaves will consider a master to have failed? If so, what is the time
> value?
>
This situation is not handled using a timeout. HA queues are built on a
component called Guaranteed Multicast (aka GM), which was developed by
and for RabbitMQ. GM provides an atomic broadcast capability similar to
the work described by Levy et al
(biblion.epfl.ch/EPFL/theses/2008/3999/EPFL_TH3999.pdf), although, as
the documentation notes, it was developed independently.
You can take a look at the GM source code here:
http://hg.rabbitmq.com/rabbitmq-server/file/default/src/gm.erl
A GM group forms a ring in which each member is connected only to its
immediate neighbours (in both directions). If one of these connections
breaks, the death of the member is propagated around the ring and
everything 'reshuffles' to compensate. Deaths are noticed because the
Erlang processes involved are monitored (see the links under [monitors]
at the bottom for technical details), so the guarantees and relative
timings involved follow from how Erlang process monitoring behaves.
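To make the ring description concrete, here is a toy model in Python. It is a hypothetical simplification, not a transcription of gm.erl: the class `Ring`, its methods, and the rule that the first surviving member in the list acts as master are all assumptions made for illustration. A member's death is passed along the ring member by member, and the survivors then close the gap.

```python
from dataclasses import dataclass


@dataclass
class Ring:
    # Ring order. members[0] plays the master in this toy model (an assumption).
    members: list[str]

    def neighbours(self, name: str) -> tuple[str, str]:
        """Return the (left, right) neighbours of `name` in the ring."""
        i = self.members.index(name)
        n = len(self.members)
        return self.members[(i - 1) % n], self.members[(i + 1) % n]

    def member_died(self, dead: str) -> list[str]:
        """Handle a death noticed by a neighbour's monitor.

        Returns the order in which the survivors learn of the death.
        """
        i = self.members.index(dead)
        n = len(self.members)
        # Walk the ring starting at the dead member's right neighbour, so the
        # news travels member to member instead of being broadcast at once.
        order = [self.members[(i + k) % n] for k in range(1, n)]
        # Reshuffle: drop the dead member so its old neighbours become adjacent.
        self.members.remove(dead)
        return order

    @property
    def master(self) -> str:
        return self.members[0]


ring = Ring(["a", "b", "c", "d"])
print(ring.member_died("c"))   # survivors d, a, b learn of the death in ring order
print(ring.neighbours("b"))    # b is now adjacent to d
print(ring.member_died("a"))   # the 'master' dies
print(ring.master)             # the next member in the list takes over
```

Because each member only watches its neighbours, no member needs a global view of the group; the ring itself carries the membership change to everyone.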
In fact, mirrored (i.e., HA) queues are implemented 'on top of' GM and
also rely on Rabbit's clustering infrastructure, so additional (Erlang)
process and node monitoring at the layer above GM will also *notice* if
a node goes down.
> Assuming that such a timeout exists, if there is a network split you may
> end up with two clusters, each of which now has a master. Each may
> also have publishers and consumers that continue to work happily
> against the split cluster.
>
Now we're talking about two different things. Rabbit clustering is
distinct from mirrored (HA) queues, although the two are related:
mirrored queues rely on clustering. If a netsplit occurs, the surviving
nodes that are still connected to the extant master *should* carry on
happily. I'm less sure what will happen to the nodes in the other 'half'
of the split, so I'll put my hand up and ask someone better versed in
this to fill in the blanks.
> What happens when the network split is repaired? Will the clusters
> join? If so, what will happen to the HA queue? Will one of the
> existing masters be demoted to slave? If so, what happens to its queue
> of messages that originated within its split cluster? Are they lost?
>
AFAIK it is possible for Mnesia to heal itself after a netsplit, so
getting nodes to rejoin a cluster might work without intervention,
possibly depending on what has happened independently on the two
'halves' of the split in the meantime. What I would not expect (though I
could be wrong here!) is for two distinct GM rings to join up and become
one, promoting a new master or demoting an existing one; AFAICT that
behaviour is undefined (i.e., not implemented).
When a node rejoins a cluster, Mnesia needs to reconcile the
differences, and I would expect the rejoin to fail if the (Erlang)
process ID recorded for the master differed between the two nodes.
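The conflict I'd expect on rejoin can be sketched as a comparison of each half's view of which process is master for each queue. This is a hypothetical illustration of the reasoning only, not how Mnesia actually reconciles tables: `find_master_conflicts`, `rejoin`, `RejoinConflict` and the plain strings standing in for Erlang process IDs are all invented for the example.

```python
class RejoinConflict(Exception):
    """Raised when the two halves disagree about a queue's master (hypothetical)."""

    def __init__(self, conflicts: dict[str, tuple[str, str]]) -> None:
        super().__init__(f"conflicting masters for: {sorted(conflicts)}")
        self.conflicts = conflicts


def find_master_conflicts(
    side_a: dict[str, str], side_b: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return queues known to both halves whose recorded master differs."""
    return {
        queue: (side_a[queue], side_b[queue])
        for queue in side_a.keys() & side_b.keys()
        if side_a[queue] != side_b[queue]
    }


def rejoin(side_a: dict[str, str], side_b: dict[str, str]) -> dict[str, str]:
    """Merge two views, refusing rather than silently picking a winner."""
    conflicts = find_master_conflicts(side_a, side_b)
    if conflicts:
        raise RejoinConflict(conflicts)
    return {**side_a, **side_b}


# Each half promoted its own master for "q1" while the network was split.
half_1 = {"q1": "master@node1", "q2": "master@node1"}
half_2 = {"q1": "master@node3", "q3": "master@node3"}
print(find_master_conflicts(half_1, half_2))
```

Refusing to merge is the conservative choice: any rule that quietly picks one master would discard whatever the other half accepted in the meantime.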
> I suppose a lot of this depends on the underlying Mnesia DB.
> I realize RMQ is a CA system in terms of the CAP theorem, but it's not at all
> clear what occurs in the face of a network partition.
>
Yes indeed: Mnesia does not play nicely in this kind of scenario. There
are some efforts underway to make netsplits *easier* to deal with (for
example
https://github.com/uwiger/otp/commit/3f70f3def4e33828da4237b07cbee9f73121c661
and https://github.com/uwiger/unsplit), but these are not mainstream or
ready for production use just yet.
And even if some mechanism were available, we would face two problems:
deciding which Mnesia record is the correct one (the system of record)
*and* joining two GM rings back together, which sounds infeasibly hard
to me.
[monitors]
http://www.erlang.org/doc/reference_manual/processes.html#id82613
http://www.erlang.org/doc/man/erlang.html#monitor-2
http://www.erlang.org/doc/man/net_kernel.html