<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Elias,<br>
<br>
As I'm fairly new around here, I'll try to share what I've learned
so far and let the more experienced folks chip in and fill in
the details (or correct me if I go astray).<br>
<br>
On 07/11/2012 12:11 AM, Elias Levy wrote:
<blockquote
cite="mid:CAFDmHdMA-7QCJLf=R3TjSorKf3R1EwnrEJvyj9RPsLPagCw4-w@mail.gmail.com"
type="cite">I am curious as to what the behavior of HA queues
during a network split is.
<div><br>
</div>
<div>The documentation states that when a master fails a slave will
be promoted to master, but it's silent under what conditions a
slave will consider a master to have failed. Is there some
timeout after which slaves will consider a master to have
failed? If so, what is the time value?</div>
<div><br>
</div>
</blockquote>
<br>
This situation is not handled using a timeout. HA queues are based
on a technology called Guaranteed Multicast (aka GM), which was
developed by and for RabbitMQ. It provides an atomic
broadcast capability similar to the work described by Levy
et al (<cite>biblion.epfl.ch/EPFL/theses/2008/3999/EPFL_TH3999.pdf</cite>),
though, as per the documentation, GM was developed independently
of that work.<br>
<br>
You can take a look at the GM source code here:
<a class="moz-txt-link-freetext" href="http://hg.rabbitmq.com/rabbitmq-server/file/default/src/gm.erl">http://hg.rabbitmq.com/rabbitmq-server/file/default/src/gm.erl</a><br>
<br>
A GM group forms a ring, in which members are connected to their
immediate neighbours (in both directions) only. If one of these connections
breaks, the death of that member is propagated around the ring
and everything 'reshuffles' to compensate. The deaths are
noticed because the Erlang processes involved are monitored (see the
links under [monitors] at the bottom for technical details) and the
guarantees and relative timings involved can be understood in that
context.<br>
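<br>
To make the ring idea a bit more concrete, here is a toy sketch in
Python. It is not how gm.erl is written (GM is Erlang and relies on
process monitors); the Ring class and its method names are made up
for illustration, and it models only the neighbour 'reshuffle', not
the broadcast itself or any failure detection.<br>
<pre>
```python
class Ring:
    """Toy model of a ring in which each member is linked only to its two neighbours.

    Hypothetical illustration only: the names here do not come from gm.erl.
    """

    def __init__(self, members):
        self.members = list(members)

    def neighbours(self, member):
        """Return the (left, right) neighbours of a member in the ring."""
        i = self.members.index(member)
        n = len(self.members)
        return self.members[(i - 1) % n], self.members[(i + 1) % n]

    def member_died(self, dead):
        """Drop a dead member and return the two neighbours that would notice."""
        left, right = self.neighbours(dead)
        self.members.remove(dead)
        # With the dead member gone, left and right are now adjacent.
        return left, right


ring = Ring(["a", "b", "c", "d"])
noticed_by = ring.member_died("c")
```
</pre>
In this sketch only "b" and "d" are linked to "c", so they are the
ones that see it go, and afterwards they are each other's
neighbours. In the real thing the news of the death is then passed on
around the rest of the ring.<br>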
<br>
In actual fact, mirror (i.e., HA) queues are implemented 'on top of'
GM and also rely on Rabbit's clustering infrastructure, so
additional (Erlang) process and node monitoring is in place at the
level above GM which will also *notice* if a node goes down.<br>
<br>
<blockquote
cite="mid:CAFDmHdMA-7QCJLf=R3TjSorKf3R1EwnrEJvyj9RPsLPagCw4-w@mail.gmail.com"
type="cite">
<div>Assuming that such timeout exists, if there is a network
split you may end up with two clusters, each of which now has a
master. Each may also have publishers and consumers that
continue to work happily against the split cluster.</div>
<div><br>
</div>
</blockquote>
<br>
Now we're talking about two different things. Rabbit clustering does
not depend on mirror (HA) queues, though mirror queues do rely on
clustering. If a netsplit occurs then the surviving nodes which
are still connected to the extant master *should* continue happily
on. What will happen to the nodes in the other 'half' of the split,
I'm not so sure, so I will put my hand up and ask someone better
versed in this to fill in the blanks.<br>
<br>
<blockquote
cite="mid:CAFDmHdMA-7QCJLf=R3TjSorKf3R1EwnrEJvyj9RPsLPagCw4-w@mail.gmail.com"
type="cite">
<div>What happens when the network split is repaired? Will the
clusters join? If so, what will happen to the HA queue? Will
one of the existing masters be demoted to slave? If so, what
happens to its queue of messages that originated within its
split cluster? Are they lost?</div>
<div><br>
</div>
</blockquote>
<br>
AFAIK it is possible for Mnesia to heal itself after a netsplit, and
therefore getting nodes to rejoin a cluster might work without
intervention, possibly depending on what has happened independently
on the two 'halves' of the split in the intervening time period.
What I would not expect to happen (though I could be wrong here!) is
for two distinct GM rings to join up and become one, promoting a new
master or demoting an existing one, the latter behaviour being
undefined (i.e., not implemented) AFAICT.<br>
<br>
When a node rejoins a cluster, Mnesia needs to reconcile the
differences, and I would expect to see Mnesia fail when trying to
rejoin the cluster if the (Erlang) process ID for the master was
different between the two nodes.<br>
<br>
<blockquote
cite="mid:CAFDmHdMA-7QCJLf=R3TjSorKf3R1EwnrEJvyj9RPsLPagCw4-w@mail.gmail.com"
type="cite">
<div>I suppose a lot of this depends on the underlying Mnesia DB.
I realize RMQ is a CA system under the CAP theorem, but it's not at
all clear what occurs in the face of a network partition.</div>
<br>
</blockquote>
<br>
Yes indeed - Mnesia does not play nicely in this kind of scenario.
There are some efforts underway to make it *easier* to deal with
netsplits (for example
<a class="moz-txt-link-freetext" href="https://github.com/uwiger/otp/commit/3f70f3def4e33828da4237b07cbee9f73121c661">https://github.com/uwiger/otp/commit/3f70f3def4e33828da4237b07cbee9f73121c661</a>
and <a class="moz-txt-link-freetext" href="https://github.com/uwiger/unsplit">https://github.com/uwiger/unsplit</a>) but these are not mainstream
or ready for production use just yet.<br>
<br>
And even if some mechanism were available, we would have the dual
problems of deciding which Mnesia record is the correct one (the
system of record) *and* being able to join 2 GM rings back together,
which sounds infeasibly hard to me.<br>
<br>
[monitors]<br>
<a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/reference_manual/processes.html#id82613">http://www.erlang.org/doc/reference_manual/processes.html#id82613</a><br>
<a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/man/erlang.html#monitor-2">http://www.erlang.org/doc/man/erlang.html#monitor-2</a><br>
<a class="moz-txt-link-freetext" href="http://www.erlang.org/doc/man/net_kernel.html">http://www.erlang.org/doc/man/net_kernel.html</a><br>
<br>
</body>
</html>