[rabbitmq-discuss] HA behavior during a network split
Tim Watson
tim at rabbitmq.com
Wed Jul 11 10:42:59 BST 2012
On 07/11/2012 10:24 AM, Tim Watson wrote:
>
> AFAIK it is possible for Mnesia to heal itself after a netsplit, and
> therefore getting nodes to rejoin a cluster might work without
> intervention, possibly depending on what has happened independently on
> the two 'halves' of the split in the intervening time period. What I
> would not expect to happen (though I could be wrong here!) is for two
> distinct GM rings to join up and become one, promoting a new master or
> demoting an existing one, the latter behaviour being undefined (i.e.,
> not implemented) AFAICT.
>
> When a node rejoins a cluster, mnesia needs to reconcile the
> differences and I would expect to see mnesia fail when trying to
> rejoin the cluster if the (Erlang) process ID for the master was
> different between the two nodes.
>
I should also have pointed out that the message store is independent
of mnesia, and I'm pretty certain that if the two got out of sync
somehow then you'd be in trouble. Currently, when a node joins a
cluster and needs to become a slave (in the HA sense), the mirror queue
coordinator ensures that messages are forwarded to that node until its
message queue length is the same as that of the master, at which point
it is considered 'in-sync' with the master.
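
To make the 'in-sync' rule a bit more concrete, here is a toy model in
Python. It is only a sketch and not RabbitMQ's actual code: the names
ToyMirror, add_slave, publish, consume and in_sync are made up for
illustration. It also assumes that only messages published after the
slave joins are forwarded to it, and that the older backlog simply
drains from the master as consumers take messages.

```python
from collections import deque


class ToyMirror:
    """Toy model of one master queue and at most one joining slave.

    Hypothetical illustration only; message ids are assumed unique.
    """

    def __init__(self, backlog):
        # Messages already on the master before any slave joins.
        self.master = deque(backlog)
        self.slave = None

    def add_slave(self):
        # Assumption: a new slave starts empty and gets no backlog.
        self.slave = deque()

    def publish(self, msg):
        # New messages go to the master and are forwarded to the slave.
        self.master.append(msg)
        if self.slave is not None:
            self.slave.append(msg)

    def consume(self):
        # Take the oldest message from the master. If the slave holds
        # the same message at its head, drop it there as well.
        msg = self.master.popleft()
        if self.slave and self.slave[0] == msg:
            self.slave.popleft()
        return msg

    def in_sync(self):
        # The rule described above: equal queue lengths means in-sync.
        return self.slave is not None and len(self.slave) == len(self.master)


mirror = ToyMirror([1, 2, 3])   # three messages of backlog on the master
mirror.add_slave()              # slave joins empty
mirror.publish(4)               # forwarded to the slave as well
for _ in range(3):
    mirror.consume()            # backlog drains from the master
print(mirror.in_sync())
```

In this model the slave never sees messages 1 to 3. It catches up only
because the master's backlog drains until both queues hold just
message 4.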