[rabbitmq-discuss] HA behavior during a network split

Wed Jul 11 10:42:59 BST 2012

On 07/11/2012 10:24 AM, Tim Watson wrote:
>
> AFAIK it is possible for Mnesia to heal itself after a netsplit, and 
> therefore getting nodes to rejoin a cluster might work without 
> intervention, possibly depending on what has happened independently on 
> the two 'halves' of the split in the intervening time period. What I 
> would not expect to happen (though I could be wrong here!) is for two 
> distinct GM rings to join up and become one, promoting a new master or 
> demoting an existing one, the latter behaviour being undefined (i.e., 
> not implemented) AFAICT.
>
> When a node rejoins a cluster, mnesia needs to reconcile the 
> differences and I would expect to see mnesia fail when trying to 
> rejoin the cluster if the (Erlang) process ID for the master was 
> different between the two nodes.
>

And I should probably have pointed out that the message store is 
independent of mnesia as well, and I'm pretty certain that if the two 
got out of sync somehow then you'd be in trouble. Currently, when a 
node joins a cluster and needs to become a slave (in the HA sense), the 
mirror queue coordinator ensures that messages are forwarded to that 
node until its queue length matches the master's, at which point it is 
considered 'in-sync' with the master.
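As a rough illustration of the sync criterion described above, here is a toy model in Python. It is a hypothetical sketch, not RabbitMQ's actual (Erlang) code: the names `sync_slave` and `in_sync` are made up, and it assumes the new slave holds an in-order prefix of the master's messages. It also shows why a length-only check can't catch a slave whose contents have diverged.

```python
from collections import deque


def sync_slave(master: deque, slave: deque) -> int:
    """Hypothetical toy model: forward messages from the master's queue
    to a newly joined slave until both queues have the same length.

    Assumes the slave's contents are an in-order prefix of the master's.
    Returns the number of messages forwarded.
    """
    forwarded = 0
    while len(slave) < len(master):
        # Copy the first message the slave does not yet hold.
        slave.append(master[len(slave)])
        forwarded += 1
    return forwarded


def in_sync(master: deque, slave: deque) -> bool:
    # The criterion described in the post: equal queue lengths only.
    return len(slave) == len(master)


if __name__ == "__main__":
    master = deque(["m1", "m2", "m3"])
    slave = deque(["m1"])
    print(sync_slave(master, slave), in_sync(master, slave))
    # Equal lengths do not imply equal contents: a diverged slave
    # still passes the length-only check.
    print(in_sync(deque(["a", "b"]), deque(["x", "y"])))
```

The last check is the "out of sync" danger in miniature: if the message store and mnesia disagree, counting messages alone would not reveal it.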