[rabbitmq-discuss] autoheal behavior in the presence of HA-queues mastered on multiple nodes?
Simon MacMullen
simon at rabbitmq.com
Mon Feb 24 10:57:54 GMT 2014
On 21/02/2014 11:49PM, Matt Pietrek wrote:
> Picture a two node cluster with nodes A and B, using HA-queues and
> autoheal. Some queues are mastered on A, and others on B. There's a VIP
> in front of the cluster that points to one of the brokers.
OK. Looking at how things evolve step by step:
> Now imagine a network event occurs and the cluster splits.
At this point the network is partitioned: both sides of the cluster are
running separately, each believing that the other has gone down. So on
A, all the queues that had masters on B fail over to A, and vice versa on B.
So this is the nub of why network partitions are a big deal; the two
sides of the cluster both think they are authoritative and both start to
evolve separately.
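
To make the divergence concrete, here is a toy Python model of the split. It is only a sketch and not RabbitMQ code: the `Node` class, the queue names and the message labels are all invented for illustration. It assumes both sides hold identical mirrored contents at the moment of the split and then accept work independently.

```python
from copy import deepcopy

class Node:
    """Toy stand-in for one side of a partitioned cluster (hypothetical, not a RabbitMQ API)."""
    def __init__(self, name, queues):
        self.name = name
        # Each side holds its own full copy of every mirrored queue.
        self.queues = deepcopy(queues)

    def publish(self, queue, msg):
        self.queues[queue].append(msg)

    def consume(self, queue):
        return self.queues[queue].pop(0)

# State at the moment of the split: both sides hold the same mirrored contents.
at_split = {"q_on_a": ["m1", "m2"], "q_on_b": ["m3"]}
a = Node("A", at_split)
b = Node("B", at_split)

# During the partition each side believes it is authoritative and accepts work.
a.consume("q_on_a")          # A delivers m1 from its own queue
a.publish("q_on_b", "m4")    # A now acts as master for B's queue too
b.publish("q_on_b", "m5")    # B carries on independently

# Per queue: (contents as seen by A, contents as seen by B)
diverged = {q: (a.queues[q], b.queues[q]) for q in at_split}
```

After these three operations the two sides no longer agree on either queue, and neither side knows it.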
> Autoheal kicks in,
Of course autoheal will not kick in until the underlying network
partition is resolved; until the two sides can see each other they will
not have a clue anything has gone wrong (well, more wrong than node
failure).
> selects a loser (say, B), and restarts it to rejoin the
> cluster. What happens to the data in queues that were mastered on B when
> B restarts?
Hopefully at this point you have your answer: since A is the winning
side, the queue state from A overwrites the queue state from B. And the
queue state from A will be whatever the state was at the time of the
split (even for queues mastered on B), with whatever changes A has seen
since.
Now, since you said earlier that a VIP points to one of the brokers, then
if that broker is A maybe no changes have been taking place on B anyway.
But if there were any changes on B, you lost them.
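
The outcome described above can be sketched as a small Python function. This is a toy model under stated assumptions, not how RabbitMQ implements autoheal: the `autoheal` function, the dict-of-lists queue representation and the message labels are hypothetical. It only illustrates that the loser adopts the winner's state and that anything seen only by the loser is gone.

```python
def autoheal(winner_state, loser_state):
    """Toy model: the loser restarts and adopts the winner's queue state.

    Returns (state after the heal, messages that existed only on the loser).
    """
    lost = {
        q: [m for m in msgs if m not in winner_state.get(q, [])]
        for q, msgs in loser_state.items()
    }
    # The healed cluster simply carries the winner's view of every queue.
    healed = {q: list(msgs) for q, msgs in winner_state.items()}
    return healed, lost

# Hypothetical state on each side when the partition ends.
# m4 was published via A during the split, m5 via B.
state_a = {"q_on_a": ["m1", "m2"], "q_on_b": ["m3", "m4"]}
state_b = {"q_on_a": ["m1", "m2"], "q_on_b": ["m3", "m5"]}

healed, lost = autoheal(winner_state=state_a, loser_state=state_b)
```

Here `healed` equals A's view, and `lost` reports `m5` for `q_on_b`, even though that queue was originally mastered on B.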
Partitions are a big deal, and autoheal is not a panacea.
> And does it matter if the messages in the queue were
> persistent or not?
No.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, Pivotal