<div dir="ltr"><div>Thanks Simon. A bit after I sent my messages I came to the same conclusion. It makes me very happy that you confirmed my reasoning.</div><div><br></div><div>Matt</div><div><br></div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">On Mon, Feb 24, 2014 at 2:57 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On 21/02/2014 11:49PM, Matt Pietrek wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
Picture a two node cluster with nodes A and B, using HA-queues and<br>
autoheal. Some queues are mastered on A, and others on B. There's a VIP<br>
in front of the cluster that points to one of the brokers.<br>
</blockquote>
<br></div>
OK. Looking at how things evolve step by step:<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
Now imagine a network event occurs and the cluster splits.<br>
</blockquote>
<br></div>
At this point the network is partitioned: both sides of the cluster are running separately, each believing that the other has gone down. So on A, all the queues that had masters on B fail over to A, and vice versa on B.<br>
<br>
So this is the nub of why network partitions are a big deal; the two sides of the cluster both think they are authoritative and both start to evolve separately.<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
Autoheal kicks in,<br>
</blockquote>
<br>
Of course autoheal will not kick in until the underlying network partition is resolved; until the two sides can see each other they will not have a clue anything has gone wrong (well, more wrong than node failure).<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
selects a loser (say, B), and restarts it to rejoin the<br>
cluster. What happens to the data in queues that were mastered on B when<br>
B restarts?<br>
</blockquote>
<br></div>
Hopefully at this point you have your answer: since A is the winning side, the queue state from A overwrites the queue state from B. And the queue state from A will be whatever the state was at the time of the split (even for queues mastered on B), with whatever changes A has seen since.<br>
<br>
Now, since you said earlier that there is a VIP pointing at one of the brokers, if that broker is A then maybe no changes have been taking place on B anyway. But if there were any changes on B, you have lost them.<br>
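<br>
The overwrite described above can be sketched as a toy model. This is only an illustration, not how RabbitMQ is implemented: the <code>Node</code> class, the <code>autoheal</code> function and the queue names are all hypothetical, and it assumes both nodes held fully synchronised mirrors at the moment of the split.<br>
<pre>
```python
import copy


class Node:
    """Toy stand-in for a broker holding a mirror of every queue (hypothetical)."""

    def __init__(self, name, queues):
        self.name = name
        # Each node gets its own copy, so the two sides can diverge.
        self.queues = copy.deepcopy(queues)

    def publish(self, queue, message):
        self.queues[queue].append(message)


def autoheal(winner, loser):
    """Model the loser restarting: its queue state is replaced by the winner's.

    Returns the messages that existed only on the loser and are now gone.
    """
    discarded = {}
    for queue, messages in loser.queues.items():
        lost = [m for m in messages if m not in winner.queues[queue]]
        if lost:
            discarded[queue] = lost
    loser.queues = copy.deepcopy(winner.queues)
    return discarded


# State at the moment of the split: both nodes hold the same contents.
at_split = {"q_on_a": ["m1"], "q_on_b": ["m2"]}
a = Node("A", at_split)
b = Node("B", at_split)

# During the partition each side believes it is authoritative.
a.publish("q_on_b", "seen-by-A")
b.publish("q_on_b", "seen-by-B")

# The partition ends, A is picked as the winner, B restarts and rejoins.
lost = autoheal(winner=a, loser=b)
print(b.queues)  # B now mirrors A, including for the queue once mastered on B
print(lost)      # what was published only to B during the partition
```
</pre>
In this sketch, whether a message is persistent never enters into it: the loser's copy is simply replaced.<br>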
<br>
Partitions are a big deal, and autoheal is not a panacea.<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
And does it matter if the messages in the queue were<br>
persistent or not?<br>
</blockquote>
<br></div>
No.<br>
<br>
Cheers, Simon<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
</font></span></blockquote></div><br></div>