<div dir="ltr"><div>Thanks Simon. A bit after I sent my messages I came to the same conclusion. It makes me very happy that you confirmed my reasoning.</div><div><br></div><div>Matt</div><div><br></div></div><div class="gmail_extra">

<br><br><div class="gmail_quote">On Mon, Feb 24, 2014 at 2:57 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>On 21/02/2014 11:49PM, Matt Pietrek wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

Picture a two node cluster with nodes A and B, using HA-queues and<br>

autoheal. Some queues are mastered on A, and others on B. There's a VIP<br>

in front of the cluster that points to one of the brokers.<br>

</blockquote>

<br></div>

OK. Looking at how things evolve step by step:<div><br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

Now imagine a network event occurs and the cluster splits.<br>

</blockquote>

<br></div>

At this point the network is partitioned: both sides of the cluster are running separately, each believing that the other has gone down. So on A, all the queues that had masters on B fail over to A, and vice versa on B.<br>
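<br>
As a rough illustration of that failover, here is a toy Python sketch. It is not RabbitMQ code: the queue names and the <code>failed_over</code> helper are made up, and it assumes exactly two nodes, A and B.<br>
<pre>
```python
# Toy model only: not RabbitMQ code or its API. Assumes a two-node cluster.

def failed_over(masters, side):
    """List the queues that `side` promotes locally after the split.

    `masters` maps a (hypothetical) queue name to the node that held its
    master before the partition. From `side`'s point of view the other node
    looks dead, so every queue mastered there fails over to `side`.
    """
    return sorted(queue for queue, node in masters.items() if node != side)


# Hypothetical layout before the partition.
masters = {"orders": "A", "billing": "B", "audit": "B"}

print("A promotes:", failed_over(masters, "A"))
print("B promotes:", failed_over(masters, "B"))
```
</pre>
After the split, every queue has a master on both sides at once, which is the situation the next paragraph describes.<br>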


<br>

This is the nub of why network partitions are a big deal: the two sides of the cluster both think they are authoritative, and both start to evolve separately.<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

Autoheal kicks in,<br>

</blockquote>

<br>

Of course autoheal will not kick in until the underlying network partition is resolved; until the two sides can see each other they will not have a clue anything has gone wrong (well, more wrong than node failure).<div><br>


<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

selects a loser (say, B), and restarts it to rejoin the<br>

cluster. What happens to the data in queues that were mastered on B when<br>

B restarts?<br>

</blockquote>

<br></div>

Hopefully at this point you have your answer: since A is the winning side, the queue state from A overwrites the queue state from B. And the queue state from A will be whatever the state was at the time of the split (even for queues mastered on B), with whatever changes A has seen since.<br>


<br>

Now, since you said earlier that there is a VIP pointing at one of the brokers (say A), maybe no changes have been taking place on B anyway. But if there were any changes on B, you have lost them.<br>
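<br>
To make the overwrite concrete, here is a minimal Python sketch of the idea. It is a toy model under stated assumptions, not how RabbitMQ implements autoheal: queues are plain lists, and the <code>autoheal</code> function, queue names and message IDs are all invented for illustration.<br>
<pre>
```python
import copy

# Toy model only: not RabbitMQ code. Queues are lists of message IDs.

def autoheal(state_a, state_b, winner):
    """Return the queue state the whole cluster has after healing.

    The losing node restarts and resyncs from the winner, so the loser's
    state is discarded in full, whichever node mastered a queue originally.
    """
    winning = state_a if winner == "A" else state_b
    return copy.deepcopy(winning)


# State at the moment of the split, mirrored identically on both nodes.
at_split = {"orders": ["m1", "m2"], "billing": ["m3"]}
side_a = copy.deepcopy(at_split)
side_b = copy.deepcopy(at_split)

# While partitioned, each side evolves on its own.
side_a["orders"].append("m4")   # published via A
side_b["billing"].append("m5")  # published via B

healed = autoheal(side_a, side_b, winner="A")
print(healed)
```
</pre>
With A as the winner, "m4" survives and "m5" is gone, even if "billing" was mastered on B before the split. If nothing had been published via B during the partition, the healed state would lose nothing.<br>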

<br>

Partitions are a big deal, and autoheal is not a panacea.<div><br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

And does it matter if the messages in the queue were<br>

persistent or not?<br>

</blockquote>

<br></div>

No.<br>

<br>

Cheers, Simon<span class="HOEnZb"><font color="#888888"><br>

<br>

-- <br>

Simon MacMullen<br>

RabbitMQ, Pivotal<br>

</font></span></blockquote></div><br></div>