I am curious about how HA queues behave during a network split. <div><br></div><div>The documentation states that when a master fails a slave will be promoted to master, but it's silent on the conditions under which a slave will consider a master to have failed. Is there some timeout after which slaves will consider a master to have failed? If so, what is the timeout value?</div>
<div><br></div><div>Assuming that such a timeout exists, a network split may leave you with two clusters, each of which now has its own master. Each may also have publishers and consumers that continue to work happily against their side of the split cluster.</div>
<div><br></div><div>What happens when the network split is repaired? Will the clusters rejoin? If so, what will happen to the HA queue? Will one of the existing masters be demoted to slave? If so, what happens to the messages in its queue that originated within its side of the split cluster? Are they lost?</div>
<div><br></div><div>I suppose a lot of this depends on the underlying Mnesia DB. I realize RMQ is a CA system in terms of the CAP theorem, but it's not at all clear what occurs in the face of a network partition.</div><div><br></div>
<div>Elias Levy</div>