Hi,<div><br></div><div>Are RabbitMQ HA and clustering sufficiently reliable for scenarios where the network is good, but nodes can reboot at any time?</div><div><br></div><div>My understanding was that this is what "HA" is supposed to mean, but then I read these:</div>
<div><br></div><div><a href="http://stackoverflow.com/questions/8654053/rabbitmq-cluster-is-not-reconnecting-after-network-failure">http://stackoverflow.com/questions/8654053/rabbitmq-cluster-is-not-reconnecting-after-network-failure</a></div>
<div><a href="http://rabbitmq.1065348.n5.nabble.com/Cluster-nodes-stop-start-order-can-lead-to-failures-td21965.html">http://rabbitmq.1065348.n5.nabble.com/Cluster-nodes-stop-start-order-can-lead-to-failures-td21965.html</a></div>
<div><div><a href="http://rabbitmq.1065348.n5.nabble.com/Cluster-busting-shut-off-all-nodes-at-the-same-time-td22971.html">http://rabbitmq.1065348.n5.nabble.com/Cluster-busting-shut-off-all-nodes-at-the-same-time-td22971.html</a></div>
<div><a href="http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html">http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html</a><br></div><div><a href="http://grokbase.com/t/rabbitmq/rabbitmq-discuss/125nxzf5nh/highly-available-cluster">http://grokbase.com/t/rabbitmq/rabbitmq-discuss/125nxzf5nh/highly-available-cluster</a><br>
</div><div><br></div><div>And now I'm not so sure. It seems there are many scenarios where merely rebooting the nodes in a particular order leaves the cluster in a state from which it cannot recover automatically.</div>
<div><br></div><div>Questions:</div><div>1) Is there a set of assumptions or procedures under which I can be *certain* that my RabbitMQ cluster will actually tolerate unexpected node failures? Maybe something like "no more than 1 node down at the same time", or "at least X seconds between reboots", or "after a node reboots, restart all rabbit instances", or "have at most 2 nodes", etc.? I'm asking because I need to at least document this for my customers.</div>
<div>2) To what degree are the issues described in those threads fixed in the next release of RabbitMQ, 3.0.0, and how soon is that release expected to be production-ready?</div><div><br></div>-- <br>Eugene Kirpichov<br><a href="http://www.linkedin.com/in/eugenekirpichov" target="_blank">http://www.linkedin.com/in/eugenekirpichov</a><br>
</div>