[rabbitmq-discuss] How robust is clustering, and under what conditions?

Eugene Kirpichov ekirpichov at gmail.com
Thu Nov 15 12:04:53 GMT 2012


Hi,

Are RabbitMQ HA and clustering reliable enough to use in scenarios
where the network is good, but nodes can reboot at any time?

My understanding was that this is what "HA" is supposed to mean, but then I
read this:

http://stackoverflow.com/questions/8654053/rabbitmq-cluster-is-not-reconnecting-after-network-failure
http://rabbitmq.1065348.n5.nabble.com/Cluster-nodes-stop-start-order-can-lead-to-failures-td21965.html
http://rabbitmq.1065348.n5.nabble.com/Cluster-busting-shut-off-all-nodes-at-the-same-time-td22971.html
http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html
http://grokbase.com/t/rabbitmq/rabbitmq-discuss/125nxzf5nh/highly-available-cluster

And now I'm not so sure. It seems that there are a lot of scenarios where
merely rebooting the nodes in some order brings the cluster into a state
from which there is no automatic way out.

Questions:
1) Is there a set of assumptions or procedures under which I can be
*certain* that my RabbitMQ cluster will actually tolerate unexpected node
failures? Maybe something like "no more than 1 node down at the same time",
or "at least X seconds between reboots", or "after a node reboots, restart
all rabbit instances" or "have at most 2 nodes" etc.? I'm asking because I
need to at least document this for my customers.
2) To what degree are the issues described in those threads fixed in the
next release of RabbitMQ (3.0.0), and how soon is that release expected to
be production-ready?

-- 
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
We're hiring! http://tinyurl.com/mirantis-openstack-engineer