[rabbitmq-discuss] rabbitmq cluster manage/ops questions
torinsandall at gmail.com
Wed Aug 15 20:33:08 BST 2012
I'm working on a project involving clustered rabbitmq brokers and I
would like to gain a better understanding of the operational
constraints. I've read the clustering article on the site, but I feel
like I don't have a solid understanding of it yet.
1) What constraints need to be observed to guarantee the cluster state
remains consistent? I.e., so that the cluster will not fall into a
"split brain" state.
2) Is there anything invalid or problematic about the tests I describe below?
I've been running tests involving 2-3 clustered brokers. All brokers
are running 2.7.1. The cookies are synched properly.
The cluster can be in one of the following states:
2 disc, 1 ram
In any of the states, my test can kill one of the brokers (ram or
disc, it doesn't discriminate.) If a broker is killed the next event
the test would execute is either a restart of the dead broker or a
replacement of the dead broker. Replacement is done by deleting the
mnesia database on that node and then service start, stop_app, reset,
In the 2 broker-cluster states, the cluster can be grown to size of 3.
In the 3 broker-cluster states, the cluster can be shrunk to size of 2
with a constraint that it won't ever shrink to 1 disc/1 ram.
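
To make the grow/shrink rules above concrete, here is a minimal sketch of them as a small state model. This is an illustration only, not the actual test script. States are written as (disc nodes, ram nodes); the function names and the assumption that any 2-broker state may grow by either node type are mine, and the only rule taken directly from the description is that the cluster never shrinks to 1 disc/1 ram.

```python
from typing import List, Tuple

# A cluster state as (number of disc nodes, number of ram nodes).
State = Tuple[int, int]

# The one explicit constraint from the description: never shrink to 1 disc / 1 ram.
FORBIDDEN: State = (1, 1)


def grow(state: State) -> List[State]:
    """States reachable by adding one node to a 2-broker cluster."""
    disc, ram = state
    if disc + ram != 2:
        return []  # only 2-broker clusters are grown
    # Assumption: the new node may be either a disc node or a ram node.
    return [(disc + 1, ram), (disc, ram + 1)]


def shrink(state: State) -> List[State]:
    """States reachable by removing one node from a 3-broker cluster."""
    disc, ram = state
    if disc + ram != 3:
        return []  # only 3-broker clusters are shrunk
    candidates: List[State] = []
    if disc > 0:
        candidates.append((disc - 1, ram))
    if ram > 0:
        candidates.append((disc, ram - 1))
    # Drop the state the test is never allowed to reach.
    return [s for s in candidates if s != FORBIDDEN]
```

Under this model, a 2 disc/1 ram cluster can only shrink by losing its ram node, because removing a disc node would leave 1 disc/1 ram.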
Growing and shrinking of the cluster is always done by running
rabbitmqctl commands. I.e., there's no cluster configuration in the
rabbitmq.config file. For those who will ask, the commands I'm running
to grow and shrink the cluster are:
1) grow cluster by adding a ram node
   rabbitmqctl cluster <existing-disc-node>
2) grow cluster by adding a disc node
   rabbitmqctl cluster <existing-disc-node> <node-to-be-added>
3) shrink cluster by removing a node
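
As a rough sketch of how the two grow commands above differ, the helper below builds the rabbitmqctl invocations that would be run on the joining node. It is hypothetical: the function name and node names are made up for illustration, and wrapping the cluster command in stop_app/reset/start_app is an assumption based on the usual 2.x clustering procedure, not something shown in the command list. The function only constructs argument lists and does not execute anything.

```python
from typing import List


def join_commands(existing_disc_node: str, new_node: str,
                  as_disc: bool) -> List[List[str]]:
    """Build the command sequence for adding new_node to a cluster.

    With the 2.x `rabbitmqctl cluster` command, listing the joining node
    itself among the arguments makes it a disc node; omitting it makes
    it a ram node.
    """
    cluster_args = [existing_disc_node]
    if as_disc:
        cluster_args.append(new_node)
    return [
        ["rabbitmqctl", "stop_app"],   # assumed: stop the app before reclustering
        ["rabbitmqctl", "reset"],      # assumed: clear previous cluster state
        ["rabbitmqctl", "cluster", *cluster_args],
        ["rabbitmqctl", "start_app"],  # assumed: restart the app afterwards
    ]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for cmd in join_commands("rabbit@node1", "rabbit@node3", as_disc=True):
        print(" ".join(cmd))
```

Each inner list could be passed to something like subprocess.run over ssh; that part is left out because it depends on the test harness.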
I'm not inserting any delays between execution of events (other than
the implicit delay of having to ssh into the server and execute the
One issue I've encountered so far:
1) rabbit fails to start after shrinking 2 disc/1 ram cluster to 2
disc cluster and then killing a disc node. Here's the log from the
disc node which fails to start. There's also output from my test
script at the bottom which shows the cluster status: