[rabbitmq-discuss] Errors starting RabbitMQ when cluster membership changes

Matt Pietrek mpietrek at skytap.com
Mon Oct 1 18:07:11 BST 2012

Hey Tim,

I was expecting that basic response. Thanks!

However... I still think there may be an issue here that hinders RabbitMQ's
deployment in production scenarios. Please correct me if I'm wrong or
missing something.

Hypothetically, what would happen if my 'util' node was zapped by lightning
and I had no way to bring it up in a timely manner? Would I be able to
start the existing cluster nodes (play, play2) "far enough" to run the
proper rabbitmqctl command to remove util from the cluster?

That is, once I've gotten into a bad situation, can I back out of it
gracefully and without message loss? Or is the only option to reset the
entire cluster?


On Mon, Oct 1, 2012 at 3:04 AM, Tim Watson <tim at rabbitmq.com> wrote:

> Hi Matt,
>
> On 09/28/2012 08:23 PM, Matt Pietrek wrote:
> > For example, at one point I had a three node cluster: play, play2, and
> > util. I then removed util from the cluster, although to be honest, simply
> > by changing the rabbitmq.config file, rather than explicitly running
> > rabbitmqctl stop_app while the cluster is still running.
>
> I'm pretty sure you're not supposed to do that! :)
>
> > My steps:
> >    - Running as three node cluster, stop all brokers
> >    - Create a new rabbitmq.config file with just two brokers
> >    - Attempt to start the new cluster.
>
> If you don't take util offline using the right procedure, I suspect mnesia
> will get out of sorts and this isn't something you want to happen. It's
> important to make cluster changes using the right procedure, as mnesia is
> rather a fussy beast.
>
> BTW we've made some improvements that (hopefully) simplify working with
> clusters and these will be in the forthcoming feature release!
>
> Cheers,
> Tim
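
For context, the explicit removal procedure discussed in this thread can be sketched roughly as below. This is only a sketch for the 2.x releases current at the time, not an authoritative recipe. It assumes the node to be detached is named rabbit@util (a guess based on the hostnames above), that it is still reachable, and that the remaining cluster nodes are running. The detach_node helper and the RUN variable are hypothetical names introduced for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: detach one node from a RabbitMQ 2.x cluster.
# detach_node and RUN are hypothetical names, not part of RabbitMQ.
# Assumes the target node is reachable and the rest of the cluster is up.

detach_node() {
    node="$1"
    # Dry run by default: RUN falls back to "echo" when it is unset.
    # Call with RUN= (empty) to actually execute the commands.
    run="${RUN-echo}"
    $run rabbitmqctl -n "$node" stop_app   # stop the broker, keep the Erlang VM running
    $run rabbitmqctl -n "$node" reset      # leave the cluster; wipes this node's local data
    $run rabbitmqctl -n "$node" start_app  # start again as a standalone broker
}

# Node name is an assumption based on the hostnames in this thread.
detach_node "rabbit@util"
```

Note that reset discards the detached node's own database and persistent messages, and that this sketch does not cover the case asked about above, where util cannot be brought up at all.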
