[rabbitmq-discuss] Cluster fails when current primary node shuts down.

Simon MacMullen simon at rabbitmq.com
Mon Mar 19 10:44:44 GMT 2012


On 15/03/12 19:57, Travis wrote:
> When this completed on machineA, instead of the cluster failing over,
> the rabbitmq on machineB died.  What we then noticed was that when we
> tried to start up machineB's rabbitmq-server, it would fail the
> startup process.  machineB would only ever start up if machineA's
> rabbitmq-server was started first.
>
> note:  this is ONLY happening on our production cluster; I can't seem
> to reproduce it in our QA environment.  I suspect something is whacked
> in cluster config in production.

OK, that's alarming.

> Anyone else seen this?  Is service rabbitmq-server stop sufficient to
> cause a safe failover?  Or is there a more preferred way?

There was a race-condition bug in a previous release that caused this,
though it really should be fixed by now. "service rabbitmq-server stop"
should be fine.

> Unfortunately, I don't have the log messages from machineB's
> rabbitmq-server because it appears that they get overwritten upon
> subsequent restarts of rabbitmq. :-(

Unfortunately it's not easy for us to debug problems without them...

They really should get appended to rather than overwritten, though
(well, apart from the stdout/stderr logs). Are you sure you don't have
them?

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, VMware

