[rabbitmq-discuss] Cluster fails when current primary node shuts down.
simon at rabbitmq.com
Mon Mar 19 10:44:44 GMT 2012
On 15/03/12 19:57, Travis wrote:
> When this completed on machineA, instead of the cluster failing over,
> the rabbitmq on machineB died. What we then noticed was that when we
> tried to start up machineB's rabbitmq-server, it would fail the
> startup process. machineB would only ever start up if machineA's
> rabbitmq-server was started first.
> note: this is ONLY happening on our production cluster; I can't seem
> to reproduce it in our QA environment. I suspect something is whacked
> in cluster config in production.
OK, that's alarming.
> Anyone else seen this? Is service rabbitmq-server stop sufficient to
> cause a safe failover? Or is there a more preferred way?
There was a race-condition bug in a previous release that did this, but
that should really be fixed by now. "service rabbitmq-server stop"
should be fine.
> Unfortunately, I don't have the log messages from machineB's
> rabbitmq-server because it appears that they get overwritten upon
> subsequent restarts of rabbitmq. :-(
Unfortunately it's not easy for us to debug problems without them...
They really should get appended to rather than overwritten, though (well,
apart from the stdout/err logs). Are you sure you don't have them?
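
If it helps for next time, here is a minimal sketch of copying the log
directory aside before restarting, so that anything which does get
overwritten is still available afterwards. Note the assumptions:
backup_rabbit_logs is a hypothetical helper name, /var/log/rabbitmq is
only the usual default log location for Linux packages (yours may
differ), and /var/tmp is an arbitrary place to put the copy.

```shell
# Hypothetical helper: copy a log directory into a timestamped backup
# directory and print the backup path.
# Both default paths below are assumptions; pass your own if they differ.
backup_rabbit_logs() {
    src="${1:-/var/log/rabbitmq}"
    dest_root="${2:-/var/tmp}"
    dest="$dest_root/rabbitmq-logs-$(date +%Y%m%d-%H%M%S)"

    # Refuse to continue if the source directory does not exist.
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi

    # Copy the directory contents, preserving timestamps and modes.
    mkdir -p "$dest" && cp -Rp "$src"/. "$dest"/ && echo "$dest"
}
```

Running "backup_rabbit_logs" on machineB before "service rabbitmq-server
start" would leave a copy of whatever was logged by the failed startup.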