[rabbitmq-discuss] Restart Cluster after crash
Simon MacMullen
simon at rabbitmq.com
Mon Jun 10 15:15:37 BST 2013
On 10/06/13 06:32, Brendan Fry wrote:
> During earlier testing we were able to take down any and all of the
> nodes with a Windows restart and the cluster would recover. However,
> after an unexpected crash that brought down the entire cluster, the
> RabbitMQ services will no longer start.
>
> We receive the following error:
>
> C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat
>
>               RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
>   ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
>   ##  ##
>   ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
>   ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
>   ##########
>               Starting broker...
>
> BOOT FAILED
> ===========
> Timeout contacting cluster nodes: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
>                                    rabbit@OTLABWEB03,rabbit@OTLABWEB01,
>                                    rabbit@OTLABAPP06,rabbit@OTLABAPP05].

Hi. When restarting a cluster in which every node has been stopped,
RabbitMQ wants the last node stopped to be the first node started (since
the last node stopped may have seen changes that no other node saw).

So if your nodes were shut down cleanly, you just need to make sure you
start the last node first (after that you can start the others in any
order). Starting any other node first will lead to an error message
similar to the one you posted.
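
To make the ordering rule concrete, here is a minimal sketch in plain Python. It does not call RabbitMQ at all: `start_order` is a hypothetical helper, and which node actually stopped last is something you would have to know from your own records. The node names are borrowed from the error message above purely as sample data.

```python
def start_order(nodes, last_stopped):
    """Return the nodes with the last-stopped node first.

    The remaining nodes keep their original relative order, since
    they can be started in any order once the first node is up.
    """
    if last_stopped not in nodes:
        raise ValueError(f"{last_stopped!r} is not a cluster node")
    return [last_stopped] + [n for n in nodes if n != last_stopped]


# Sample data only: assume OTLABWEB02 was the last node to stop.
nodes = ["rabbit@OTLABWEB01", "rabbit@OTLABWEB02", "rabbit@OTLABWEB03"]
order = start_order(nodes, "rabbit@OTLABWEB02")
print(order)
```

Only the first entry matters; the rest of the list is just a convenient default.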

However, if all nodes were shut down abruptly and simultaneously, they
can all decide that they were not the last one to shut down, and each
will display this error. In that case, make sure you start all the nodes
simultaneously (well, within the 30 second timeout anyway).

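
As a rough way to check a restart plan against that timeout, the sketch below tests whether a set of node start times all fall within a 30 second window. It is a simplification in plain Python, not RabbitMQ's actual boot logic: `started_within_window` and the start times are hypothetical, and treating the gap between the first and last start as the thing to measure is an assumption.

```python
BOOT_TIMEOUT_SECONDS = 30  # the timeout mentioned above


def started_within_window(start_times, window=BOOT_TIMEOUT_SECONDS):
    """True if the gap between the first and last node start fits the window.

    start_times maps a node name to its start time in seconds.
    """
    times = list(start_times.values())
    return max(times) - min(times) <= window


# Hypothetical start times, in seconds after the first node was launched.
starts = {
    "rabbit@OTLABWEB01": 0.0,
    "rabbit@OTLABWEB02": 12.5,
    "rabbit@OTLABWEB03": 41.0,
}
print(started_within_window(starts))
```

With these sample times the last node starts 41 seconds after the first, so the check fails; in practice you would script the service starts so they fire together.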
Cheers, Simon
--
Simon MacMullen
RabbitMQ, Pivotal