[rabbitmq-discuss] Restart Cluster after crash

Simon MacMullen simon at rabbitmq.com
Mon Jun 10 15:15:37 BST 2013


On 10/06/13 06:32, Brendan Fry wrote:
> During earlier testing we were able to take down any or all of the
> nodes with a Windows restart and the cluster would recover. However,
> after an unexpected crash that brought down the entire cluster, the
> RabbitMQ services will no longer start.
>
> We receive the following error:
>
>     C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat
>
>                   RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
>       ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
>       ##  ##
>       ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
>       ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
>       ##########
>                   Starting broker...
>
>     BOOT FAILED
>     ===========
>     Timeout contacting cluster nodes:
>     [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
>      rabbit@OTLABWEB03,rabbit@OTLABWEB01,
>      rabbit@OTLABAPP06,rabbit@OTLABAPP05].

Hi. When starting a cluster from scratch, RabbitMQ will want the last 
node stopped to be the first node started (since the last node stopped 
may have seen changes that no other node saw).

So if your nodes were shut down correctly then you would just need to 
make sure you start the last node first (after that you can start them 
in any order). Starting any other node first will lead to an error 
message similar to the one you posted.
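
To make the ordering rule concrete, here is a minimal sketch in Python. It 
assumes you have recorded, by some means of your own, when each node was 
stopped; RabbitMQ does not hand you this mapping, and the function name 
start_order and the timestamps below are made up for illustration.

```python
from datetime import datetime


def start_order(stop_times):
    """Return node names in a safe start order.

    stop_times is a hypothetical mapping of node name -> time the node
    was stopped. The node that stopped last goes first; the remaining
    nodes can start in any order, so they are simply sorted by name.
    """
    if not stop_times:
        return []
    last_stopped = max(stop_times, key=stop_times.get)
    rest = sorted(name for name in stop_times if name != last_stopped)
    return [last_stopped] + rest


if __name__ == "__main__":
    # Illustrative stop times only; not taken from the original report.
    example = {
        "rabbit@OTLABWEB01": datetime(2013, 6, 9, 22, 0, 5),
        "rabbit@OTLABWEB02": datetime(2013, 6, 9, 22, 0, 40),
        "rabbit@OTLABWEB03": datetime(2013, 6, 9, 22, 0, 20),
    }
    for node in start_order(example):
        print(node)
```

With these example times, rabbit@OTLABWEB02 stopped last and so comes 
first in the list.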

However, if all nodes were shut down abruptly and simultaneously then 
they can all decide that they were not the last one to shut down and 
display this error. In that case, make sure you start all the nodes 
simultaneously (well, within the 30 second timeout anyway).
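
If starting that many machines by hand within 30 seconds is awkward, 
something along these lines could issue the start commands in parallel. 
This is only a sketch: start_node is a placeholder callable that you would 
have to supply yourself (for example, something that starts the RabbitMQ 
service on a remote host), and start_all is a made-up helper, not part of 
RabbitMQ.

```python
import threading
import time

BOOT_TIMEOUT_SECONDS = 30  # the timeout mentioned above


def start_all(nodes, start_node, window=BOOT_TIMEOUT_SECONDS):
    """Call start_node(name) for every node, each in its own thread.

    Returns a dict of node name -> seconds (relative to the call) at
    which its start command was issued. Raises RuntimeError if the
    commands were spread over more than `window` seconds.
    """
    launched = {}
    lock = threading.Lock()
    t0 = time.monotonic()

    def worker(name):
        # Record the launch time, then hand off to the user's callable.
        with lock:
            launched[name] = time.monotonic() - t0
        start_node(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if launched:
        spread = max(launched.values()) - min(launched.values())
        if spread > window:
            raise RuntimeError(
                "start commands spread over %.1f s, more than %d s"
                % (spread, window)
            )
    return launched
```

Note that this only checks when the start commands were issued, not when 
each broker actually finished booting.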

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, Pivotal

