[rabbitmq-discuss] Restart Cluster after crash

Mon Jun 10 06:32:39 BST 2013

Hi,

We are trying to restart our RabbitMQ cluster after an unexpected 
environment failure. 

We are running:

   - rabbitmq_server-3.1.0 on Windows
   - erl5.10.1

Our cluster is configured like so:

   - web01, web02, web03, web04, web05, app05, app06

During earlier testing we were able to take down any or all of the nodes 
with a Windows restart, and the cluster would recover. However, after the 
unexpected crash that brought down the entire cluster, the RabbitMQ services 
no longer start.

We receive the following error:

C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat


              RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
  ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
  ##########
              Starting broker...


BOOT FAILED
===========
Timeout contacting cluster nodes: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
                                   rabbit@OTLABWEB03,rabbit@OTLABWEB01,
                                   rabbit@OTLABAPP06,rabbit@OTLABAPP05].


DIAGNOSTICS
===========
nodes in question: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,rabbit@OTLABWEB03,
                    rabbit@OTLABWEB01,rabbit@OTLABAPP06,rabbit@OTLABAPP05]


hosts, their running nodes and ports:
- OTLABAPP05: []
- OTLABAPP06: []
- OTLABWEB01: []
- OTLABWEB03: []
- OTLABWEB04: []
- OTLABWEB05: []


current node details:
- node name: rabbit@OTLABWEB02
- home dir: U:\
- cookie hash: j9x9r680xF6JzFI7IVDLew==


BOOT FAILED
===========
Error description:
   {could_not_start,rabbit,
       {bad_return,
           {{rabbit,start,[normal,[]]},
            {'EXIT',
                {rabbit,failure_during_boot,
                    {error,
                        {timeout_waiting_for_tables,
                            [rabbit_user,rabbit_user_permission,rabbit_vhost,
                             rabbit_durable_route,rabbit_durable_exchange,
                             rabbit_runtime_parameters,
                             rabbit_durable_queue]}}}}}}}


Log files (may contain more information):
   C:/RabbitMQ/log/rabbit@OTLABWEB02.log
   C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log


{"init terminating in do_boot",{rabbit,failure_during_boot,{could_not_start,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot,{error,{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}}}}}}}}}


Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
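The empty lists under "hosts, their running nodes and ports" presumably mean that the Erlang port mapper daemon (epmd) on each of those hosts reported no registered Erlang nodes. In other words, no other cluster member was up when web02 waited for its tables. A rough way to repeat that check by hand is a small Python sketch of epmd's NAMES request. It assumes epmd listens on its default port 4369 and that the hosts resolve by the names shown in the diagnostics. The helpers `build_names_request`, `parse_names_response` and `query_epmd` are hypothetical names for illustration only, not part of RabbitMQ or Erlang:

```python
from __future__ import annotations

import socket
import struct

EPMD_PORT = 4369  # assumption: epmd runs on its default port
NAMES_REQ = 110   # epmd request code for NAMES, the ASCII byte 'n'


def build_names_request() -> bytes:
    # Every epmd request is a 2-byte big-endian length followed by the payload;
    # the NAMES payload is just the single request-code byte.
    return struct.pack(">HB", 1, NAMES_REQ)


def parse_names_response(data: bytes) -> tuple[int, dict[str, int]]:
    # The reply starts with epmd's own port as a 4-byte big-endian integer,
    # followed by one text line per registered node:
    #   "name <node> at port <distribution port>\n"
    if len(data) < 4:
        raise ValueError("epmd response too short")
    (epmd_port,) = struct.unpack(">I", data[:4])
    nodes: dict[str, int] = {}
    for line in data[4:].decode("utf-8", errors="replace").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            nodes[parts[1]] = int(parts[4])
    return epmd_port, nodes


def query_epmd(host: str, timeout: float = 5.0) -> dict[str, int]:
    # epmd closes the connection after answering NAMES, so read until EOF.
    with socket.create_connection((host, EPMD_PORT), timeout=timeout) as sock:
        sock.sendall(build_names_request())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    _, nodes = parse_names_response(b"".join(chunks))
    return nodes
```

If this sketch is right, `query_epmd("OTLABWEB05")` would return an empty dict while the RabbitMQ node on that host is down, matching the `[]` above. An `OSError` would suggest that epmd itself is not reachable on that host.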

I have attached the log files from web02.

By reading the groups and Googling, we have managed to recreate the cluster 
before, but only at the cost of losing the queues. This time we would like to 
keep our queues and the information they contained. We hope this is easy to 
solve, since servers do go down unexpectedly. :(

Any help would be greatly appreciated.

Thanks
Brendan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02.log
Type: application/octet-stream
Size: 1994 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02-sasl.log
Type: application/octet-stream
Size: 965 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment-0001.obj>