[rabbitmq-discuss] Restart Cluster after crash
Brendan Fry
bfry at fryhard.com
Mon Jun 10 06:32:39 BST 2013
Hi,
We are trying to restart our RabbitMQ cluster after an unexpected
environment failure.
We are running:
- rabbitmq_server-3.1.0 on Windows
- erl5.10.1
Our cluster is configured like so:
- web01, web02, web03, web04, web05, app05, app06
During earlier testing we were able to take down any or all of the nodes
with a Windows restart and the cluster would recover. However, after the
unexpected crash, which brought down the entire cluster, the RabbitMQ
services will no longer start.
We receive the following error:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.1.0\sbin>rabbitmq-server.bat

              RabbitMQ 3.1.0. Copyright (C) 2007-2013 VMware, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: C:/RabbitMQ/log/rabbit@OTLABWEB02.log
  ######  ##        C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log
  ##########
              Starting broker...

BOOT FAILED
===========
Timeout contacting cluster nodes: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,
                                   rabbit@OTLABWEB03,rabbit@OTLABWEB01,
                                   rabbit@OTLABAPP06,rabbit@OTLABAPP05].

DIAGNOSTICS
===========
nodes in question: [rabbit@OTLABWEB05,rabbit@OTLABWEB04,rabbit@OTLABWEB03,
                    rabbit@OTLABWEB01,rabbit@OTLABAPP06,rabbit@OTLABAPP05]

hosts, their running nodes and ports:
- OTLABAPP05: []
- OTLABAPP06: []
- OTLABWEB01: []
- OTLABWEB03: []
- OTLABWEB04: []
- OTLABWEB05: []

current node details:
- node name: rabbit@OTLABWEB02
- home dir: U:\
- cookie hash: j9x9r680xF6JzFI7IVDLew==

BOOT FAILED
===========
Error description:
   {could_not_start,rabbit,
    {bad_return,
     {{rabbit,start,[normal,[]]},
      {'EXIT',
       {rabbit,failure_during_boot,
        {error,
         {timeout_waiting_for_tables,
          [rabbit_user,rabbit_user_permission,rabbit_vhost,
           rabbit_durable_route,rabbit_durable_exchange,
           rabbit_runtime_parameters,
           rabbit_durable_queue]}}}}}}}

Log files (may contain more information):
   C:/RabbitMQ/log/rabbit@OTLABWEB02.log
   C:/RabbitMQ/log/rabbit@OTLABWEB02-sasl.log

{"init terminating in do_boot",{rabbit,failure_during_boot,{could_not_start,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot,{error,{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}}}}}}}}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
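The empty lists under "hosts, their running nodes and ports" suggest that no RabbitMQ node was running on the other six hosts when web02 tried to boot. As a rough first check before restarting anything, a sketch like the following (Python, not part of the original report) pulls the host names out of the error text and tests whether each host accepts TCP connections on port 4369, the default epmd port. The function names, the 2-second timeout, and the assumption that the short host names resolve from the machine running the check are all illustrative.

```python
import re
import socket

EPMD_PORT = 4369  # default Erlang port mapper (epmd) port

# Matches "rabbit@HOST" and the archive's obfuscated form "rabbit at HOST".
# Assumption: short host names made of letters, digits, "_" and "-" only.
NODE_RE = re.compile(r"rabbit(?:@| at )([A-Za-z0-9_-]+)")


def hosts_from_error(text):
    """Return the distinct host names mentioned in the text, in order."""
    hosts = []
    for host in NODE_RE.findall(text):
        if host not in hosts:
            hosts.append(host)
    return hosts


def port_open(host, port=EPMD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name did not resolve
        return False


def probe(hosts, port=EPMD_PORT, timeout=2.0):
    """Map each host name to the result of port_open for that host."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

Run from web02, `probe(hosts_from_error(error_text))` would return a dictionary of host name to True or False. A True result only shows that something is listening on the port, not that a RabbitMQ node is running there, and a False result may equally be caused by a firewall or a name resolution problem.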
I have attached the log files from web02.
From reading the list archives and searching online, we have previously
managed to recreate the cluster, but only at the cost of losing the queues.
We would like to retain our queues and the information they contain. We hope
that this is easy to solve, since servers do go down unexpectedly. :(
Any help would be greatly appreciated.
Thanks
Brendan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02.log
Type: application/octet-stream
Size: 1994 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@OTLABWEB02-sasl.log
Type: application/octet-stream
Size: 965 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/55356de1/attachment-0001.obj>