[rabbitmq-discuss] Cluster nodes stop/start order can lead to failures

Matt Long matt at crocodoc.com
Wed Sep 12 21:43:14 BST 2012


Say I have node1 and node2, both running as disc nodes in a cluster (there
are no other nodes in the cluster). If I stop rabbitmq-server on node1 and
then stop it on node2, I'm unable to start rabbitmq-server on node1 again:
the start command hangs for ~35 seconds before printing FAILED.

Is this the expected behavior? Note that starting node2 after stopping
node1 and then node2 works fine; I assume that's because node2 was aware
that node1 had gone offline before node2 itself stopped.
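To make that assumption concrete, here's a toy model of it. This is my guess at the behavior, not how RabbitMQ actually implements it: each disc node remembers which peers were still running when it stopped, and on start it boots on its own only if that set is empty; otherwise it has to reach one of those peers (or give up after a timeout, as in the log below). The names `record_shutdown`, `can_boot` and `first_failing_node` are made up for illustration.

```python
"""Toy model (an assumption, not RabbitMQ's implementation) of why the
restart order of two disc nodes matters."""

from typing import Dict, List, Optional, Set


def record_shutdown(stop_order: List[str]) -> Dict[str, Set[str]]:
    """For each node, remember which peers were still running when it stopped."""
    running = set(stop_order)
    seen_at_stop: Dict[str, Set[str]] = {}
    for node in stop_order:
        running.discard(node)
        seen_at_stop[node] = set(running)
    return seen_at_stop


def can_boot(node: str, seen_at_stop: Dict[str, Set[str]], running: Set[str]) -> bool:
    """A node boots on its own only if it was the last one to stop;
    otherwise it needs at least one of the peers it saw at shutdown."""
    peers = seen_at_stop[node]
    return not peers or bool(peers & running)


def first_failing_node(start_order: List[str],
                       seen_at_stop: Dict[str, Set[str]]) -> Optional[str]:
    """Start nodes one at a time; return the first that would time out, or None."""
    running: Set[str] = set()
    for node in start_order:
        if not can_boot(node, seen_at_stop, running):
            return node
        running.add(node)
    return None


if __name__ == "__main__":
    seen = record_shutdown(["node1", "node2"])           # node1 stopped first
    print(first_failing_node(["node1", "node2"], seen))  # node1
    print(first_failing_node(["node2", "node1"], seen))  # None
```

Under this model the last disc node to stop is the one that has to be started first, which would explain why restarting node2 works and restarting node1 does not.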

The relevant part of startup_log on node1 is:
BOOT FAILED
===========

Timeout contacting cluster nodes: ['rabbit@node2'].

Here are all the details:

*node1*$ sudo service rabbitmq-server stop
Stopping rabbitmq-server: rabbitmq-server.

*node2*$ sudo service rabbitmq-server stop
Stopping rabbitmq-server: rabbitmq-server.

*node1*$ sudo service rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.

Contents of startup_err:

Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})

Tail end of startup_log:

-- rabbit boot start
starting file handle cache server                                     ...done
starting worker pool                                                  ...done
starting database                                                     ...

BOOT FAILED
===========

Timeout contacting cluster nodes: ['rabbit@node2'].

DIAGNOSTICS
===========

nodes in question: ['rabbit@node2']

hosts, their running nodes and ports:
- node2: []

current node details:
- node name: 'rabbit@node1'
- home dir: /var/lib/rabbitmq
- cookie hash: xxxredactedxxxxxxxxxxx==


{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}