[rabbitmq-discuss] Cluster nodes stop/start order can lead to failures

Jignesh Purohit purohit.jignesh at gmail.com
Thu Sep 13 09:57:15 BST 2012


Hi Matt,

I am also facing the same problem, so kindly let me know if you find a solution.

Regards
Jignesh Purohit


On Thursday, September 13, 2012 2:13:14 AM UTC+5:30, Matt Long wrote:
>
> Say I have node1 and node2, both running as disc nodes in a cluster (there 
> are no other nodes in the cluster). If I stop rabbitmq-server on node1 and 
> then stop rabbitmq-server on node2, I'm unable to start rabbitmq-server 
> again on node1. In particular, the start command hangs for about 35 
> seconds before showing FAILED.
>
> Is this the expected behavior? Note that starting node2 after having 
> stopped node1 and then node2 works fine; I assume this is because node2 was 
> aware that node1 had gone offline before node2 itself stopped.
>
> The relevant bit from the startup_log on node1 is:
> BOOT FAILED
> ===========
>
> Timeout contacting cluster nodes: ['rabbit@node2'].
>
> Here are all the details:
>
> *node1*$ sudo service rabbitmq-server stop
> Stopping rabbitmq-server: rabbitmq-server.
>
> *node2*$ sudo service rabbitmq-server stop
> Stopping rabbitmq-server: rabbitmq-server.
>
> *node1*$ sudo service rabbitmq-server start
> Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
> rabbitmq-server.
>
> Contents of startup_err:
>
> Crash dump was written to: erl_crash.dump
> Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
>
> Tail end of startup_log:
>
> -- rabbit boot start
> starting file handle cache server       ...done
> starting worker pool                    ...done
> starting database                       ...
>
> BOOT FAILED
> ===========
>
> Timeout contacting cluster nodes: ['rabbit@node2'].
>
> DIAGNOSTICS
> ===========
>
> nodes in question: ['rabbit@node2']
>
> hosts, their running nodes and ports:
> - node2: []
>
> current node details:
> - node name: 'rabbit@node1'
> - home dir: /var/lib/rabbitmq
> - cookie hash: xxxredactedxxxxxxxxxxx==
>
>
> {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}
>
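The quoted message observes that node2, which was the last node to stop, restarts cleanly, while node1 times out waiting for node2. If that explanation holds, a practical rule is to restart the nodes in the reverse of the order in which they were stopped. The following is a minimal POSIX shell sketch of that rule. `start_order` is a hypothetical helper, not part of RabbitMQ. It only prints node names and does not contact any node.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): given cluster nodes in the order
# they were stopped, print them in the order to start them again, so that the
# last node stopped is the first one started.
# Assumption: this reflects the behaviour reported in the thread above.
start_order() {
  order=""
  for node in "$@"; do
    # Prepend each node, which reverses the argument list.
    order="$node $order"
  done
  # Strip the single trailing space left by the loop.
  printf '%s\n' "${order% }"
}

# Example from the thread: node1 was stopped first and node2 last.
start_order node1 node2
```

For the sequence in the quoted message, this prints `node2 node1`. You would then run `sudo service rabbitmq-server start` on each host in that order.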