<div dir="ltr">We are currently running a 10-node RabbitMQ cluster on 64-bit RHEL 6.5. We use auto-configuration for the cluster: the first node has no cluster_nodes specified in rabbitmq.config, and every other node lists only the first node in cluster_nodes. I bring up the first node and then the rest of the nodes. The cluster is set up correctly and things seem to work fine. <div>
<br></div><div>However, occasionally when the nodes reboot, startup hangs at "Starting rabbitmq-cluster". It appears to hang forever and never times out. In some cases we have left the system for a couple of hours without it timing out, which suggests a deadlock or something similar. Resetting a node in the hung state sometimes recovers it and sometimes doesn't. </div>
<div><br></div><div>The strange part is that I cannot reproduce this at will, but it keeps happening nevertheless.</div><div><br><div>Has anyone seen this behavior?</div><div><br></div><div>Is specifying cluster_nodes the way I described the correct way to do so?</div>
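<div><br></div><div>For concreteness, here is a minimal sketch of the rabbitmq.config on every node except the first. It assumes the first node is named rabbit@node1 (a placeholder, not our real hostname) and that the nodes join as disc nodes. The first node simply omits the cluster_nodes entry.</div>
<pre>
%% rabbitmq.config on nodes 2..10 (sketch only).
%% 'rabbit@node1' is a hypothetical node name standing in for the first node.
[
  {rabbit, [
    %% List only the first node; joining as a disc node is an assumption.
    {cluster_nodes, {['rabbit@node1'], disc}}
  ]}
].
</pre>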
<div><br></div><div>I would appreciate any suggestions on how to deal with this issue.</div><div> <div><div>Thanks</div><div>Ramesh</div><div><br></div><div><br></div><div><div> {running_applications,[{rabbit,"RabbitMQ","3.3.1"},</div>
<div> {os_mon,"CPO CXC 138 46","2.2.7"},</div><div> {xmerl,"XML parser","1.2.10"},</div><div> {mnesia,"MNESIA CXC 138 12","4.5"},</div>
<div> {sasl,"SASL CXC 138 11","2.1.10"},</div><div> {stdlib,"ERTS CXC 138 10","1.17.5"},</div><div> {kernel,"ERTS CXC 138 10","2.14.5"}]},</div>
<div> {os,{unix,linux}},</div><div> {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [kernel-poll:true]\n"},</div></div><div><br></div></div></div></div></div>