[rabbitmq-discuss] RabbitMQ Cluster - node hanging on join_cluster - mnesia reporting connection issues

Zach Austin zachary.w.austin at gmail.com
Wed Oct 9 18:50:13 BST 2013


Hi All,

We're having an issue getting one machine in our RabbitMQ cluster back up and 
running after a reboot affected two of the four servers in the cluster.

Here is the cluster layout:
rabbit1
rabbit2
rabbit3 (master)
rabbit4

rabbit1 and rabbit2 were rebooted.  rabbit2 successfully rejoined the 
cluster.  rabbit1 did not.  Additionally, RabbitMQ will no longer start 
on rabbit1.

Reviewing the log on rabbit1, I find: Mnesia on 'rabbit1' could not connect 
to node(s) ['rabbit2']

I can ping rabbit1 from rabbit2 and vice-versa.
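
A successful ping only shows that the hosts can reach each other over ICMP. 
Erlang distribution runs over TCP: by default epmd listens on port 4369, and 
each node also uses a dynamically assigned distribution port. The sketch 
below is one way to check the epmd port from each machine. It assumes the 
default port is unchanged and that the short host names resolve; the function 
name tcp_reachable is only illustrative.

```python
import socket

EPMD_PORT = 4369  # default epmd port; assumption: not overridden on these hosts


def tcp_reachable(host, port=EPMD_PORT, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host names taken from the cluster layout above.
    for host in ("rabbit1", "rabbit2", "rabbit3", "rabbit4"):
        print(host, "epmd reachable:", tcp_reachable(host))
```

Running this on rabbit1 and again on rabbit2 would show whether the TCP path 
works in both directions, which ping alone cannot confirm.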

What I've done so far:
1) Verified the Erlang cookie values on all cluster nodes are identical.
2) Verified the Windows firewall is disabled on all cluster nodes.
3) Issued "rabbitmqctl forget_cluster_node rabbit1" on the rabbit3 master.
4) Deleted the mnesia database on rabbit1.
5) Successfully started RabbitMQ on rabbit1 (deleting the mnesia DB made 
this possible).
6) Issued "rabbitmqctl stop_app", followed by "rabbitmqctl join_cluster 
rabbit3".
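
For repeating the stop_app and join_cluster step without waiting indefinitely 
on a hang, a rough sketch like the following could wrap the commands with a 
timeout. The helper names (rejoin_commands, run_step) and the 120 second 
limit are hypothetical, and the ctl argument may need to be the full path to 
the rabbitmqctl script on Windows.

```python
import subprocess


def rejoin_commands(target_node, ctl="rabbitmqctl"):
    """Return the argv lists for the stop_app / join_cluster step above."""
    return [
        [ctl, "stop_app"],
        [ctl, "join_cluster", target_node],
    ]


def run_step(argv, timeout_s=120):
    """Run one command; return its exit code, or None if it timed out."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # The command was still running after timeout_s seconds.
        return None


if __name__ == "__main__":
    # "rabbit3" is the node name as used in the commands above.
    for argv in rejoin_commands("rabbit3"):
        result = run_step(argv)
        print(" ".join(argv), "->", "timed out" if result is None else result)
        if result != 0:
            break
```

A None result only says the command did not finish in time; the node's log 
would still be the place to look for the reason.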

At this point, rabbitmqctl hangs after the "cluster node... with node... " 
line (I waited over 15 minutes).  Reviewing the log on rabbit1 again, I 
find the same issue logged: Mnesia on 'rabbit1' could not connect to 
node(s) ['rabbit2']

Can anyone point me in the direction of what I should check next?

Thank you.

Zach