[rabbitmq-discuss] RabbitMQ Cluster - node hanging on join_cluster - mnesia reporting connection issues

Matt Wise matt at nextdoor.com
Tue Nov 5 17:06:08 GMT 2013


I feel like I've seen similar behavior when multiple nodes in a RabbitMQ 
cluster are restarted at around the same time. I just posted last night 
about a similar outage we had with a 3 node cluster when two of the nodes 
were restarted at about the same time. My post hasn't made it onto the 
list yet (moderated?) but hopefully it will today.

On Wednesday, October 9, 2013 1:43:19 PM UTC-7, Zach Austin wrote:
>
> The issue was resolved by restarting RabbitMQ on rabbit2.  Not sure why 
> this was required, especially after removing, resetting, and re-adding 
> rabbit1.
>
> On Wednesday, October 9, 2013 12:50:13 PM UTC-5, Zach Austin wrote:
>>
>> Hi All,
>>
>> We're having an issue getting one machine in our rabbit cluster back up 
>> and running after a reboot affected two of the 4 servers in the cluster.
>>
>> Here is the cluster layout:
>> rabbit1
>> rabbit2
>> rabbit3 (master)
>> rabbit4
>>
>> rabbit1 and rabbit2 were rebooted.  rabbit2 successfully rejoined the 
>> cluster.  rabbit1 did not.  Additionally, RabbitMQ will no longer start 
>> on rabbit1.
>>
>> Reviewing the log on rabbit1, I find: Mnesia on 'rabbit1' could not 
>> connect to node(s) ['rabbit2']
>>
>> I can ping rabbit1 from rabbit2 and vice-versa.
>>
>> What I've done so far:
>> 1) Verified the Erlang cookie values on all cluster nodes are 
>> identical.
>> 2) Verified the Windows firewall is disabled on all cluster nodes.
>> 3) Issued "rabbitmqctl forget_cluster_node rabbit1" on the rabbit3 master.
>> 4) Deleted the Mnesia database on rabbit1.
>> 5) Successfully started RabbitMQ on rabbit1 (deleting the Mnesia DB made 
>> this possible).
>> 6) Issued "rabbitmqctl stop_app", followed by "rabbitmqctl join_cluster 
>> rabbit3".
>>
>> At this point, rabbitmqctl hangs after the "cluster node... with node... 
>> " line (I waited over 15 minutes).  Reviewing the log on rabbit1 again, I 
>> find the same issue logged: Mnesia on 'rabbit1' could not connect to 
>> node(s) ['rabbit2']
>>
>> Can anyone point me in the direction of what I should check next?
>>
>> Thank you.
>>
>> Zach
>>
>
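
On the ping check quoted above: a successful ping only shows ICMP
reachability, while Erlang nodes find each other over TCP through epmd,
which listens on port 4369 by default. Below is a minimal sketch of a TCP
check that could be run on rabbit1. The helper names (tcp_port_open,
check_peers) are made up for illustration, the host names are the ones from
the thread, and the port is assumed to be the default. An open epmd port is
necessary but not sufficient, since the node-to-node distribution port is
assigned separately and is not tested here.

```python
import socket

EPMD_PORT = 4369  # default epmd port; assumed, not confirmed for this cluster


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peers(hosts, port=EPMD_PORT):
    """Map each host name to whether its port accepted a TCP connection."""
    return {host: tcp_port_open(host, port) for host in hosts}


if __name__ == "__main__":
    # Host names taken from the thread; intended to be run on rabbit1.
    for host, ok in check_peers(["rabbit2", "rabbit3", "rabbit4"]).items():
        status = "reachable" if ok else "NOT reachable"
        print(f"{host}:{EPMD_PORT} {status}")
```

If rabbit2 shows up as not reachable here while ping works, that would point
at something between the hosts filtering TCP rather than at Mnesia itself.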