[rabbitmq-discuss] A three-node cluster hangs completely in ec2

Jon Dokulil jondokulil at gmail.com
Thu Aug 22 20:17:35 BST 2013


We've seen this happen twice now and each time it's been a pain to work 
around (we ended up creating a whole new cluster each time). Here's the 
scenario we have seen:

Our setup:

   1. Three RabbitMQ 3.1.5 nodes running on the Amazon Linux AMI. Each node 
   is in a different availability zone in the US-EAST region on AWS. We'll 
   call them nodes A, B, and C
   2. Each queue is using an HA policy
   3. All queues are durable
   4. We publish via Basic.Publish with DeliveryMode=2 (persistent)
   5. All clients are initially connected to node A
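
For anyone trying to reproduce this, here is a rough sketch of what an HA policy body could look like when sent to the management plugin's policy endpoint (PUT /api/policies/<vhost>/<name>). The post does not say which ha-mode is in use, so "all" below is purely an assumption, and the helper name ha_policy_body is made up for illustration.

```python
import json

# Mirroring modes accepted by RabbitMQ's "ha-mode" policy key.
VALID_HA_MODES = ("all", "exactly", "nodes")


def ha_policy_body(pattern, ha_mode="all", ha_params=None):
    """Return a JSON policy body for mirrored queues (hypothetical helper).

    pattern   -- regex matched against queue names, e.g. "^" for every queue
    ha_mode   -- "all", "exactly" or "nodes"; "all" is an assumption here,
                 the original post does not state the mode
    ha_params -- a count for "exactly", a list of node names for "nodes"
    """
    if ha_mode not in VALID_HA_MODES:
        raise ValueError("unknown ha-mode: %r" % (ha_mode,))
    definition = {"ha-mode": ha_mode}
    if ha_mode == "all":
        if ha_params is not None:
            raise ValueError("ha-mode 'all' takes no ha-params")
    else:
        if ha_params is None:
            raise ValueError("ha-mode %r needs ha-params" % (ha_mode,))
        definition["ha-params"] = ha_params
    return json.dumps({"pattern": pattern, "definition": definition},
                      sort_keys=True)


if __name__ == "__main__":
    # Mirror every queue to every node in the cluster.
    print(ha_policy_body("^"))
```

Durability and DeliveryMode=2 are set by the clients at declare and publish time, so they are not part of the policy.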

The scenario:

   1. Node A is shut down (the last time, I did it via
   'sudo /etc/init.d/rabbitmq-server stop').
   2. All connected clients see the shutdown and successfully transition to 
   using one of the other nodes. About half connect to node B and the other 
   half connect to node C
   3. We notice that a few of the queues still show their "node" as being 
   node A, even though it is not currently running.
   4. Node A is brought back online. The RabbitMQ management console 
   (webapp) shows everything is fine on the homepage.
   5. When A comes back online, those queues that show A as their 'node' 
   now show zero mirrors.
   6. I attempt to delete the queue via the management webapp. At that 
   point all three nodes become 100% unresponsive. The management webapp fails 
   to respond and all communication in our application stops. CPU fluctuates 
   between 10% and 40%, but memory doesn't seem to be leaking. It's difficult to 
   know what is happening because rabbitmqctl is also unresponsive. Attempts 
   to gracefully stop the nodes all hang.
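
To make steps 3 and 5 easier to spot before touching anything, a check along these lines could be run against a queue listing while the cluster still responds. This is only a sketch: it assumes the listing comes from the management plugin's GET /api/queues and that each entry carries "name", "node" and "slave_nodes" fields, which should be verified against the version in use. The function name and the node names in the example are hypothetical.

```python
def suspicious_queues(queues, running_nodes):
    """Flag queues whose master is on a stopped node or that have no mirrors.

    queues        -- list of dicts, assumed to look like entries from the
                     management API's /api/queues (field names are assumptions)
    running_nodes -- iterable of node names currently running
    Returns a list of (queue_name, master_node, reasons) tuples.
    """
    running = set(running_nodes)
    findings = []
    for q in queues:
        master = q.get("node")
        mirrors = q.get("slave_nodes") or []
        reasons = []
        if master not in running:
            reasons.append("master node not running")
        if not mirrors:
            reasons.append("no mirrors")
        if reasons:
            findings.append((q.get("name"), master, reasons))
    return findings


if __name__ == "__main__":
    # Made-up listing resembling step 3: node A is down, B and C are up.
    listing = [
        {"name": "orders", "node": "rabbit@node-a", "slave_nodes": []},
        {"name": "events", "node": "rabbit@node-b",
         "slave_nodes": ["rabbit@node-c"]},
    ]
    for name, master, reasons in suspicious_queues(
            listing, ["rabbit@node-b", "rabbit@node-c"]):
        print(name, master, ", ".join(reasons))
```

This only identifies the affected queues; it does not explain or prevent the hang described in step 6.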

Does anybody have experience with this? What additional information should 
I provide? It's causing a lot of stress and confuses the heck out of me. 
Any guidance is much appreciated.
