[rabbitmq-discuss] A three-node cluster hangs completely in ec2
Jon Dokulil
jondokulil at gmail.com
Thu Aug 22 20:17:35 BST 2013
We've seen this happen twice now and each time it's been a pain to work
around (we ended up creating a whole new cluster each time). Here's the
scenario we have seen:
Our setup:
1. Three RabbitMQ 3.1.5 nodes running on the Amazon Linux AMI. Each node
is in a different availability zone in the US-EAST region on AWS. We'll
call them nodes A, B, and C.
2. Each queue is using an HA policy
3. All queues are durable
4. We publish (Basic.Publish) with DeliveryMode=2 (persistent)
5. All clients are initially connected to node A
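For reference, an HA policy like the one in item 2 could be defined through the management HTTP API roughly as below. This is only a sketch: the post does not say which ha-mode or pattern we use, so "all" and ".*" are assumptions here, as are the policy name "ha-all", the host name, and the guest credentials. The function only builds the request; it does not send it.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_ha_policy_request(host, policy_name, pattern, vhost="/",
                            user="guest", password="guest", port=15672):
    """Build (but do not send) a PUT request that defines an HA policy."""
    # The vhost must be percent-encoded, so the default "/" becomes "%2F".
    url = "http://{}:{}/api/policies/{}/{}".format(
        host, port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(policy_name, safe=""))
    # Assumption: mirror matching queues to all nodes ("ha-mode": "all").
    body = json.dumps({"pattern": pattern, "definition": {"ha-mode": "all"}})
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(
        url, data=body.encode("utf-8"), method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token})


# Hypothetical host and policy name, for illustration only.
req = build_ha_policy_request("node-a.example", "ha-all", ".*")
print(req.get_method(), req.full_url)
```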
The scenario:
1. Node A is shut down (the last time I did it via 'sudo
/etc/init.d/rabbitmq-server stop').
2. All connected clients see the shutdown and successfully transition to
using one of the other nodes. About half connect to node B and the other
half connect to node C
3. We notice that a few of the queues still show their "node" as being
node A, even though it is not currently running.
4. Node A is brought back online. The RabbitMQ management console
(webapp) shows everything is fine on the homepage.
5. When A comes back online, those queues that show A as their 'node'
now show zero mirrors.
6. I attempt to delete one of those queues via the management webapp. At
that point all three nodes become 100% unresponsive. The management webapp
fails to respond and all communication in our application stops. CPU
fluctuates between 10% and 40% but memory doesn't seem to be leaking. It's
difficult to know what is happening because rabbitmqctl is also
unresponsive. Attempts to gracefully stop the nodes all hang.
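The state described in steps 3 to 5 (a queue whose master is on node A and which lists no mirrors) could be checked from a script before anything is deleted, while rabbitmqctl still responds. The sketch below parses the output of 'rabbitmqctl list_queues name pid slave_pids'. The tab-separated layout and the pid shape it expects are assumptions on my part, and the queue and node names in the sample are hypothetical.

```python
import re

# Assumed pid shape: <rabbit@host.1.234.0>, with optional quotes around the
# node name. Adjust the pattern if your output looks different.
PID_NODE = re.compile(r"<'?([^'>]+?)'?\.\d+\.\d+\.\d+>")


def node_of(pid):
    """Return the node name embedded in a pid string, or None."""
    match = PID_NODE.search(pid)
    return match.group(1) if match else None


def unmirrored_queues(listing, node):
    """Names of queues whose master is on `node` and that list no mirrors."""
    found = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip banner lines such as "Listing queues ..."
        name, pid, slave_pids = fields
        mirrors = PID_NODE.findall(slave_pids)
        if node_of(pid) == node and not mirrors:
            found.append(name)
    return found


# Hypothetical sample output, for illustration only.
sample = (
    "Listing queues ...\n"
    "orders\t<rabbit@nodeA.1.234.0>\t[]\n"
    "events\t<rabbit@nodeB.1.99.0>\t[<rabbit@nodeC.1.5.0>]\n"
    "...done.\n"
)
print(unmirrored_queues(sample, "rabbit@nodeA"))
```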
Does anybody have experience with this? What additional information should
I provide? It's causing a lot of stress and confuses the heck out of me.
Any guidance is much appreciated.