[rabbitmq-discuss] A three-node cluster hangs completely in ec2
stuff at moesel.net
Thu Aug 22 21:20:58 BST 2013
I'm not 100% familiar with Amazon's availability zones and how they work,
but... it sounds to me like they are in different locations and different
networks? If so, clustering is probably not a good idea in this case.
I don't know if this is the cause of the issues you've seen, but it may
cause issues in the future... On the other hand, if I am wrong
about availability zones, then you can safely disregard this message! ;-)
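If it helps narrow things down before anyone touches the affected queues, here
is a rough Python sketch for spotting the state described in steps 3 and 5 of
the scenario quoted below: queues whose "node" is not currently running, and
queues that report no mirrors. It only filters JSON that has already been
fetched. The function names are made up for illustration, and the field names
(`name`, `running`, `node`, `slave_nodes`) are my assumption about what the
management plugin returns from /api/nodes and /api/queues, so please check
them against your version.

```python
def queues_on_stopped_nodes(queues, nodes):
    """Return (vhost, queue name, node) for queues whose node is not running.

    `nodes` and `queues` are lists of dicts, assumed to look like the JSON
    from the management plugin's /api/nodes and /api/queues endpoints.
    """
    # Collect the names of nodes that report themselves as running.
    running = {n["name"] for n in nodes if n.get("running")}
    return [
        (q.get("vhost"), q["name"], q.get("node"))
        for q in queues
        if q.get("node") not in running
    ]


def queues_without_mirrors(queues):
    """Return names of queues that list no mirrors (assumed `slave_nodes` field)."""
    # A missing or empty `slave_nodes` list is treated as "zero mirrors".
    return [q["name"] for q in queues if not q.get("slave_nodes")]
```

You would feed it the decoded JSON from each endpoint while node A is down and
again after it comes back, and compare the two results.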
On Thu, Aug 22, 2013 at 3:17 PM, Jon Dokulil <jondokulil at gmail.com> wrote:
> We've seen this happen twice now and each time it's been a pain to work
> around (we ended up creating a whole new cluster each time). Here's the
> scenario we have seen:
> Our setup:
> 1. Three RabbitMQ 3.1.5 nodes running on the Amazon Linux AMI. Each
> node is in a different availability zone in the US-EAST region on AWS.
> We'll call them nodes A, B, and C
> 2. Each queue is using an HA policy
> 3. All queues are durable
> 4. We Basic.Publish with DeliveryMode=2
> 5. All clients are initially connected to node A
> The scenario:
> 1. Node A is shut down (the last time, I did it via 'sudo
> /etc/init.d/rabbitmq-server stop').
> 2. All connected clients see the shutdown and successfully transition
> to using one of the other nodes. About half connect to node B and the other
> half connect to node C
> 3. We notice that a few of the queues still show their "node" as being
> node A, even though it is not currently running.
> 4. Node A is brought back online. The RabbitMQ management console
> (webapp) shows everything is fine on the homepage.
> 5. When A comes back online, those queues that show A as their 'node'
> now show zero mirrors.
> 6. I attempt to delete the queue via the management webapp. At that
> point all three nodes become 100% unresponsive. The management webapp fails
> to respond and all communication in our application stops. CPU fluctuates
> between 10% and 40%, but memory doesn't seem to be leaking. It's difficult to
> know what is happening because rabbitmqctl is also unresponsive. Attempts
> to gracefully stop the nodes all hang.
> Does anybody have experience with this? What additional information should
> I provide? It's causing a lot of stress and confuses the heck out of me.
> Any guidance is much appreciated.