[rabbitmq-discuss] A three-node cluster hangs completely in ec2

Fri Aug 23 09:49:33 BST 2013

Hi Jon,

On 22/08/13 20:17, Jon Dokulil wrote:

>  2. Each queue is using an HA policy

Which HA policy are they using? (all/exactly/nodes)

>  3. We notice that a few of the queues still show their "node" as being
>     node A, even though it is not currently running.

>  5. When A comes back online, those queues that show A as their 'node'
>     now show zero mirrors.

What do you mean by "their node"? How did you determine this? What was
the output of "rabbitmqctl cluster_status" when A started up?

>  6. I attempt to delete the queue via the management webapp. At that
>     point all three nodes become 100% unresponsive.

Do nodes still accept connections on port 5672?

> Does anybody have experience with this? What additional information
> should I provide? It's causing a lot of stress and confuses the heck out
> of me. Any guidance is much appreciated.

Are there any errors in the logfiles? The logfiles and sasl logs of all
3 brokers may help to determine the cause.

Please provide the output of "rabbitmqctl report" from just before the
brokers becomes unresponsive.

Can you reproduce the issue 100% reliably, and does this only happen on EC2?

-Emile