[rabbitmq-discuss] HA Queues lost when a node dies

Mon May 7 17:44:31 BST 2012

Hi,

We are currently trying to run RabbitMQ (2.8.1) in a two-node cluster
with highly-available queues. We have around 50 queues. Each queue is
declared with x-ha-policy=all, and its master is one of the two nodes
(chosen at random).
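
To illustrate the setup, a declaration with this argument might look
roughly like the sketch below. It assumes a pika-style channel object
with a queue_declare(queue=..., durable=..., arguments=...) method. The
helper name and the durable flag are hypothetical and only there for
illustration; they are not taken from our actual code.

```python
def declare_mirrored_queue(channel, name, durable=False):
    """Declare a queue that asks for a mirror on every cluster node."""
    # In RabbitMQ 2.x, mirroring is requested per queue through the
    # x-ha-policy argument passed at declaration time.
    arguments = {"x-ha-policy": "all"}
    # Assumption: `channel` is a pika-style channel. The durable flag is
    # illustrative; adjust it to match the real declarations.
    channel.queue_declare(queue=name, durable=durable, arguments=arguments)
    return arguments
```

The node that the declaring connection is attached to becomes the
queue's master, which is how our queues end up spread across both
nodes.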

The management console shows that the cluster is successfully created,
and that the queues are highly-available and properly mirrored. Then
we kill one of the nodes (with kill -9) to simulate system failure. We
have tried this five times and observed a different result each
time:
- only 1 queue 'survived' (the metadata about the others was deleted;
they were not visible in the management console, and we could neither
publish messages to them nor consume from them)
- all but 3 queues survived
- only 10 queues survived
- all queues survived
- all but 1 queue survived

The queues that survived properly switched their master node to the
only remaining one.
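
A check like the one described above (which queues are still present
after the kill) could be scripted along these lines. This is only a
sketch: it assumes the management plugin's HTTP API is reachable and
that GET /api/queues returns a JSON list of objects with a "name"
field. The function names, base URL, and credentials are placeholders.

```python
import base64
import json
import urllib.request


def list_queue_names(base_url, user, password):
    """Return the set of queue names reported by the management HTTP API."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    # The management API uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return {queue["name"] for queue in json.load(response)}


def lost_queues(expected, surviving):
    """Return the sorted names that were expected but are no longer listed."""
    return sorted(set(expected) - set(surviving))
```

Recording the names before the kill and passing them as `expected`,
with a fresh list_queue_names(...) result from the remaining node as
`surviving`, gives the list of queues whose metadata disappeared.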

The results appear to be random. Is this expected behaviour? Is it
likely to be fixed in 2.8.2? And how can we make sure that queues are
not deleted when a node dies?