[rabbitmq-discuss] HA Queues lost when a node dies

Mon May 7 18:06:15 BST 2012

Hi Bozhidar,

It's hard to tell what happened without looking at the logs and without
knowing your setup, but a number of severe bugs related to HA were fixed
in 2.8.2, so it's definitely worth a try.

If the situation does not improve, please post more details on the list.

Francesco.

On 07/05/12 17:44, Bozhidar Bozhanov wrote:
> Hi,
>
> We are currently trying to run RabbitMQ (2.8.1) in a cluster and use
> highly-available queues. We have around 50 queues. Each queue is
> declared on one of the nodes (chosen at random), which becomes its
> master, with x-ha-policy=all. We have 2 nodes in the cluster.
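
A minimal sketch of the declaration described above, assuming a Python client whose channel object exposes a pika-style `queue_declare(queue=..., durable=..., arguments=...)` method. Only the `x-ha-policy=all` argument comes from the message; the helper names, the queue name and the `durable=True` default are hypothetical.

```python
def ha_arguments(policy="all"):
    """Build the queue arguments for a mirrored queue (RabbitMQ 2.x style)."""
    return {"x-ha-policy": policy}


def declare_mirrored_queue(channel, name, durable=True):
    """Declare `name` as a mirrored queue on a pika-style channel.

    `channel` is assumed to offer queue_declare(queue=, durable=, arguments=).
    `durable=True` is an assumption; the original message does not say
    whether the queues were durable.
    """
    args = ha_arguments("all")
    channel.queue_declare(queue=name, durable=durable, arguments=args)
    return args
```

The node the declaring connection is attached to becomes the queue's master, which would match the "one of the nodes (at random)" setup if clients connect to either node.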
>
> The management console shows that the cluster is successfully created,
> and that the queues are highly-available and properly mirrored. Then
> we kill one of the nodes (with kill -9) to simulate system failure. We
> have tried this five times, and each time a different result was
> observed:
> - only 1 queue 'survived' (the metadata about the others was deleted
> and they were not visible in the management console, nor could we
> publish messages to or consume messages from them)
> - all but 3 queues survived
> - only 10 queues survived
> - all queues survived
> - all but 1 queue survived
>
> The queues that survived properly switched their master node to the
> only remaining one.
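
One way to make the survival counts above reproducible is to record the queue listing before and after the kill and diff the two. The sketch below assumes each listing is a list of dicts with `name` and `node` keys (the shape is an assumption, chosen to resemble a per-queue listing from the management console); the function name and sample data are hypothetical.

```python
def compare_queue_listings(before, after):
    """Compare two queue listings taken before and after a node failure.

    Each listing is a list of dicts with "name" and "node" keys (assumed shape).
    Returns (lost, moved): names missing afterwards, and names of surviving
    queues whose master node changed.
    """
    nodes_before = {q["name"]: q["node"] for q in before}
    nodes_after = {q["name"]: q["node"] for q in after}
    lost = sorted(set(nodes_before) - set(nodes_after))
    moved = sorted(
        name
        for name, node in nodes_after.items()
        if name in nodes_before and nodes_before[name] != node
    )
    return lost, moved


# Hypothetical listings: node n2 is killed, queue "c" disappears,
# queue "b" fails over to n1.
before = [
    {"name": "a", "node": "rabbit@n1"},
    {"name": "b", "node": "rabbit@n2"},
    {"name": "c", "node": "rabbit@n2"},
]
after = [
    {"name": "a", "node": "rabbit@n1"},
    {"name": "b", "node": "rabbit@n1"},
]
lost, moved = compare_queue_listings(before, after)
```

Attaching the `lost` list together with the broker logs from both nodes gives the list something concrete to look at.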
>
> The results seem to be random. Is this expected behaviour? Is it
> likely to be fixed in 2.8.2? And how can we make sure that if a node
> dies, the queues don't get deleted?