[rabbitmq-discuss] Feature request - automatic recovery from network partitions in simple case
adunar at gmail.com
Thu Mar 14 02:16:57 GMT 2013
I run a simple cluster of two RabbitMQ 3.0.x nodes with mirrored
high-availability queues. Only one of the two nodes is actively used at any
given time; the other node is a "hot spare" that can be ready in case the
first node goes offline.
Occasionally, my cloud hosting provider has network interruptions within
their internal network. RabbitMQ then detects a network partition, and
stops mirroring queues until I manually restart the nodes. In order to
avoid losing all queued messages, I have to manually stop the spare node,
then restart the active node, then start the spare node again.
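For reference, the manual recovery sequence above can be scripted so the order is always the same. The following is a minimal Python sketch. The node names `rabbit@active` and `rabbit@spare` are hypothetical. It also assumes that stopping and starting the RabbitMQ application with `rabbitmqctl stop_app` / `start_app` is an acceptable form of "restart" here; in some setups a full service restart may be needed instead. By default it only prints the commands (dry run).

```python
import subprocess
from typing import List

# Hypothetical node names (assumption): substitute the names reported by
# `rabbitmqctl cluster_status` on your own cluster.
ACTIVE_NODE = "rabbit@active"
SPARE_NODE = "rabbit@spare"


def recovery_plan(active: str, spare: str) -> List[List[str]]:
    """Return the manual recovery steps as rabbitmqctl command lines, in order.

    The spare is stopped first and started last, so that when it rejoins
    it restores its state from the active node rather than the reverse.
    Assumes stop_app/start_app is an acceptable "restart" (see lead-in).
    """
    return [
        ["rabbitmqctl", "-n", spare, "stop_app"],    # 1. stop the spare node
        ["rabbitmqctl", "-n", active, "stop_app"],   # 2. restart the active node
        ["rabbitmqctl", "-n", active, "start_app"],
        ["rabbitmqctl", "-n", spare, "start_app"],   # 3. start the spare again
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and aborts the remaining
            # steps if any command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(recovery_plan(ACTIVE_NODE, SPARE_NODE))
```

Keeping the plan as plain data makes the ordering easy to review before passing `dry_run=False`.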
I understand the reasoning given in http://www.rabbitmq.com/partitions.html
for why recovery doesn't happen automatically in the general case.
However, in my simple case, only one of the nodes has any activity during
the network partition, so there isn't any ambiguity about what should
happen when the network starts working again: The "spare" node should
simply restore its state from the "active" node without me having to
manually restart them in the correct order.
I don't know how feasible this request is, but it would be great if
RabbitMQ's partition recovery could be slightly more intelligent so that
having a spare RabbitMQ node doesn't require as much handholding.
Apologies in advance if this has already been discussed or if this is the
wrong forum. I'd also like to thank the developers for such a great product!