[rabbitmq-discuss] Feature request - automatic recovery from network partitions in simple case

Jesse Young adunar at gmail.com
Thu Mar 14 02:16:57 GMT 2013


I run a simple cluster of two RabbitMQ 3.0.x nodes with mirrored 
high-availability queues. Only one of the two nodes is actively used at any 
given time; the other node is a "hot spare" that is ready to take over if 
the first node goes offline.

Occasionally, my cloud hosting provider has network interruptions within 
their internal network. RabbitMQ then detects a network partition, and 
stops mirroring queues until I manually restart the nodes. In order to 
avoid losing all queued messages, I have to manually stop the spare node, 
then restart the active node, then start the spare node again. 
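
For illustration, the manual recovery order described above could be scripted 
roughly as follows. This is only a sketch under assumptions not stated in the 
message: the node names rabbit@active and rabbit@spare are hypothetical, and 
it assumes that "stop", "restart" and "start" map onto rabbitmqctl's stop_app 
and start_app commands rather than a full restart of the Erlang VM or the OS 
service. It defaults to a dry run that only prints the commands.

```python
import subprocess


def recovery_commands(active, spare, ctl="rabbitmqctl"):
    """Build the command sequence for the manual recovery order:
    stop the spare, restart the active node, then start the spare again.

    `active` and `spare` are node names (hypothetical, e.g. "rabbit@active").
    Mapping the steps to stop_app/start_app is an assumption.
    """
    return [
        [ctl, "-n", spare, "stop_app"],    # 1. stop the spare node
        [ctl, "-n", active, "stop_app"],   # 2. restart the active node ...
        [ctl, "-n", active, "start_app"],  #    ... (stop, then start)
        [ctl, "-n", spare, "start_app"],   # 3. start the spare node again
    ]


def recover(active, spare, dry_run=True):
    """Print the commands in order, or run them if dry_run is False."""
    for cmd in recovery_commands(active, spare):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts the sequence if any step fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    recover("rabbit@active", "rabbit@spare")  # dry run: prints the four commands
```

The point of the ordering is that the spare is stopped first and started 
last, so it rejoins the cluster only after the active node is back up.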

I understand the reasoning given in http://www.rabbitmq.com/partitions.html 
for why recovery doesn't happen automatically in the general case.

However, in my simple case, only one of the nodes has any activity during 
the network partition, so there isn't any ambiguity about what should 
happen when the network starts working again: The "spare" node should 
simply restore its state from the "active" node without me having to 
manually restart them in the correct order. 

I don't know how feasible this request is, but it would be great if 
RabbitMQ's partition recovery could be slightly more intelligent so that 
having a spare RabbitMQ node doesn't require as much handholding.

Apologies in advance if this has already been discussed or if this is the 
wrong forum. I'd also like to thank the developers for such a great product!
