[rabbitmq-discuss] backup and fail-over to remote disaster recovery site

Simon MacMullen simon at rabbitmq.com
Fri Oct 19 14:00:25 BST 2012


On 19/10/12 12:48, Terance wrote:
> We want to set up RabbitMQ such that the current state of the broker is
> backed up at a remote disaster recovery site in real time. We also want our
> clients (producers and consumers) to fail-over to this remote broker in case
> the main broker becomes unreachable for some reason. I was looking at the
> Distributed broker approaches supported by RabbitMQ at
> http://www.rabbitmq.com/distributed.html but I'm not sure if what we are
> looking for can be achieved. Please let me know if you know how to do this.

Hi. Interesting question. Some thoughts:

The CAP theorem tells us you're not going to get a transparent solution to 
this problem. If it's remote you need to be partition-tolerant, and you 
almost certainly want to be available too. So consistency has to go. 
That rules out clustering.

So you could use the shovel or federation to get messages published from 
your main site to your recovery site. That's fairly easy (assuming your 
broker definitions are not too dynamic); what is harder is ensuring that 
messages are consumed from the recovery site in some sort of correlation 
with them being consumed at the main site. There's nothing built in to 
Rabbit which can do that.

There are some possibilities which may or may not work for you. You 
could federate / shovel into queues with a message TTL at the remote 
site to bound the amount of data you hold - but at failover you could 
have a lot of duplicate messages to work through, and if your main site 
queues back up enough, messages could be expired at the remote site when 
they have not been processed at the main site.
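
To make that trade-off concrete, here is a minimal sketch in Python. It 
is a toy model, not anything built in to Rabbit: it assumes every message 
is copied to the remote site at the instant it is published, that the 
remote queue drops a message once it is older than the TTL, and that 
nothing consumes from the remote site before failover. The names 
(Message, classify_at_failover) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: int
    published_at: float           # seconds; assumed replicated at this instant
    consumed_at: Optional[float]  # when the main site consumed it, None if never


def classify_at_failover(
    messages: List[Message], ttl: float, failover_at: float
) -> Dict[str, List[int]]:
    """Classify messages by their state at the moment of failover.

    duplicates: still held remotely, but already consumed at the main site
    pending:    still held remotely and not yet consumed (the useful case)
    lost:       expired remotely although the main site never consumed them
    """
    result: Dict[str, List[int]] = {"duplicates": [], "pending": [], "lost": []}
    for m in messages:
        if m.published_at > failover_at:
            continue  # published after the main site went away; never replicated
        held_remotely = (failover_at - m.published_at) < ttl
        done_at_main = m.consumed_at is not None and m.consumed_at <= failover_at
        if held_remotely and done_at_main:
            result["duplicates"].append(m.msg_id)
        elif held_remotely:
            result["pending"].append(m.msg_id)
        elif not done_at_main:
            result["lost"].append(m.msg_id)
        # expired remotely and consumed at main: nothing to do
    return result


if __name__ == "__main__":
    backlog = [
        Message(1, published_at=0, consumed_at=5),      # processed long ago
        Message(2, published_at=0, consumed_at=None),   # stuck in a backed-up queue
        Message(3, published_at=50, consumed_at=55),    # processed recently
        Message(4, published_at=55, consumed_at=None),  # not yet processed
    ]
    print(classify_at_failover(backlog, ttl=30, failover_at=60))
```

A longer TTL shrinks the "lost" bucket but grows "duplicates", so under 
these assumptions the consumers at the recovery site would need to be 
idempotent whichever way you tune it.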

Possibly the most plausible solution is to synchronise the mnesia 
directory from main to remote and only boot the remote broker on 
failover. This stands a decent chance of recovering your persistent 
messages after a failure; but keeping the filesystems reliably in sync is its 
own challenge. And we don't guarantee to recover everything after (the 
equivalent of) an unclean shutdown.
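
As a rough illustration of the "synchronise the directory" idea, here is 
a sketch of a one-way mirror in Python. It is an assumption about how one 
might do it, not a recommended tool: in practice something like rsync or 
block-level replication would more likely be used, and copying files 
while the main broker is writing to them gives you at best the state you 
would have after an unclean shutdown. The function name and the paths in 
the usage note are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_directory(source: Path, target: Path) -> None:
    """Make target an exact one-way copy of source (files and subdirectories)."""
    target.mkdir(parents=True, exist_ok=True)
    source_names = {p.name for p in source.iterdir()}

    # Remove anything at the target that no longer exists at the source.
    for entry in list(target.iterdir()):
        if entry.name not in source_names:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy new or changed files; recurse into subdirectories.
    for entry in source.iterdir():
        dest = target / entry.name
        if entry.is_dir():
            if dest.exists() and not dest.is_dir():
                dest.unlink()
            mirror_directory(entry, dest)
        else:
            if dest.is_dir():
                shutil.rmtree(dest)
            # shallow=False compares file contents, not just size and mtime.
            if not dest.exists() or not filecmp.cmp(entry, dest, shallow=False):
                shutil.copy2(entry, dest)
```

Usage would be along the lines of 
mirror_directory(Path("/path/to/main/mnesia"), Path("/path/to/remote/mnesia")), 
run repeatedly, with the remote broker left stopped until failover. Both 
paths are placeholders; the real location depends on your installation.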

Hmm, you've got me thinking about replication now...

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, VMware

