[rabbitmq-discuss] questions about distributed queue

Mon Aug 17 14:32:00 BST 2009

Hi Paul,

Paul Dix wrote:
> I've heard that there are workable solutions to these problems, but I
> wasn't able to dig up anything that made sense. Also, it's noted in
> the FAQ and a few discussions that work is being done on distributed
> queues. How close is this?

The main solution is to separate the problem into two pieces: service
availability and data availability. Then, for data availability (i.e.
effectively replicating the contents of queues) use normal
high-availability network file systems to share the data directories
between nodes. For service availability, Linux-HA or similar can handle
failover and locking. In summary:

 - the network filesystem ensures the data is replicated appropriately

 - Linux-HA takes care of locking

 - Linux-HA takes care of starting the standby service when the
   primary goes down
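
From the client's side, a failover of this kind typically looks like a
dropped connection followed by the service becoming reachable again
shortly afterwards. A minimal sketch of how a client might ride that out
is below. It is not RabbitMQ API: connect_with_retry and its parameters
are hypothetical names, and connect stands in for whatever function
your client library uses to open a connection.

```python
import time


def connect_with_retry(connect, attempts=5, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is a hypothetical zero-argument callable that opens a
    connection and raises ConnectionError while the broker is down.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

The retry budget (attempts times delay) would need to be at least as
long as the failover window you expect from your HA setup.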

This assumes that you can deal with a nonzero (but arbitrarily small)
failover window. If you absolutely must have 100% uptime (!) then there
are a bunch of other solutions that can be explored, involving redundant
data-paths, replication of message streams, and deduplication at the
client. We find that very few applications really need this.
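
To illustrate the "deduplication at the client" part: if the same
message stream is replicated over two redundant paths, the consumer has
to drop the second copy of each message. The sketch below assumes every
message carries a unique ID assigned by the publisher; the Deduplicator
class, merge_streams function and the (message_id, body) tuple layout
are illustrative names, not part of RabbitMQ.

```python
from collections import OrderedDict


class Deduplicator:
    """Remember recently seen message IDs and reject repeats."""

    def __init__(self, max_ids=10000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def accept(self, message_id):
        """Return True the first time an ID is seen, False afterwards."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def merge_streams(streams, dedup):
    """Yield each (message_id, body) pair once across redundant streams.

    Streams are drained one after another here for simplicity; a real
    consumer would read them concurrently.
    """
    for stream in streams:
        for message_id, body in stream:
            if dedup.accept(message_id):
                yield message_id, body
```

The bounded window of remembered IDs is a design choice: it keeps
memory use fixed, on the assumption that a duplicate arrives within
max_ids messages of the original.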

We do have some plans for simplifying that "100% uptime" solution and
embedding it into the server, so that less client-side support is
needed, but right now we're concentrating on QA for the new scalable
persister. We're likely to address HA once that's done.

Regards,
  Tony
-- 
 [][][] Tony Garnock-Jones     | Mob: +44 (0)7905 974 211
   [][] LShift Ltd             | Tel: +44 (0)20 7729 7060
 []  [] http://www.lshift.net/ | Email: tonyg at lshift.net