[rabbitmq-discuss] Redundancy solutions

Wed Jun 15 12:34:32 BST 2011

Hi Joseph,

On Fri, Jun 10, 2011 at 08:21:58AM -0700, Joseph Marlin wrote:
> I'm looking for some insight on different ways of achieving a HA
> RabbitMQ setup without using DRBD. Because of our current setup, doing
> that kind of configuration would be very difficult to achieve.
> 
> I'd really like to get a setup where if one machine goes down, another
> machine can immediately take over, retaining the exact state of
> RabbitMQ on the failed machine when it went down.

I'd like that too. However, the "exact state" includes things like open
file descriptors, including sockets etc, which is rather difficult to
migrate in this way. The only thing that I'm aware of that might come
close to working for this is the FT feature in VMware vSphere
(http://www.vmware.com/products/fault-tolerance/overview.html).

Even the upcoming active/active HA, which offers RAID-1-style mirroring
of queues, will not migrate consumers of those queues in the event of
failure. It's just too damn hard.

> My latest attempt at a redundant solution was two completely separate,
> identical, RabbitMQ setups. A publisher pushed messages to the queues
> of each independent node at the same time. I also had a Cassandra
> cluster node installed on each node. Before a consumer would handle a
> message, it would check in with the Cassandra database to ensure that
> it had not already been handled. This worked fine at extremely slow
> speeds, but at anything above 50 messages/sec, there were simply way
> too many double-handled messages, which are not acceptable.

Yup. You might also try ZooKeeper, but I think that too is going to be
too slow - my understanding is that its Paxos-like consensus protocol is
quite chatty.

There are clever ways of partitioning the workload though which, whilst
they end up rather complicated, can be made to work. The basic idea is
that you have all your consumers consume from all the duplicate queues.
They then only act on a message if they get sufficient copies of the
same message that they can prove no one else will get more copies. That
works quite nicely, but you have to ensure there can be no draws. So for
example, 2 consumers both subscribed to the same 3 queues is fine
because the outcomes are (0,3), (1,2), (2,1) and (3,0) - there's always
a clear winner. But 2 consumers both subscribed to 2 queues can end in a
draw, (1,1), and then you have to do cleverer things to resolve it.
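
To make the counting argument above concrete, here is a minimal sketch in
Python. It is not RabbitMQ code: the function names are made up for
illustration, and it assumes that each duplicate queue hands its copy of a
message to exactly one consumer and that every copy eventually arrives. The
rule used is a strict majority, which is one sufficient way for a consumer to
"prove no one else will get more copies".

```python
from itertools import product


def majority_threshold(num_queues):
    """Smallest copy count that is a strict majority of num_queues."""
    return num_queues // 2 + 1


def should_act(copies_seen, num_queues):
    """A consumer acts only once it holds a strict majority of the copies,
    since no other consumer can then hold as many."""
    return copies_seen >= majority_threshold(num_queues)


def outcomes(num_consumers, num_queues):
    """All possible splits of one message's copies across consumers.

    Each queue delivers its copy to exactly one consumer (assumption).
    Returns a sorted list of tuples, one count per consumer.
    """
    results = set()
    for assignment in product(range(num_consumers), repeat=num_queues):
        counts = tuple(assignment.count(c) for c in range(num_consumers))
        results.add(counts)
    return sorted(results)


def has_undecided_split(num_consumers, num_queues):
    """True if some split leaves no consumer with a strict majority,
    i.e. nobody can safely act on the message by counting alone."""
    return any(
        not any(should_act(c, num_queues) for c in counts)
        for counts in outcomes(num_consumers, num_queues)
    )


if __name__ == "__main__":
    print(outcomes(2, 3))             # the four splits listed above
    print(has_undecided_split(2, 3))  # always a clear winner
    print(has_undecided_split(2, 2))  # the (1,1) split is a draw
```

Under these assumptions, 2 consumers on 3 queues never get stuck, while 2
consumers on 2 queues can. Consumer or broker failure mid-delivery is not
modelled here, and that is where the real complexity comes in.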

If I were you, I'd wait for the upcoming active/active HA stuff to land
before going down the above route.

> Is there an alternative to HA that doesn't involve DRBD/Pacemaker/etc?

Not really, other than what's due to arrive shortly. But you can always
use Pacemaker without DRBD - any sort of reliable SAN/NAS thingy should
work ok.

Matthew