[rabbitmq-discuss] Fwd: active/active

Mon May 10 23:43:15 BST 2010

It was suggested that I relay a convo I had with Alexis earlier this
year making the case for better active/active fail-over in RabbitMQ,
particularly when working with durable messaging.

---------- Forwarded message ----------
From: Jason J. W. Williams <jasonjwwilliams at gmail.com>
Date: Mon, Jan 18, 2010 at 7:07 PM
Subject: Re: active/active
To: Alexis Richardson <alexis at rabbitmq.com>

Hey Alexis,

Currently, we have a single RMQ 1.5 pair that is set up
active/passive. That is to say, our apps publish to node A and, if it
is unavailable, publish to node B. Nodes A and B are standalone in that
they are not in a cluster together. This is because the messages being
published are durable, and in the event node A crashes, any pending
durable messages would not be auto-resubmitted if node A were
clustered with node B (node B's queue essentially hides the pending
messages that were in the queue on node A before the crash).
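
A minimal sketch of the publish-to-A-then-B pattern described above,
in Python. The function names, node names, and the `send` callable are
hypothetical stand-ins for whatever client library the apps use; this
is an illustration, not the actual application code.

```python
from typing import Callable, List, Sequence, Set, Tuple


def publish_with_failover(
    nodes: Sequence[str],
    message: bytes,
    send: Callable[[str, bytes], None],
) -> str:
    """Try each node in order and return the name of the node that accepted."""
    last_error = None
    for node in nodes:
        try:
            send(node, message)  # assumed to raise ConnectionError if the node is down
            return node
        except ConnectionError as exc:
            last_error = exc  # remember the failure and fall through to the next node
    raise ConnectionError(f"no node accepted the message: {last_error}")


def make_fake_send(down: Set[str], log: List[Tuple[str, bytes]]):
    """Hypothetical stand-in for a real client: nodes in `down` refuse connections."""

    def send(node: str, message: bytes) -> None:
        if node in down:
            raise ConnectionError(f"{node} is unavailable")
        log.append((node, message))

    return send
```

With node A marked as down, `publish_with_failover(["node-a", "node-b"],
b"task", send)` delivers to node B. Because the two nodes are
standalone, anything already queued on node A stays there until it
comes back.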

This setup works OK for us, as the load is fairly light and
durability is more important for these tasks than availability.

That being said, we are designing a rework of our statistics logging
infrastructure around Rabbit. Our stats traffic is very high volume and
availability is more important than durability (in fact none of the
messages will be durable). So our intention is to create a new Rabbit
cluster with three nodes and a load-balanced IP in front (since
auto-redirect based on load is no longer supported....nasty memo to
follow... ;) ).

Because we're moving from a colo'd environment to a cloud environment,
we would like to collapse both Rabbit "clusters" into a single cluster
to maximize efficiency. It would also allow our existing app to benefit
from the performance of a load-sharing environment. For that to happen,
however, Rabbit would have to do one of two things when clustered
regarding durable messages:

1.) When a crashed node is restarted, replay the pending/durable
messages into the queue on the node now responsible for the queue
after the crash.
2.) Replicate all durable queues to more than one node, so that
durable queue contents remain available after a node crash.
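
To make option 1 concrete, here is a toy Python model of the behaviour
being asked for. It is not RabbitMQ code: the `ToyCluster` class, its
method names, and the choice to append replayed messages after newer
ones are all assumptions made for illustration. Each queue's messages
live on one node's disk, a crash moves the queue to a surviving node
with an empty queue (hiding the old contents), and a restart replays
the stranded messages into the queue on the node now responsible.

```python
from collections import deque


class ToyCluster:
    """Toy model only: each queue's messages live on exactly one node's disk."""

    def __init__(self, node_names):
        self.disk = {name: {} for name in node_names}  # node -> {queue: deque}
        self.up = set(node_names)
        self.owner = {}  # queue -> node currently responsible for it

    def declare(self, queue, node):
        self.owner[queue] = node
        self.disk[node].setdefault(queue, deque())

    def publish(self, queue, message):
        self.disk[self.owner[queue]][queue].append(message)

    def pending(self, queue):
        """Messages visible to consumers: only those held by the current owner."""
        return list(self.disk[self.owner[queue]][queue])

    def crash(self, node):
        """Assumes at least one other node is still up."""
        self.up.discard(node)
        survivor = sorted(self.up)[0]
        for queue, owner in list(self.owner.items()):
            if owner == node:
                self.declare(queue, survivor)  # fresh queue hides the old contents

    def restart_with_replay(self, node):
        """Option 1: hand the restarted node's stranded messages to the new owner."""
        self.up.add(node)
        for queue, stranded in self.disk[node].items():
            owner = self.owner[queue]
            if owner != node:
                self.disk[owner][queue].extend(stranded)
                stranded.clear()
```

In this model, messages published to a queue on node A disappear from
`pending()` when A crashes and reappear on node B's queue once
`restart_with_replay("A")` runs.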

Frankly, option 1 is just fine with us and would solve our issue.

Does this help? Thank you for being concerned.

-J