[rabbitmq-discuss] Pulling RabbitMQ out of service

Fri Feb 4 13:10:45 GMT 2011

On Mon, Jan 31, 2011 at 07:26:34AM -0800, Bill Moseley wrote:
> For those of you running multiple RabbitMQ servers in a cluster, what is
> your procedure when you want to shut one of the servers down (e.g. for
> maintenance) but not disrupt overall service?   Queues only live on one
> server so I'm wondering how (or if) you do something to flush out the queue
> before stopping the machine.

The usual best practice is to force clients to reconnect elsewhere and
recreate the resources they need. This may need some careful thought
about the ordering of events. A common convention is that publishers
create exchanges, and consumers create the queues they need and bind
them as necessary. To avoid missing any messages you'll need to start up
new consumers before taking down the old ones. The new consumers must
create their queues on a different node, not on the to-die node. Since
declaring an existing queue name just hands back the existing queue, the
new queue names must be fresh. You will then have to deal with possible
duplicate messages during the period in which both sets of consumers
are up.
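
As a rough sketch of the consumer side (Python, standard library only):
each new generation of consumers picks a fresh queue name, and the
handler drops any message whose id it has already seen. The names
fresh_queue_name and DedupingHandler are made up for illustration, and
the sketch assumes publishers put a unique id on every message, which
the broker does not do for you.

```python
import uuid
from collections import OrderedDict


def fresh_queue_name(prefix):
    """Return a queue name that no earlier consumer generation has used."""
    return "%s.%s" % (prefix, uuid.uuid4().hex)


class DedupingHandler:
    """Wrap a message handler so each message id is processed at most once.

    Assumes the publisher sets a unique message id on every message
    (an assumption of this sketch, not something the broker enforces).
    """

    def __init__(self, handler, max_remembered=10000):
        self._handler = handler
        self._seen = OrderedDict()  # message id -> None, oldest first
        self._max = max_remembered

    def __call__(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate that arrived via the other queue; skip
        self._seen[message_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest id
        self._handler(body)
        return True
```

The in-memory record of seen ids only stands in for the real thing. With
old and new consumers running as separate processes, that record would
have to live somewhere both can reach, such as a shared database.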

The "or-else" routing semantics of RabbitMQ's "Alternate Exchanges" may
well be of use here.
http://www.rabbitmq.com/extensions.html#alternate-exchange
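
A minimal sketch of declaring an exchange with an alternate, assuming a
pika-style channel object (exchange_declare, queue_declare, queue_bind
with keyword arguments). The function name and the exchange and queue
names passed in are placeholders; the "alternate-exchange" argument key
is the one the extension uses.

```python
def declare_with_alternate(channel, exchange, alternate, fallback_queue):
    """Declare `exchange` so unroutable messages fall through to `alternate`.

    `channel` is assumed to be a pika-style channel object; the exchange
    and queue names are placeholders chosen by the caller.
    """
    # The alternate exchange is a fanout, so everything that reaches it
    # lands in the fallback queue regardless of routing key.
    channel.exchange_declare(
        exchange=alternate, exchange_type="fanout", durable=True
    )
    channel.queue_declare(queue=fallback_queue, durable=True)
    channel.queue_bind(queue=fallback_queue, exchange=alternate)

    # "alternate-exchange" is the declaration argument RabbitMQ looks for.
    channel.exchange_declare(
        exchange=exchange,
        exchange_type="direct",
        durable=True,
        arguments={"alternate-exchange": alternate},
    )
```

Note that an existing exchange cannot be redeclared with different
arguments, so the alternate has to be named when the exchange is first
created.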

> Now, this is a bit tougher: How about catastrophic failures?  I'm wondering
> about using the complexity of Pacemaker and DRBD vs. tracking incomplete
> jobs and resubmitting after some time.

Horses for courses really. We know of a number of clients who are using
the Pacemaker approach, though frequently with NAS/SAN rather than DRBD.
Work out which failures you can withstand and which you can't, then pick
the approach that best matches.
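
If you go the route of tracking incomplete jobs and resubmitting them,
the bookkeeping might look roughly like the following (Python, standard
library only). JobTracker and its method names are invented for this
sketch, and the publish callable is whatever your application uses to
send a job; none of this is a RabbitMQ feature.

```python
import time


class JobTracker:
    """Remember submitted jobs and report those not completed in time."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._pending = {}  # job id -> (payload, time submitted)

    def submitted(self, job_id, payload):
        self._pending[job_id] = (payload, self._clock())

    def completed(self, job_id):
        # Completions for unknown or already-completed jobs are ignored.
        self._pending.pop(job_id, None)

    def overdue(self):
        """Return (job_id, payload) pairs that have waited past the timeout."""
        now = self._clock()
        return [
            (job_id, payload)
            for job_id, (payload, sent_at) in self._pending.items()
            if now - sent_at > self._timeout
        ]

    def resubmit_overdue(self, publish):
        """Call publish(job_id, payload) for each overdue job, restart its timer."""
        jobs = self.overdue()
        for job_id, payload in jobs:
            publish(job_id, payload)
            self.submitted(job_id, payload)
        return len(jobs)
```

A job that was merely slow rather than lost will end up running twice
under this scheme, so the workers need to tolerate duplicates.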

Matthew