[rabbitmq-discuss] Durable queues and high availability (broker failure)

Mon Jun 29 10:12:33 BST 2009

Hi, new guy here with a bunch of thanks for the awesome rabbitmq!

And a conundrum ...

I'm somewhat puzzled by how to manage a HA config of rabbit, especially
regarding durable queues.

I understand that a durable queue exists only on the node where it was
created, and also why. I also get that if that node fails, the queue
comes back once the broker is restarted, and that's cool. But while that
broker is down, the queue can be recreated on another node, because our
system didn't think waiting for Mr. Sysadmin to fix the failed node was
a viable option. And it isn't; we've got work to do here. :)

Q1) Is there some notification mechanism I can rely on to tell that a
queue has failed, so apps can take measures to manage the failure, e.g.
recreate the queue or set up a "temporary fail" queue?

Q2) Is there a way to "replay" recovered messages from the failed,
no-longer-relevant queue into the one newly created upon failure? It
seems this would allow graceful recovery, even automatically, once the
failed node does come back up.
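
To make Q2 concrete, here is roughly the behaviour I have in mind, as a
toy sketch only. Plain Python deques stand in for the two queues; there
are no real RabbitMQ client calls, and the names (replay, old_queue,
new_queue) are made up for illustration.

```python
from collections import deque


def replay(recovered, replacement):
    """Move every message from the recovered queue to the replacement queue.

    Messages keep their relative order but land behind anything the
    replacement queue already holds. Returns the number of messages moved.
    """
    moved = 0
    while recovered:
        replacement.append(recovered.popleft())
        moved += 1
    return moved


# Hypothetical state once the failed node is back up.
old_queue = deque(["job-1", "job-2"])  # messages recovered from the failed node
new_queue = deque(["job-3"])           # queue recreated on a surviving node

moved = replay(old_queue, new_queue)
```

Note that in this naive version the recovered messages end up behind
whatever was published to the new queue in the meantime, so overall
ordering is not preserved.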

I'd like to refrain from putting too much messaging logic in the apps
themselves, and if possible I'd like to avoid managing a "temporary
failover" state at all in the app.

Overall though, I feel like I'm missing something since this surely
can't be a problem unique to our system.

So, what is the preferred way to handle node failure in a real-life
rabbitmq-based app today?

Cheers,
/Daniel