[rabbitmq-discuss] Single point of failure in a clustered environment with a durable exchange and durable queues

Wed Aug 11 09:38:58 BST 2010

Hello Vince,

Sorry for the late reply.

> I have a clustered environment with 3 nodes having the following set
> up:
> 
>  
> 
> Exchange name:  TestExchange (1 durable exchange)
> 
> Queues: Q1, Q2, Q3 (3 durable queues)
> 
> Nodes: N1, N2, N3
> 
> Cluster: N2 and N3 are clustered against N1.   
> 

Clustering isn't doing quite what you're expecting it to do.

Because the 3 nodes are clustered, they can see each other's queues,
exchanges and bindings.  Moreover, connecting to any one of them gives
you access to all of the queues, exchanges, etc.

> The problem that I have is that N1 is a single point of failure.  When I
> stop RabbitMQ service on N1, all the queues stop working.  Publishers
> can no longer connect to any queues and messages cannot be enqueued.  

But, even when clustered, each queue resides only on the node it was
declared on.  In your case, I'm guessing you connect to node N1 and
declare the 3 queues on it.  Unfortunately, this means that if N1 goes
down, its queues become unusable until it comes back.  Persistent
messages on them are only temporarily unavailable; transient ones are
lost.

Put another way, clustering only makes the nodes' resources logically
available to each other.  It doesn't actually replicate queues across
the cluster.
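
To make that concrete, here is a small toy model in Python.  It is not
RabbitMQ code and does not use any client library: the Cluster class
and its methods are made up purely for illustration.  It assumes, as
guessed above, that all three queues were declared through N1.

```python
# Toy model only: NOT RabbitMQ and not its API. All names are hypothetical.
class Cluster:
    def __init__(self, node_names):
        self.up = {name: True for name in node_names}  # node -> running?
        self.queue_home = {}   # queue -> the one node that holds it
        self.messages = {}     # queue -> list of (body, persistent)

    def _require_up(self, node):
        if not self.up[node]:
            raise ConnectionError(f"node {node} is down")

    def declare_queue(self, queue, via_node):
        # The queue lives on the node the client was connected to
        # when it was first declared; it is not copied elsewhere.
        self._require_up(via_node)
        self.queue_home.setdefault(queue, via_node)
        self.messages.setdefault(queue, [])

    def publish(self, queue, body, via_node, persistent=True):
        # Any running node can route to the queue, but only while
        # the queue's home node is up.
        self._require_up(via_node)
        home = self.queue_home[queue]
        if not self.up[home]:
            raise RuntimeError(f"{queue} lives on {home}, which is down")
        self.messages[queue].append((body, persistent))

    def stop(self, node):
        self.up[node] = False

    def start(self, node):
        self.up[node] = True
        # After a restart only persistent messages are still there.
        for queue, home in self.queue_home.items():
            if home == node:
                self.messages[queue] = [m for m in self.messages[queue] if m[1]]


def demo():
    cluster = Cluster(["N1", "N2", "N3"])
    for queue in ("Q1", "Q2", "Q3"):
        cluster.declare_queue(queue, via_node="N1")

    # Publishing through N2 or N3 works while N1 is up.
    cluster.publish("Q1", "kept", via_node="N2", persistent=True)
    cluster.publish("Q1", "dropped", via_node="N3", persistent=False)

    cluster.stop("N1")
    try:
        cluster.publish("Q1", "x", via_node="N2")
        reachable = True
    except RuntimeError:
        reachable = False

    cluster.start("N1")
    return reachable, [body for body, _ in cluster.messages["Q1"]]
```

Calling demo() in this model shows Q1 being unreachable through N2
while N1 is stopped, and only the persistent message being left on Q1
once N1 is started again.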

If you really need to always have access to the messages, you could
try following our HA with Rabbit guide (it's in the documentation
section of the website); this explains how to set up an active/passive
rabbit system.

Hope this helps.

Cheers,
Alex