[rabbitmq-discuss] Clustering Issue

Fri Aug 28 23:08:20 BST 2009

Matthias,

Thanks for the quick response. I understand that if Server A dies, the
other nodes should be able to assume that the queue is dead as well. My
real concern is that queues are not distributed, so the box that hosts
the physical queue is a single point of failure. While the current
clustering helps load-balance read operations very well, it doesn't
address durability and reliability concerns, which happen to be
important to my project.

I guess my real question is whether there are, or will be, efforts to
distribute queues across physical nodes. I understand that this
implies locking, consensus, and a bunch of other things that could
hinder scaling. If not, do you have any recommendations on how I could
provide higher durability and fault-tolerance guarantees to my
consumers?

Jorge

-----Original Message-----
From: Matthias Radestock [mailto:matthias at lshift.net] 
Sent: Friday, August 28, 2009 2:52 PM
To: Jorge Varona
Cc: rabbitmq-discuss at lists.rabbitmq.com
Subject: Re: [rabbitmq-discuss] Clustering Issue

Jorge,

Jorge Varona wrote:
> I've noticed some issues with clustered boxes that are weird. For
> example, in a two-box cluster I have Client A sending messages to Server
> A and Client B pulling messages from Server B. We already know that if
> we shut down Server A (it was first to declare a queue) messages stop
> being delivered to Server B and in turn Client B. The strange behavior
> I've noticed is that if I bring Server A back up and send messages to it
> they are not relayed to Server B, which has Client B attached. Only
> after I restart Server B do messages begin to be relayed to Client B.

When node A dies, as far as B is concerned all the queues on A die too.
If client B attempted a 'basic.get', or indeed any other operation, on
any of A's queues, it would get a 'not_found' error.

BUT - and this is what you are seeing - there is no way in AMQP to 
inform existing subscribers that a queue has vanished.

This isn't just a problem for clustering - you can run into the same 
issue on just a single node if one client consumes from a queue and 
another client removes that queue.
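To make the gap concrete, here is a minimal, hypothetical toy model in Python. It is not RabbitMQ code and not a real client API: `Broker`, `NotFound`, `consume`, `publish`, `get` and `delete` are illustrative names, and publishing goes straight to a queue name, roughly as AMQP's default exchange would route it. Deleting a queue silently drops its subscriber registration, so after the queue is re-declared under the same name the old subscriber never receives anything, which matches the pattern Jorge observed.

```python
class NotFound(Exception):
    """Stands in for the AMQP 'not_found' error (hypothetical)."""


class Broker:
    """Toy single-node broker: queues plus push-style subscribers."""

    def __init__(self):
        self.queues = {}     # queue name -> list of pending messages
        self.consumers = {}  # queue name -> list of delivery callbacks

    def declare(self, name):
        self.queues.setdefault(name, [])

    def consume(self, name, callback):
        if name not in self.queues:
            raise NotFound(name)
        self.consumers.setdefault(name, []).append(callback)

    def publish(self, name, body):
        # Simplification: route by queue name, as the default exchange does.
        if name not in self.queues:
            return  # unroutable messages are silently dropped in this toy
        callbacks = self.consumers.get(name)
        if callbacks:
            callbacks[0](body)
        else:
            self.queues[name].append(body)

    def get(self, name):
        # Polling a vanished queue fails loudly with NotFound...
        if name not in self.queues:
            raise NotFound(name)
        return self.queues[name].pop(0) if self.queues[name] else None

    def delete(self, name):
        # ...but subscribers are never told: their registration just disappears.
        self.queues.pop(name, None)
        self.consumers.pop(name, None)


broker = Broker()
broker.declare("jobs")
received = []
broker.consume("jobs", received.append)
broker.publish("jobs", "m1")  # delivered to the subscriber
broker.delete("jobs")         # another client removes the queue
broker.declare("jobs")        # later, a queue with the same name reappears
broker.publish("jobs", "m2")  # sits in the new queue; the old subscriber never sees it
print(received)               # ['m1']
```

A client that only subscribes gets no signal at all, whereas one that polls with `get` would see the `NotFound` error; in real AMQP, probing a queue with a passive `queue.declare` would similarly fail with `not_found` once the queue is gone.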

> Here are my assumptions:
> 
> 1. Queues exist only on the server on which they were first declared.
> 
> 2. Nodes within a cluster relay requests to the server on which the
> queue exists instead of messages being relayed to the server after first
> received.

Both correct.
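The two confirmed assumptions can be sketched as a small, hypothetical Python model rather than RabbitMQ's actual implementation: each queue is homed on the node that first declared it, and any other node forwards operations to that home node instead of storing messages locally. `Cluster`, `Node`, `storage` and `NotFound` are illustrative names only, and publishing straight to a queue name is a simplification of exchange routing.

```python
class NotFound(Exception):
    """Stands in for the AMQP 'not_found' error (hypothetical)."""


class Cluster:
    def __init__(self, *names):
        self.nodes = {n: Node(n, self) for n in names}
        self.owner = {}  # queue name -> name of the node that first declared it

    def node(self, name):
        return self.nodes[name]

    def home(self, queue):
        """Resolve the node that physically holds `queue` (assumption 2)."""
        home = self.owner.get(queue)
        if home is None or not self.nodes[home].up:
            raise NotFound(queue)
        return self.nodes[home]


class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.up = True
        self.storage = {}  # messages for queues homed on this node only

    def declare(self, queue):
        # Assumption 1: the first node to declare a queue becomes its home.
        if queue not in self.cluster.owner:
            self.cluster.owner[queue] = self.name
            self.storage[queue] = []

    def publish(self, queue, body):
        # Forwarded to the home node; nothing is stored on the receiving node.
        self.cluster.home(queue).storage[queue].append(body)

    def get(self, queue):
        messages = self.cluster.home(queue).storage[queue]
        return messages.pop(0) if messages else None


cluster = Cluster("A", "B")
a, b = cluster.node("A"), cluster.node("B")
a.declare("jobs")            # "jobs" is homed on A
b.declare("jobs")            # no-op: A already owns it
b.publish("jobs", "hello")   # enters via B, stored on A
print(a.storage, b.storage)  # {'jobs': ['hello']} {}
print(b.get("jobs"))         # hello (fetched from A through B)
a.up = False                 # node A dies; b.get("jobs") now raises NotFound
```

Keeping a single home per queue is exactly what makes that home node a single point of failure for the queue, which is the concern Jorge raises.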

> are there efforts to address these issues/scenarios?

It is possible that AMQP 1.0 addresses this. Not sure.

Regards,

Matthias.