[rabbitmq-discuss] Clustering Issue

Tue Sep 1 18:41:10 BST 2009

Tony,

Thanks for responding. I'm not sure we're at a point where either of
these scenarios would be appealing. The first requires additional
hardware and cost, and I agree the second seems unnecessarily
redundant, not to mention the burden on clients of ensuring the
uniqueness of each message, probably using some DHT, which again means
more hardware and cost.

We're fortunate in that our current load requirements allow us to
experiment a bit with several client-based consensus approaches to
mitigate server outages and provide near-real-time failover. My real
hope was that RabbitMQ could provide server-side failover functionality
or some kind of loud, broadcast failure notification that clients could
detect in order to perform their own failover (reconnecting,
re-declaring exchanges and queues, etc.). This may fall outside the
scope of AMQP, but I think it could be beneficial to your user base.
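
As a rough illustration of the client-side failover described above
(reconnect, then re-declare exchanges and queues), a minimal Python
sketch might look like the following. The names connect_with_failover,
connect and setup are hypothetical and are not part of RabbitMQ or any
AMQP client library; the caller is assumed to supply a connect function
that raises OSError when a broker is unreachable.

```python
import time


def connect_with_failover(hosts, connect, setup, retries=3, delay=0.0):
    """Try each broker in order; on success, redo the topology setup.

    hosts   -- broker addresses to try, in order of preference
    connect -- hypothetical callable(host) returning a connection,
               assumed to raise OSError if the broker is down
    setup   -- hypothetical callable(conn) that re-declares the
               exchanges, queues and bindings the client relies on
    """
    last_error = None
    for attempt in range(retries):
        for host in hosts:
            try:
                conn = connect(host)
            except OSError as exc:
                last_error = exc   # remember the failure, try next broker
                continue
            setup(conn)            # declarations must be repeated per broker
            return host, conn
        if attempt < retries - 1:
            time.sleep(delay)      # back off before the next full pass
    raise ConnectionError(f"no broker reachable: {last_error}")
```

A client would call this again whenever its current connection drops,
so that detection of the failure is the only part left to the
application.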

For now I think we'll bake our failover into our client implementation.
Thanks again for taking the time to respond.

Jorge

-----Original Message-----
From: Tony Garnock-Jones [mailto:tonyg at lshift.net] 
Sent: Tuesday, September 01, 2009 3:45 AM
To: Jorge Varona
Cc: Matthias Radestock; rabbitmq-discuss at lists.rabbitmq.com
Subject: Re: [rabbitmq-discuss] Clustering Issue

Hi Jorge,

Jorge Varona wrote:
> do you have any recommendations on how I could
> provide higher durability and fault-tolerance guarantees to my
> consumers?

There are a couple of approaches you could use, depending on your exact
requirements and the level of complexity you're willing to cope with.
The two main classes of solution are:

 - run a shared filesystem underneath rabbit, with a hot standby if
   one server should fail. This separates data availability from
   service availability, and reduces the "no service" window to the
   failover interval.

 - run two servers and route *all* traffic through *both*, deduping
   at the receiving client. This gives as many nines of uptime as you
   want, at a cost in administrative complexity and bandwidth wastage.
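
The deduplication step in the second approach could be sketched roughly
as below, in Python. This assumes each message carries a unique
identifier assigned by the publisher, which the thread does not
specify; the Deduplicator class and its capacity parameter are
hypothetical names used only for illustration.

```python
from collections import OrderedDict


class Deduplicator:
    """Remember recently seen message IDs so duplicates can be dropped."""

    def __init__(self, capacity=10000):
        self.capacity = capacity      # maximum number of IDs to remember
        self._seen = OrderedDict()    # insertion-ordered set of IDs

    def accept(self, message_id):
        """Return True the first time an ID is seen, False for a repeat."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)   # forget the oldest ID
        return True
```

The receiving client would call accept() for every delivery from
either server and process the message only when it returns True. The
bounded memory is a trade-off: a duplicate arriving after its ID has
been evicted would be processed a second time.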

We've been talking about this stuff to various people recently so have
some illustrations of what we mean that might be useful: see
http://dev.lshift.net/tonyg/rabbitmq-intro-ha.pdf. [1]

Regards,
  Tony

[1]: A slightly earlier slide deck has a little bit more explanation
attached, and carries similar content otherwise:
http://dev.lshift.net/tonyg/Achieving%20Scale%20with%20Messaging%20and%20the%20Cloud%2020090709%20(with%20notes).pdf

-- 
 [][][] Tony Garnock-Jones     | Mob: +44 (0)7905 974 211
   [][] LShift Ltd             | Tel: +44 (0)20 7729 7060
 []  [] http://www.lshift.net/ | Email: tonyg at lshift.net