[rabbitmq-discuss] Fwd: Client-connection failover workarounds (ruby)

Sat Feb 6 18:42:19 GMT 2010

Peter,

Peter Fitzgibbons wrote:
> I'm thinking about how to handle client-connection failover.  By this
> I mean clientA has a producerP that needs to have reliable and
> tolerant connections to rabbitA and rabbitB.  This is so that if the
> connection/pid of rabbitA goes down, producerP sends its next message
> to rabbitB with as little handling (and wall-clock time) as possible.
> [...]
> The best solution I can think of right now is to have a rabbitP on the
> same machine as producerP, which is clustered to the "real" cluster,
> so if rabbitA goes down, the built-in clustering failover will handle
> the proper interaction.  My issue with this is considering how this is
> configured and maintained when the farm of servers with producerP gets
> to be 1000+.  Even if this doesn't scale to the google-farm level,
> what about farm = 10, 30 ?

Having a local broker is indeed a common approach to get higher 
resilience, particularly since it allows producers to offload messages 
even when the connection to the main brokers is down for prolonged 
periods of time, e.g. in the event the main brokers are across a WAN and 
there is a WAN failure.

As you suspect, making all these brokers part of a cluster would indeed 
be problematic for high node counts, even more so when going across a 
WAN. Besides, RabbitMQ's clustering is really designed for scaling, not 
improved reliability.

But you do not need to cluster the brokers. Instead you could stick our 
experimental shovel (see 
http://www.lshift.net/blog/2010/02/01/rabbitmq-shovel-message-relocation-equipment) 
onto each local node and configure it to shovel messages to the main 
broker pair. The shovel has built-in re-connection and failover logic 
for that.
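
To make the store-and-forward idea concrete, here is a rough Ruby sketch 
of what such a relay does conceptually. It is not the shovel's code or 
configuration: the Relay class, the Array standing in for the local 
queue, and the callable "targets" are all made up for illustration.

```ruby
# Hypothetical sketch of a store-and-forward relay; this is NOT the shovel.
class Relay
  # local_queue: an Array standing in for the local broker's queue (assumption)
  # targets:     callables, one per main broker; each forwards one message
  #              or raises if that broker is unreachable (assumption)
  def initialize(local_queue, targets)
    @local_queue = local_queue
    @targets = targets
  end

  # Forward as many queued messages as possible; return how many were sent.
  # Messages stay in the local queue while no main broker is reachable.
  def drain
    sent = 0
    until @local_queue.empty?
      message = @local_queue.first
      break unless forward(message)
      @local_queue.shift # remove only after a successful hand-off
      sent += 1
    end
    sent
  end

  private

  # Try each target in order; true on the first success, false if all fail.
  def forward(message)
    @targets.any? do |target|
      begin
        target.call(message)
        true
      rescue StandardError
        false
      end
    end
  end
end
```

The producer only ever appends to the local queue, so it is unaffected 
by outages of the main pair; calling drain periodically (or after a 
reconnect) moves the backlog across once either main broker is back.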

Of course having a local broker is no panacea for reliability, since 
that broker itself becomes a point of failure. Whether that is a problem 
depends on how high you place the reliability bar.

An alternative approach is to make the clients themselves deal with 
re-connection and failover. Check out the experimental 
http://hg.rabbitmq.com/rabbitmq-java-messagepatterns/ and 
http://hg.rabbitmq.com/rabbitmq-dotnet-messagepatterns/ libraries which 
do that for a specific type of messaging (point-to-point, though most of 
the logic should be easily ported to other types).
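
Since the question was about Ruby, the client-side approach might look 
roughly like the sketch below. It is not taken from the messagepatterns 
libraries and does not use any real AMQP client API: FailoverPublisher, 
FakeConnection, and the connector block are hypothetical names, and a 
real client would open an actual broker connection inside the block.

```ruby
# Hypothetical sketch of client-side failover; no real AMQP library is used.
class FailoverPublisher
  # brokers:    ordered list of broker addresses, e.g. ["rabbitA", "rabbitB"]
  # max_rounds: how many times to cycle through the whole list before giving up
  # connector:  block taking an address and returning an object that responds
  #             to #publish(message); it should raise if the broker is down
  def initialize(brokers, max_rounds: 3, &connector)
    @brokers = brokers
    @max_rounds = max_rounds
    @connector = connector
    @index = 0
    @connection = nil
  end

  # Publish via the current broker, failing over to the next one on error.
  # Returns the address of the broker that accepted the message.
  def publish(message)
    attempts = @brokers.size * @max_rounds
    attempts.times do
      begin
        @connection ||= @connector.call(@brokers[@index])
        @connection.publish(message)
        return @brokers[@index]
      rescue StandardError
        # Drop the broken connection and move on to the next broker.
        @connection = nil
        @index = (@index + 1) % @brokers.size
      end
    end
    raise "no broker reachable after #{attempts} attempts"
  end
end

# In-memory stand-in for a broker connection, for demonstration only.
class FakeConnection
  def initialize(address, down, log)
    raise IOError, "#{address} is down" if down.include?(address)
    @address = address
    @down = down
    @log = log
  end

  def publish(message)
    raise IOError, "#{@address} went away" if @down.include?(@address)
    @log << [@address, message]
  end
end
```

For example, with FailoverPublisher.new(%w[rabbitA rabbitB]) { |addr| 
FakeConnection.new(addr, down, log) }, messages go to rabbitA until its 
address is added to down, after which the next publish lands on rabbitB. 
Note that a sketch like this retries the same message after an error, so 
a message may be delivered twice if the failure happened after the 
broker had already accepted it.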

Regards,

Matthias