[rabbitmq-discuss] RabbitMQ load balancing/failover with LVS

Wed Jul 29 02:02:46 BST 2009

Niko,

Niko Felger wrote:
> In order to allow our clients to always point to a single host,
> regardless of which nodes are up, we set up LVS load balancing on a
> third server called 'lb1'. However, once we do this, we experience
> issues with low-volume queues. It goes roughly like this:
> 
> - Consumer starts and establishes a connection to lb1.
> - lb1 forwards packets from the consumer to e.g. mq1.
> - At this point, the consumer has an established connection to lb1,
> mq1 has an established connection directly to the consumer, and
> messages published to the queue reach the consumer.

So the consumer has two connections - one to lb1 and one to mq1? That 
seems weird.

Also, you say that mq1 has a connection *to* the consumer - are you 
implying that your load-balancing reverses the direction of connection 
establishment?

> - After ~5-10 minutes without messages published to the queue, the
> connection on the consumer goes away, and it establishes a new
> connection to lb1. mq1 at this point still has an established
> connection to the consumer on the original port, in addition to the
> new connection. Messages published to the queue in question are now no
> longer delivered to the consumer.

Any idea what causes the original connection to get dropped?

Also, what client are you using, and do you have heartbeats enabled on 
the AMQP connection?
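
If the connection is being dropped for being idle, something has to put traffic on it during the quiet periods. AMQP heartbeats do that at the protocol level. A TCP-level alternative is keepalive on the client socket, which should also count as activity for anything tracking the connection in between. A rough sketch in Python, assuming you can get at the socket your client library uses; the function name and the timing values are made up for illustration, and the fine-grained options are platform-specific:

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive so an idle connection still produces traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning knobs are not available everywhere (these
    # names exist on Linux), so only set the ones this platform exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),      # idle seconds before first probe
                        ("TCP_KEEPINTVL", interval_s), # seconds between probes
                        ("TCP_KEEPCNT", probes)):      # failed probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# sock.connect(("lb1", 5672))  # connecting is left out of this sketch
```

The idle time would need to be comfortably shorter than whatever timeout is dropping the connection, which from your description is somewhere under 5 minutes.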

> - We start another consumer, but it doesn't receive messages either.
> - After some more time, the original connection times out
> ({inet_error,etimedout}), and messages get processed again, but only
> by the second consumer.

That's a bit strange ... I would expect at least *some* messages to make 
it through via the new connections, since the queue delivers messages 
round-robin(ish) to connected consumers.
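
To spell out what I would expect, here is a toy model in Python. It is not how the broker is implemented; the function and the consumer names are made up, with "stale" standing for the old connection mq1 still believes in and "new" for the consumer's fresh connection:

```python
from itertools import cycle

def deliver_round_robin(messages, consumers):
    """Hand each message to the next consumer in turn; return who got what."""
    received = {name: [] for name in consumers}
    turn = cycle(consumers)
    for msg in messages:
        received[next(turn)].append(msg)
    return received

# Ten messages shared between the stale connection and the new one.
received = deliver_round_robin(range(10), ["stale", "new"])
print({name: len(msgs) for name, msgs in received.items()})
```

In this model half of the messages go to the stale connection and are stuck there, but the other half still reach the new one. Seeing nothing at all arrive on the new connections is what surprises me.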

> The problem seems to be the load balancer dropping connections, but
> since we're using it successfully in a few other cases, I thought I
> could get some input on whether this is even a sensible strategy for
> doing failover for RabbitMQ, and if anyone has experience with setups
> similar to ours.

I am pretty sure some folks have used RabbitMQ behind a load balancer. 
Rabbit doesn't do anything fancy at the TCP/IP level, so generally this 
should work ok.

> PS: We're also seeing plenty of this in the rabbit.log, repeating
> every 30 seconds:
> =ERROR REPORT==== 27-Jul-2009::16:27:24 ===
> ** Generic server <0.9049.9> terminating
> ** Last message in was {inet_async,#Port<0.222>,41513,{ok,#Port<0.236483>}}
> ** When Server state == {state,{rabbit_networking,start_client,[]},
>                              #Port<0.222>,
>                              41513}
> ** Reason for termination ==
> ** {{badmatch,{error,enotconn}},
>   [{tcp_acceptor,handle_info,2},
>    {gen_server,handle_msg,6},
>    {proc_lib,init_p,5}]}

That looks like a connection is getting dropped before the server has 
finished accepting it. I have never seen this before, so I would suspect 
your load balancer is doing something weird.

Regards,

Matthias.