[rabbitmq-discuss] High Availability

Thu Mar 18 13:12:59 GMT 2010

Hi,

While I don't know how HIGH your high is, I can tell you what we do for our live site.

Currently we handle around 400,000 messages a day (which is not much), sent to 2 RabbitMQ servers running on low-end machines. The Unix load on those machines is always below 0.2.

So here's my story of HA based on that setup:

When we deployed the system in mid-2009, we started the two servers WITHOUT clustering. Both were running version 1.5.2.

We have 28 PHP machines publishing messages to a SINGLE IP, using IP failover (Heartbeat+LVS). This means that the PHP publishers only ever see one IP to connect to and send messages to.

We also have 2 PHP machines consuming messages, but these connect directly to the IPs of the RabbitMQ servers. We do that to be sure that each consumer talks to a specific broker.

This also means that the queues, exchanges, bindings, etc. are all duplicated on both servers.
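
To make the topology concrete, here is a rough sketch of which addresses each kind of process would connect to. The IP addresses, the names PUBLISHER_VIP, BROKER_IPS and endpoints_for, and the use of the default AMQP port are all assumptions for illustration; none of them come from the actual setup.

```python
# Hypothetical addresses (RFC 5737 documentation range), not the real ones.
PUBLISHER_VIP = "192.0.2.10"                # the single IP managed by Heartbeat+LVS
BROKER_IPS = ["192.0.2.11", "192.0.2.12"]   # the two independent, unclustered brokers
AMQP_PORT = 5672                            # default AMQP port (assumed)


def endpoints_for(role):
    """Return the (host, port) pairs a process with this role connects to."""
    if role == "publisher":
        # Publishers only ever see the virtual IP.
        return [(PUBLISHER_VIP, AMQP_PORT)]
    if role == "consumer":
        # Consumers bypass the virtual IP so that every broker gets drained.
        return [(ip, AMQP_PORT) for ip in BROKER_IPS]
    raise ValueError(f"unknown role: {role!r}")
```

Because the brokers are not clustered, a message lives only on the broker that received it, which is why the consumers have to cover every broker address between them.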

But there was a problem! We wanted to upgrade to the latest version of RabbitMQ, whose storage format is incompatible with the one used by our version.

What we did was the following:

The sysadmin took one of the brokers out of the load balancer, and we waited until the workers had consumed all of its messages. When its queues were empty, we shut down the server and did the upgrade. Then we put it back into the load balancer and repeated the process with the other broker.

This way we didn't lose any messages.
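
The procedure above could be scripted roughly as follows. This is only a sketch: remove_from_lb, get_depths, upgrade and add_to_lb are hypothetical callbacks that would wrap whatever tooling is actually used (the queue depths could, for example, be read from the output of `rabbitmqctl list_queues name messages`), and the polling interval and timeout are arbitrary.

```python
import time


def wait_until_drained(get_depths, poll_seconds=5.0, timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every queue reports zero messages.

    get_depths is a hypothetical callable returning {queue_name: message_count}.
    Returns True once all queues are empty, False if the timeout expires first.
    """
    deadline = clock() + timeout_seconds
    while True:
        pending = {q: n for q, n in get_depths().items() if n > 0}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


def rolling_upgrade(brokers, remove_from_lb, get_depths, upgrade, add_to_lb,
                    **wait_opts):
    """Upgrade brokers one at a time, draining each one before touching it."""
    for broker in brokers:
        remove_from_lb(broker)            # publishers stop reaching this broker
        if not wait_until_drained(lambda: get_depths(broker), **wait_opts):
            raise RuntimeError(f"{broker} did not drain in time; not upgrading")
        upgrade(broker)                   # safe now: nothing left in its queues
        add_to_lb(broker)                 # back in rotation before the next one
```

The important property is the ordering: a broker is only shut down after it has stopped receiving new messages and its existing ones have been consumed, and it is back in the load balancer before the next broker is taken out.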

But we wanted to test the native RabbitMQ clustering... 

The sysadmin ran the commands from the Clustering Guide and we had the cluster up and running, until we hit another problem...

Sometimes RabbitMQ sent a redirect response to the consumers, telling them to connect to the other node. The problem was that RabbitMQ uses Erlang's node() function, which, because of the way we had configured the /etc/hosts file, returned a node that was unreachable from outside. (This only happened to the consumers, because they connect directly to the RabbitMQ nodes.)

So we did the same as before: we took one broker out of the load balancer, removed it from the cluster, and put it back into the load balancer. Then we did the same with the other node.

Again, we didn't lose any messages.
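
One way to catch this kind of problem early is to check, from a consumer machine, whether the host part of the broker's Erlang node name resolves at all. The sketch below assumes node names of the usual name@host form (for example rabbit@somehost); the function names are made up for illustration, and a successful lookup only shows that the name resolves, not that the broker port is reachable.

```python
import socket


def host_of(node_name):
    """Extract the host part of an Erlang node name such as 'rabbit@somehost'."""
    name, sep, host = node_name.partition("@")
    if not sep or not name or not host:
        raise ValueError(f"not a name@host node name: {node_name!r}")
    return host


def resolves_from_here(node_name, resolve=socket.gethostbyname):
    """Return True if the node's host name resolves on this machine."""
    try:
        resolve(host_of(node_name))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True
```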

As for the connection configuration, we have a really simple .yml file that tells each PHP process where to connect; the process picks its target based on an argument passed on the CLI.
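
The idea can be sketched like this (in Python rather than PHP, for brevity). The CONNECTIONS dictionary stands in for the parsed .yml file; its keys, addresses and layout are hypothetical, since the real file isn't shown here.

```python
import argparse

# Stand-in for the parsed .yml file. Names and addresses are hypothetical.
CONNECTIONS = {
    "broker1": {"host": "192.0.2.11", "port": 5672},
    "broker2": {"host": "192.0.2.12", "port": 5672},
}


def parse_target(argv, connections=CONNECTIONS):
    """Pick the connection settings named by the first CLI argument."""
    parser = argparse.ArgumentParser(description="Consumer worker (sketch)")
    parser.add_argument("target", choices=sorted(connections),
                        help="name of the connection entry to use")
    args = parser.parse_args(argv)
    return connections[args.target]
```

A worker started with the argument broker2 would then get the settings for the second broker, so the same code can be pointed at either server without any changes.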

I hope this helps,

Alvaro

On Mar 18, 2010, at 7:57 PM, Gustavo Aquino wrote:

> Hi,
> 
> I have asked many people this question before, without success, because I haven't found (in the documentation, discussion lists, etc.) any way to do High Availability with RabbitMQ without a lot of workarounds. So, is there a way to do HA with RabbitMQ without implementing a lot of things on the client side, like recreating queues when a node goes down, recreating configurations, recreating client connections, etc.?
> 
> What is RabbitMQ's recommendation for doing HA?
> 
> Has anyone here implemented HA with RabbitMQ?
> 
> Regards.
> 
> Gustavo