Chris Hampson Chris.Hampson at arm.com
Mon Feb 28 15:27:42 GMT 2011

Hi Guys,

We're currently attempting to maintain a RabbitMQ cluster over a WAN between four of our sites: two in the US, one in the UK, and one in India.

For the most part this works fine, but it is a little fragile, and we can't get it to recover from failures very well.

Currently we have one disk node and three RAM nodes. I've set net_ticktime quite high to try to reduce the chance of timeouts, but we still run into about one problem a week.
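
For reference, net_ticktime is an Erlang kernel parameter rather than a RabbitMQ-specific one. A minimal sketch of raising it, assuming the classic Erlang-terms rabbitmq.config file and an illustrative value of 120 seconds (an assumption, not necessarily the value in use on this cluster):

```erlang
%% rabbitmq.config (classic Erlang-terms format)
%% net_ticktime is a kernel application parameter, in seconds (default 60).
%% The value 120 below is an illustrative assumption only.
[
  {kernel, [
    {net_ticktime, 120}
  ]}
].
```

The same value should be used on every node in the cluster.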

Quite often, if there is any communication problem at all, we end up with a partitioned cluster, and normally the RAM nodes all gang up and ignore the disk node, happily going about their business without it.

If anyone can offer any advice on our situation we'd be most grateful (even if it is "don't do that, you loony; separate them out and shovel messages between sites when necessary").

There never seems to be much in the log files when the cluster breaks up, but on request I can go hunting for snippets if they would be useful.

Many thanks in advance,


