[rabbitmq-discuss] Cluster problem after network bouncing

Laing, Michael michael.laing at nytimes.com
Wed May 14 12:00:29 BST 2014


Actually we have many clusters running across 3 zones in AWS :)

But we are prepared to lose entire regions, wholly or partially.

And we never persist messages in our rabbits - instead we use a
multi-region Cassandra cluster. Oh and S3 for large message bodies.

Plus important messages (anything not individually addressed) are
replicated for processing multiple times across multiple regions, racing to
resolution.
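The post does not say how a race between regions is settled. One common way to get "first to finish wins" semantics is a compare-and-set on a shared record. The minimal Python sketch below assumes that approach, with threads standing in for regions and an in-memory dict standing in for the shared store (Cassandra in the post). `ResolutionStore`, `claim`, `process_in_region` and `race` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import random
import threading
import time


class ResolutionStore:
    """Hypothetical in-memory stand-in for a shared store such as Cassandra."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._winners: dict[str, str] = {}

    def claim(self, message_id: str, region: str) -> bool:
        # Compare-and-set: only the first claim for a message_id succeeds.
        with self._lock:
            if message_id in self._winners:
                return False
            self._winners[message_id] = region
            return True

    def winner(self, message_id: str) -> str | None:
        with self._lock:
            return self._winners.get(message_id)


def process_in_region(store, region, message_id, resolved_by):
    # Simulated processing latency; each "region" works on its own replica.
    time.sleep(random.uniform(0.001, 0.01))
    if store.claim(message_id, region):
        resolved_by.append(region)  # only the winning region records a result


def race(message_id: str, regions: list[str]) -> tuple[str | None, list[str]]:
    store = ResolutionStore()
    resolved_by: list[str] = []
    threads = [
        threading.Thread(target=process_in_region, args=(store, r, message_id, resolved_by))
        for r in regions
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.winner(message_id), resolved_by


if __name__ == "__main__":
    # Region names are examples only; the post does not list its regions.
    winner, resolved_by = race("msg-1", ["us-east-1", "us-west-2", "eu-west-1"])
    print(winner, resolved_by)
```

The losing replicas still do their work; the compare-and-set only guarantees that exactly one result is accepted per message.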

It is a 'rabbits everywhere' strategy: a global mesh of redundant
cooperating clusters that replicate, route, and resolve messages and use
Cassandra and S3 for persistence.

The key to keeping a cluster up across zones in AWS is to never, ever
overload it, so that communication between the cluster's nodes is never interrupted.
The key statistic to monitor is IO wait.
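The post does not say which tool is used to watch IO wait. As one possibility on a Linux host, the share of CPU time spent in iowait can be sampled from the aggregate `cpu` line of `/proc/stat`. This minimal Python sketch assumes a Linux kernel exposing that file; the function names are hypothetical.

```python
from __future__ import annotations

import time


def parse_cpu_times(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of Linux /proc/stat into tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before: list[int], after: list[int]) -> float:
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already counted in user/nice, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval: float = 1.0, path: str = "/proc/stat") -> float:
    def read() -> list[int]:
        with open(path) as f:
            return parse_cpu_times(f.readline())

    before = read()
    time.sleep(interval)
    after = read()
    return iowait_percent(before, after)


if __name__ == "__main__":
    print(f"iowait over the last second: {sample_iowait():.1f}%")
```

The same figure appears as the `wa` column in `vmstat` and `top`. The post gives no alert threshold, so none is assumed here.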

We over-provision our cluster members to be sure they have enough
instantaneous resource at all times. And, as I said, we never persist
messages on the cluster.

ml


On Wed, May 14, 2014 at 4:05 AM, Matthias Radestock
<matthias at rabbitmq.com> wrote:

> On 14/05/14 08:58, Simon MacMullen wrote:
>
>> On 13/05/2014 18:04, Leonardo N. S. Pereira wrote:
>>
>>> Hi Simon, thanks very much for your answer.
>>> What is the recommended set up for HA running in AWS?
>>> Is there a way to workaround the partition problem?
>>>
>>
>> Don't cluster across more than two AZs.
>>
>> Unless service availability is more important to you than avoiding data
>> loss, don't cluster across AZs at all.
>>
>
> Also note that the situation you created in your tests, which causes the
> odd behaviour (partial partitions, where communication between nodes is
> severed in just one direction), is less likely to occur in practice than
> full partitions.
>
> Matthias.
>
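To make the distinction in the quoted reply concrete, here is a minimal Python sketch that classifies node pairs from a directed reachability set: pairs cut in only one direction (the partial partitions Matthias describes) versus pairs cut in both directions. How the reachability set would be gathered is not covered in the thread and is left as an assumption; `classify_links` is a hypothetical helper, not part of RabbitMQ.

```python
from itertools import combinations


def classify_links(nodes, reachable):
    """Classify each node pair by which directions of communication work.

    `reachable` is a set of (src, dst) pairs meaning src can currently reach dst.
    Returns (one_way, both_ways): pairs severed in exactly one direction and
    pairs severed in both directions. Healthy pairs are omitted.
    """
    one_way, both_ways = [], []
    for a, b in combinations(sorted(nodes), 2):
        forward = (a, b) in reachable
        backward = (b, a) in reachable
        if forward and backward:
            continue  # healthy pair
        if forward or backward:
            one_way.append((a, b))
        else:
            both_ways.append((a, b))
    return one_way, both_ways


if __name__ == "__main__":
    nodes = ["rabbit@a", "rabbit@b", "rabbit@c"]
    # a <-> b healthy, a -> c works but c -> a does not, b and c cannot talk at all.
    reachable = {("rabbit@a", "rabbit@b"), ("rabbit@b", "rabbit@a"), ("rabbit@a", "rabbit@c")}
    print(classify_links(nodes, reachable))
```

With the example set above, `rabbit@a`/`rabbit@c` is reported as one-way and `rabbit@b`/`rabbit@c` as severed in both directions.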