Ben Hsu ben.hsu at criticalmedia.com
Mon Jul 21 20:26:46 BST 2014


Does anyone on this list have experience running RabbitMQ on Rackspace
hosting? If so, how have you dealt with network partitions?

We have a cluster of 3 RabbitMQ nodes hosted in Rackspace. In the last few
months we've seen two network partitioning events: there is some kind of
network hiccup, and all 3 rabbit nodes end up partitioned from each other.
Recovering requires manual intervention to restart rabbit.

We've been experimenting with pause-minority and autoheal (
https://www.rabbitmq.com/partitions.html#automatic-handling ). We've found
that with pause-minority, each of the 3 nodes ends up alone in its own
single-node partition, each then thinks it's in the minority, and all 3
nodes stop accepting messages.
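
To illustrate why a three-way split stops everything under pause-minority,
here is a rough Python sketch of the rule as the partitions guide describes
it: a node pauses when the nodes it can still see, itself included, are
half or fewer of the cluster. This is only a model of the decision, not
RabbitMQ code; the function names and node labels are made up for the
example.

```python
from typing import Dict, List


def is_minority(visible_nodes: int, cluster_size: int) -> bool:
    """True when a node sees half or fewer of the cluster (itself included)."""
    return visible_nodes * 2 <= cluster_size


def paused_by_partition(partitions: List[List[str]]) -> Dict[str, bool]:
    """Map each node name to whether it would pause under this rule.

    `partitions` is a hypothetical description of the split: each inner
    list holds the nodes that can still reach one another.
    """
    cluster_size = sum(len(group) for group in partitions)
    return {
        node: is_minority(len(group), cluster_size)
        for group in partitions
        for node in group
    }


# Three-way split: every node is alone, so every node pauses.
three_way = paused_by_partition([["rabbit1"], ["rabbit2"], ["rabbit3"]])

# Two-and-one split: only the isolated node pauses.
two_and_one = paused_by_partition([["rabbit1", "rabbit2"], ["rabbit3"]])
```

Under this model the two-and-one split keeps a working majority, while the
three-way split we observed leaves no partition large enough to keep
running.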

With autoheal we've hit some bizarre errors. In one test the cluster fell
into 3 separate parts, and the nodes would not rejoin the cluster. In a
second case two of the nodes became partitioned from each other, and the
third node would not start. The error message was:

