[rabbitmq-discuss] deploying to rackspace cloud -- network partitions?

Ben Hsu ben.hsu at criticalmedia.com
Mon Jul 21 20:26:46 BST 2014


Hello,

Does anyone on this list have experience running RabbitMQ on Rackspace's
cloud hosting? If so, how have you dealt with network partitions?

We have a cluster of 3 RabbitMQ nodes hosted in Rackspace. In the last few
months we've seen two network partitioning events: there is some kind of
network hiccup, and all 3 rabbit nodes become partitioned from each other.
Recovering requires manual intervention to restart rabbit.

We've been experimenting with pause-minority and autoheal (
https://www.rabbitmq.com/partitions.html#automatic-handling ). We've found
that with pause-minority, each of the 3 nodes ends up alone in its own
partition; each one then thinks it's in the minority, and all 3 nodes stop
accepting messages.
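
To illustrate why a full three-way split stops everything, here is a rough
sketch of the rule as the partitions page describes it (a node pauses when it
can see half or fewer of the cluster's nodes). This is only my reading of the
documented behaviour, not RabbitMQ's actual code; the function name
should_pause is made up for the example.

```python
def should_pause(visible_nodes: int, cluster_size: int) -> bool:
    """Return True when a node lacks a strict majority.

    visible_nodes counts the node itself plus every peer it can still reach.
    Hypothetical helper: it models the documented pause-minority rule only.
    """
    return 2 * visible_nodes <= cluster_size


# Three-way split: every node sees only itself, so every node pauses.
isolated = [should_pause(1, 3) for _ in range(3)]

# Two-plus-one split: the connected pair keeps running, the lone node pauses.
split_2_1 = [should_pause(2, 3), should_pause(2, 3), should_pause(1, 3)]
```

Under this rule a 3-node cluster only keeps serving if at least two nodes can
still reach each other, which matches what we saw when all three were cut off
at once.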

With autoheal we've seen some bizarre errors. In one test the cluster split
into 3 separate partitions, and the nodes would not rejoin the cluster. In a
second test two of the nodes became partitioned from each other, and the
third node would not start. The error message was:

"inet_tcp",{{badmatch,{error,ehostunreach}