[rabbitmq-discuss] RabbitMQ Clustering in a changing network environment

Wed Jun 17 08:10:25 BST 2009

We have several sites running applications that hook into a
RabbitMQ-implemented AMQP bus. At the moment there's a single site
with a rabbit server, and everything connects into that. As things
grow and more apps request data, it seems like it might be more
efficient for each site to have its own RabbitMQ server to act as
somewhat of a concentrator: performing intelligent routing between
the sites and reducing the transmission of duplicate messages across
the inter-site links.

What isn't quite so clear is how one might interconnect several sites
with rabbit clustering in three situations: when the link between one
site and the others changes, when one site can only "connect out" and
other sites can't connect back to it, and when the node that holds a
given message queue dies.

In our specific case, 3G wireless data links in Australia often have
internal, non-routable, pre-NAT'd addresses assigned to them, such
that when a site's primary ADSL or other connection fails and the
firewall router switches on the 3G link, other sites are no longer
able to connect in - this site is only able to connect out.
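
To make the constraint concrete, here's a rough sketch of how I picture
it. It only models which end of an inter-site link would have to open
the connection; it says nothing about how rabbit or Erlang actually
behave. The site names and the accepts_inbound flag are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    # Hypothetical flag: False while the site is on the NAT'd 3G link
    # and can only open outbound connections.
    accepts_inbound: bool


def dialer(a, b):
    """Return (initiator, target) for a link between a and b,
    or None if neither side can accept an inbound connection."""
    if b.accepts_inbound:
        return (a.name, b.name)
    if a.accepts_inbound:
        return (b.name, a.name)
    return None


def possible_links(sites):
    """List every site pair that can be linked, with the side that must dial."""
    links = []
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            link = dialer(a, b)
            if link is not None:
                links.append(link)
    return links


sites = [
    Site("hq", accepts_inbound=True),
    Site("branch_adsl", accepts_inbound=True),
    Site("branch_3g", accepts_inbound=False),  # failed over to 3G
]
links = possible_links(sites)
```

With one site on 3G, every link to it has to be initiated from that
site; if two sites were on 3G at once, there'd be no direct link between
them at all.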

In the event of an inter-site link dropping, it's obviously important
to us that rabbit keeps going and, where alternate links come up, that
rabbit uses those, so that the client network config stays simple and
everything keeps running.

Reading through the RabbitMQ docs, it seems the clustering is pretty
much all done by Erlang - a language/environment I admit I have little
experience with.

When nodes join a cluster, do they then attempt to message one another
directly or go through the node they identified with specifically when
joining the cluster?
If they communicate via the specific node they identified with, can
they identify directly with several specific nodes?
Is this through a persistent connection, or something transient?
Are there any caveats with NAT (including the case where one can't map
a port back in) or dynamic IPs?
How does the whole nodename/hostname thing work/resolve?

It also sounded like each site, if it could arguably be cut off from
every other one, should be a disk node?

I guess I'm askin' a lot of questions here - reading the clustering
doc suggests it's primarily aimed at environments where each cluster
is located at a site with a single, pretty reliable connection with a
static IP and hostname, which isn't always the case in our
environment...