[rabbitmq-discuss] Federation: global exchange

Laing, Michael P. Michael.Laing at nytimes.com
Fri Jul 13 16:55:50 BST 2012


I have been experimenting with federation and thought I would report on the experience as input to development of the capability.

My goal was to create a 'global exchange' that could reside in any or all of the virtual hosts in my RabbitMQ clusters. The purpose of the global exchanges is to provide a 'control bus'-like capability wherein a message published to the exchange in any vhost in any cluster appears in the corresponding exchange in every vhost in every cluster. The messages will be relatively small and message rates relatively low: under 100 per second, probably well under.

To implement this I created a 'reflector' vhost in each cluster. Within the cluster, I use federation to create bidirectional links in a star configuration between 'reflector' and each other participating vhost in the cluster. We segregate products, projects, etc. into their own vhosts, so there can be many of them in each cluster.

The clusters in turn are fully connected (link from each cluster to every other cluster) in a federated network using their 'reflector' vhosts.

For simplicity I use a hop count of 3 for each link. Duplicate messages are ignored.
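
To make the topology concrete, here is a minimal Python sketch that only enumerates the directed links described above: a bidirectional star around 'reflector' inside each cluster, plus a full mesh between the reflectors. It does not generate any real federation configuration. The cluster and vhost names (X, Y, A, B, C) are borrowed from the message paths in this post, and build_links is a hypothetical helper, not part of RabbitMQ.

```python
from itertools import permutations

MAX_HOPS = 3  # hop count used for every link in this post


def build_links(clusters):
    """Return a sorted list of directed (upstream, downstream) links.

    `clusters` maps a cluster name to its participating vhost names
    (the reflector is implied). Nodes are (cluster, vhost) tuples.
    This is an illustration only, not a RabbitMQ API.
    """
    links = set()
    for cluster, vhosts in clusters.items():
        reflector = (cluster, "reflector")
        # Star inside the cluster: one link in each direction per vhost.
        for vhost in vhosts:
            links.add((reflector, (cluster, vhost)))
            links.add(((cluster, vhost), reflector))
    # Full mesh between clusters, reflector to reflector.
    for a, b in permutations(clusters, 2):
        links.add(((a, "reflector"), (b, "reflector")))
    return sorted(links)


# Hypothetical layout: cluster X hosts vhosts A and C, cluster Y hosts B.
links = build_links({"X": ["A", "C"], "Y": ["B"]})
for upstream, downstream in links:
    print(f"{upstream} -> {downstream} (max hops {MAX_HOPS})")
```

With n clusters and v participating vhosts in total, this yields 2v star links plus n(n-1) mesh links, which is part of why static configuration becomes tedious as vhosts are added.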

Here are 2 message paths of the many that occur, one within the cluster and one between clusters:

Message -> exchange:Global/vhost:A/cluster:X
(hop 1) -> exchange:Global/vhost:Reflector/cluster:X
(hop 2) -> exchange:Global/vhost:Reflector/cluster:Y
(hop 3) -> exchange:Global/vhost:B/cluster:Y

Message -> exchange:Global/vhost:A/cluster:X
(hop 1) -> exchange:Global/vhost:Reflector/cluster:X
(hop 2) -> exchange:Global/vhost:C/cluster:X
(hop 3) -> exchange:Global/vhost:Reflector/cluster:X
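
The paths above can be checked with a small breadth-first simulation. This is a sketch under simplifying assumptions: node names such as "A/X" (vhost/cluster) are made up for the example, and a node that has already seen the message simply drops the copy, which gives the same first-arrival hop counts as delivering duplicates and ignoring them. It is not how the federation plugin is implemented.

```python
from collections import deque

# Hypothetical link list: star around each reflector plus one
# reflector-to-reflector link in each direction.
LINKS = [
    ("A/X", "Reflector/X"), ("Reflector/X", "A/X"),
    ("C/X", "Reflector/X"), ("Reflector/X", "C/X"),
    ("B/Y", "Reflector/Y"), ("Reflector/Y", "B/Y"),
    ("Reflector/X", "Reflector/Y"), ("Reflector/Y", "Reflector/X"),
]


def propagate(links, origin, max_hops=3):
    """Return {node: hops} for the first arrival of a message at each node."""
    downstreams = {}
    for upstream, downstream in links:
        downstreams.setdefault(upstream, []).append(downstream)
    first_seen = {origin: 0}
    queue = deque([(origin, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget exhausted, do not forward further
        for nxt in downstreams.get(node, []):
            if nxt in first_seen:
                continue  # duplicate copy, ignored
            first_seen[nxt] = hops + 1
            queue.append((nxt, hops + 1))
    return first_seen


reached = propagate(LINKS, "A/X")
for node, hops in sorted(reached.items(), key=lambda item: item[1]):
    print(f"{node}: {hops} hop(s)")
```

In this model a hop count of 3 is the minimum that lets a message travel from a vhost in one cluster to a vhost in another; with a budget of 2 it stops at the remote reflector.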

A useful application is a 'confirm' service: when an object is updated in S3 in a region, is the updated version confirmed to be available from clusters in other regions? There is variable lag in consistency in S3 which can cause problems, yet we want to publish a URL to the object as soon as possible.

To confirm availability, the service publishes a 'poll' message to the global exchange. Agents are listening in an 'admin' vhost in each cluster for this poll and attempt (repetitively if need be) to access the S3 object and check its version. They report back and a consolidated report is returned to the requesting app as JSON.
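
As an illustration of the consolidation step, here is a minimal Python sketch that merges per-region agent replies into one JSON report. The field names (region, success, tries, responded, confirmed) and the consolidate function are assumptions made for this example; the actual message format of the service is not shown in this post.

```python
import json


def consolidate(responses, expected_regions):
    """Merge per-region agent replies into a single report dict.

    All field names here are hypothetical. A region that never
    replied is reported as responded=False.
    """
    by_region = {reply["region"]: reply for reply in responses}
    regions = []
    for region in expected_regions:
        reply = by_region.get(region)
        regions.append({
            "region": region,
            "responded": reply is not None,
            "success": bool(reply and reply["success"]),
            "tries": reply["tries"] if reply else 0,
        })
    return {
        # Confirmed only if every expected region saw the new version.
        "confirmed": all(r["success"] for r in regions),
        "regions": regions,
    }


report = consolidate(
    [
        {"region": "us-east-1", "success": True, "tries": 1},
        {"region": "eu-west-1", "success": True, "tries": 1},
    ],
    ["us-east-1", "eu-west-1", "us-west-1"],
)
print(json.dumps(report, indent=2))
```

In this example us-west-1 never answers, so the report is not confirmed even though both replies that did arrive were successful.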

Here's a sample report in printed form. I published the object from my Mac in NYC to S3 in us-east-1 via the 'confirm' service running in a vhost in my us-east-1 cluster:

                 Respond  Success  Tries  Elapsed ms  Finish/Start

Overall                                   1385        2012-07-12T19:29:34.488206Z
                                                      2012-07-12T19:29:33.103081Z
Put to s3                                 227         2012-07-12T19:29:33.330069Z
                                                      2012-07-12T19:29:33.103081Z
Poll/Response                             1158        2012-07-12T19:29:34.488206Z
                                                      2012-07-12T19:29:33.330069Z
•us-east-1       true     true     1      122         2012-07-12T19:29:33.460626Z
                                                      2012-07-12T19:29:33.338701Z
•ap-northeast-1  true     true     1      957         2012-07-12T19:29:34.399986Z
                                                      2012-07-12T19:29:33.442609Z
•sa-east-1       true     true     1      736         2012-07-12T19:29:34.200525Z
                                                      2012-07-12T19:29:33.464345Z
•eu-west-1       true     true     1      463         2012-07-12T19:29:33.861053Z
                                                      2012-07-12T19:29:33.398861Z
•us-west-1       true     true     1      404         2012-07-12T19:29:33.788365Z
                                                      2012-07-12T19:29:33.384557Z
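
For reference, the 'Elapsed ms' figures are the difference between each Finish and Start timestamp. A minimal Python sketch of that arithmetic follows; elapsed_ms is a hypothetical helper and rounding to the nearest millisecond is an assumption, since the service's own rounding is not described here.

```python
from datetime import datetime

# Timestamp layout used in the report: UTC with microseconds and a 'Z' suffix.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def elapsed_ms(start, finish):
    """Return finish - start in milliseconds, rounded to the nearest integer."""
    delta = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() * 1000)


# 'Overall' row of the report.
print(elapsed_ms("2012-07-12T19:29:33.103081Z", "2012-07-12T19:29:34.488206Z"))  # 1385
# 'Put to s3' row of the report.
print(elapsed_ms("2012-07-12T19:29:33.103081Z", "2012-07-12T19:29:33.330069Z"))  # 227
```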

An advantage of this approach is that I can run the 'confirm' service in any vhost in any cluster and not have to worry about explicitly federating it across my clusters – I can use this general purpose capability.

We have many similar low-bandwidth general services in mind, e.g. for monitoring. We'll probably upgrade our hosts to Precision Time Protocol (PTP) to get more accurate timings.

Opportunities for improvement:

. Static configuration is a pain – my understanding is that this is being remedied. This is most important.
. Lots of 'extra' messages are sent in my scenario – perhaps this could be optimized. Not too important.
. Federation within a cluster could perhaps be optimized – actually it is not even documented as a capability. Not too important.

We have some high volume uses as well for which we will use additional, more tailored federated links. However we are finding this general purpose, low volume capability quite useful both in limited practice and as an architectural concept supporting new services.


Michael Laing
NYTimes Media Group