[rabbitmq-discuss] How to use federation with clustering for best HA-behaviour?

Thu Jan 17 12:39:58 GMT 2013

On Thu, 17 Jan 2013 12:14:14 +0100, Simon MacMullen <simon at rabbitmq.com> wrote:

> Oh, the queues used by federation are not magic - if you need them to
> work in case of node failure in a cluster then they should be mirrored.

Ah, good!

In the federation plugin docs (http://www.rabbitmq.com/federation.html)
it says:

"ha-policy
  Determines the "x-ha-policy" argument for the upstream queue (see expires). This is only of interest when connecting to old brokers which determine queue HA mode using this argument. Default is 'none', meaning the queue is not HA."

So that description is not entirely complete: you also need the upstream
queue to be HA to get "federation HA" behaviour.
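For anyone following along on 3.x: there the usual way to make queues HA is a policy rather than the "x-ha-policy" argument. Below is a minimal Python sketch that builds, but does not send, a management HTTP API request declaring such a policy. It assumes the management plugin is listening on its default port 15672, and it assumes the federation upstream queues have names starting with "federation: " (check the actual names in the management UI before relying on that pattern). The helper name and the policy name are hypothetical, my own choices.

```python
import json
from urllib.parse import quote


# Hypothetical helper: builds (but does not send) a PUT request for the
# RabbitMQ management HTTP API that declares a queue-mirroring policy.
def build_ha_policy_request(host, vhost, name, pattern, ha_mode="all", priority=0):
    # The vhost is a single path segment, so "/" must be percent-encoded as %2F.
    path = "/api/policies/{}/{}".format(quote(vhost, safe=""), quote(name, safe=""))
    url = "http://{}{}".format(host, path)
    body = {
        "pattern": pattern,                  # regex matched against object names
        "definition": {"ha-mode": ha_mode},  # "all" = mirror on every cluster node
        "priority": priority,
    }
    return "PUT", url, json.dumps(body)


# Assumed queue-name prefix for federation upstream queues (verify first).
method, url, body = build_ha_policy_request(
    "localhost:15672", "/", "ha-federation", "^federation: "
)
print(method, url)
print(body)
```

The request still has to be sent with admin credentials by whatever HTTP client you prefer; the same definition can also be applied with `rabbitmqctl set_policy`.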

> No, you understand right. Where in the documentation did it say that? I
> can't see anything obvious but I might be suffering from familiarity
> blindness.

Hmm, I just tried to find it, but I've read so many forums and comments
that it might have been an unofficial post somewhere. I'm pretty sure it
was written by someone associated with Rabbit, though, or I wouldn't have
relied on it. Then again, I probably didn't check the age of the post,
so the comment may have been valid for an earlier RabbitMQ version.

(I also forgot to mention that I'm running 3.0.1.)

> I see that you are using upstream sets to get redundancy in federation
> links - that's not really what they're intended for (we expect each
> upstream in an upstream set to point to a different cluster), so this
> will lead to message ordering getting broken as both upstreams connect
> to the same queue simultaneously.

Hmm, that is a problem for us. Message ordering is important.

(My simple test setup didn't take message ordering into account.)

> However, there's no way at the moment for an upstream to point to
> multiple cluster nodes, with the intention of connecting to exactly one
> of them, so I don't think you have a choice. There should be though.

Is there some whitepaper or similar that describes in more detail how
the federation plugin (and/or RabbitMQ) handles single-node failures?

(I'm Erlang literate, so I could perhaps look at the code. But code
usually takes a while to decipher, so it would be quicker if there were
a 'shortcut'. :-)

> I have filed a bug to do this.

Thank you! I'm hoping for a quick solution, as we are currently choosing
a message broker for internal CI use and, as a fan of Erlang, I'd rather
use RabbitMQ. :-)

BTW, you wouldn't happen to have some tips on how to do upgrades without
downtime?

I've studied the documentation for clustered installations, and it seems
that you don't use the Erlang facilities for 'upgrading on the fly' (hot
code loading), and that any upgrade *will* require downtime. (A problem
for any company, but especially for a large one...)

Kind regards,
Adam A.