[rabbitmq-discuss] RabbitMQ production setup questions around clustering

Mon Jul 26 16:09:44 BST 2010

Hi Aaron,

On Thu, Jul 22, 2010 at 08:33:48AM -0400, Aaron Westendorf wrote:
> So I can
> connect to any host in the cluster, call basic.consume(), and receive
> messages from a queue that resides on another host?

Correct. However, the messages are only stored on the node on which the
queue was created, so consuming via any other node adds an extra hop and
latency will be higher than when connected to the queue-local node.

> What about the
> case where multiple clients start at the same time and they each
> declare the queue and its bindings (identically)?

By using distributed transactions in mnesia, we detect this case. Queue
declaration is also idempotent: the "winner" of the race creates the new
queue, and the "loser" of the race simply gets back the queue that was
created by the winner.
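
To make the race concrete, here is a toy model in Python. It is not
RabbitMQ's code: the QueueRegistry class, the race() helper and the lock
(standing in for the mnesia transaction) are all made up for
illustration. It only shows the "check and insert atomically" idea that
gives every declarer the same queue.

```python
import threading


class QueueRegistry:
    """Toy stand-in for the cluster-wide queue table (hypothetical)."""

    def __init__(self):
        # The lock plays the role of the distributed transaction.
        self._lock = threading.Lock()
        self._queues = {}

    def declare(self, name, node):
        """Return (queue, created). Only the first caller creates."""
        with self._lock:
            if name not in self._queues:
                # Winner: the queue lives on the declaring node.
                self._queues[name] = {"name": name, "home_node": node}
                return self._queues[name], True
            # Loser: hand back the queue the winner created.
            return self._queues[name], False


def race(registry, name, nodes):
    """Declare the same queue from several nodes at once."""
    results = [None] * len(nodes)

    def worker(i, node):
        results[i] = registry.declare(name, node)

    threads = [
        threading.Thread(target=worker, args=(i, n))
        for i, n in enumerate(nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling race(QueueRegistry(), "jobs", ["rabbit@a", "rabbit@b"]) gives
exactly one result with created == True, and every result refers to the
same queue on the same home node.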

> I swear that we
> tried this exact setup, maybe in the 1.5 series, and we kept getting
> dropped messages.  This has a big effect on how one can go about
> clustering and failover, and matches our original plan for a generic
> pool of rabbit hosts.

There were some bugs back then, especially with regard to what happens
when a queue is declared that was previously declared on a now-failed
node. That used to be permitted and caused all sorts of problems when
the failed node came back up (if not before). That case is now expressly
forbidden and will result in a 404 (literally - the node on which this
queue has been created cannot be found).

With regard to failover, it should be noted that clustering is not a
means for HA: when a node goes down it takes its queues with it, and
those queues can't be recovered until the node comes back up.

> We have very wide pipes between our hosts, but there's still overhead
> in handling inter-node traffic.  Is there an optimal setup, such as
> all consumers of a queue connected to the host on which it resides?

Yes. Within a cluster, there is no batching of deliveries, or at least
nothing we're doing explicitly. I don't think the VM does anything magic
underneath, but I could be wrong. Thus you may wish to ensure that
clients are directly connected to the correct nodes for their queues.
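
One way a client could do that is sketched below. This assumes you
maintain your own map from queue name to home node (the broker does not
hand you one in this form); pick_host, queue_home and the host names are
hypothetical, and hosts is assumed to be non-empty.

```python
def pick_host(queue, queue_home, hosts):
    """Return the host a consumer of `queue` should connect to.

    queue_home: operator-maintained dict of queue name -> home node.
    hosts: non-empty list of reachable cluster nodes.
    """
    home = queue_home.get(queue)
    if home in hosts:
        # Queue-local connection: deliveries need no inter-node hop.
        return home
    # Home node unknown: any node still works, at the cost of a hop.
    return hosts[0]
```

For example, pick_host("jobs", {"jobs": "rabbit2"}, ["rabbit1",
"rabbit2"]) picks "rabbit2", while a queue missing from the map falls
back to "rabbit1".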

On the publishing side, we do explicit batching: if you are publishing a
message which ends up going to several queues all on some other node,
then it will only be sent as one message to that node rather than N
messages.
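
The effect can be pictured with a small Python sketch. Again this is
only a model of the idea, not how the broker implements it;
plan_inter_node_sends and the queue_home map are names invented for the
example.

```python
from collections import defaultdict


def plan_inter_node_sends(local_node, target_queues, queue_home):
    """Group a message's target queues by the node they live on.

    Returns (local_queues, remote), where remote maps each other node
    to the list of its queues. One inter-node message per key in
    remote is enough, however many queues that node holds.
    """
    by_node = defaultdict(list)
    for queue in target_queues:
        by_node[queue_home[queue]].append(queue)
    # Queues on the publishing node need no inter-node traffic at all.
    local_queues = by_node.pop(local_node, [])
    return local_queues, dict(by_node)
```

With three target queues on node "b" and one on the local node "a", the
remote plan has a single entry for "b", so one message crosses the wire
instead of three.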

Matthew