[rabbitmq-discuss] Clustering Internals

Tue Mar 22 23:52:34 GMT 2011

Hi Jason,

On Tue, Mar 22, 2011 at 04:21:42PM -0600, Jason J. W. Williams wrote:
> 1.) When a node dies, it used to be that the queue was auto-recreated on
> another node

I don't think that was ever the case...

> and if the downed node rejoined it's queue contents went poof.
> The old contents died but at least you were assured new messages didn't get
> blackholed. Now it looks like the entire queue goes poof...bindings and all.
> It's not recreated until a consumer redeclares and binds it.

What used to happen was:

1. node goes down, all queues on that node disappear.
2. another client, connected to a surviving node, could recreate the
vanished queues, and use them.

This was a bug and we fixed it. It was a bug because when the original
node came back up, it could end up thinking that its version of a
queue, which had since been recreated elsewhere, was now the canonical
version of the queue. What then happened to queue contents, and in
particular the routing of acks, was extremely broken.

The current behaviour is that in step (2) above, other clients still
connected to the cluster that try to redeclare a queue which lived on
the failed node get back a 404, but only if the original queue was
durable. If the original queue was not durable, the redeclaration
succeeds and there is no problem. If the original was durable and the
failed node comes back up, the queue should spring back to life,
recovering persistent messages off disk as you'd expect.
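
To make the redeclare rules above concrete, here is a minimal sketch in
Python. It is a toy model, not RabbitMQ code: the names Cluster, Queue,
NotFound, declare, node_down and node_up are all made up for
illustration, and it assumes the declaring client is connected to a
node that is up.

```python
# Toy model of the declare behaviour described above; not RabbitMQ code.
# All names here are hypothetical and exist only for illustration.
from dataclasses import dataclass, field


class NotFound(Exception):
    """Stands in for the 404 a client gets back on redeclare."""
    code = 404


@dataclass
class Queue:
    name: str
    home_node: str
    durable: bool


@dataclass
class Cluster:
    up_nodes: set
    queues: dict = field(default_factory=dict)  # queue name -> Queue

    def declare(self, name, via_node, durable):
        existing = self.queues.get(name)
        if existing is None or existing.home_node in self.up_nodes:
            # Unknown queue, or its home node is alive: create it or
            # hand back the existing one.
            return self.queues.setdefault(name, Queue(name, via_node, durable))
        if existing.durable:
            # Durable queue on a failed node: refuse, so the original can
            # come back with its persistent messages.
            raise NotFound(name)
        # Non-durable queue on a failed node: recreate it on the node the
        # client is connected to.
        self.queues[name] = Queue(name, via_node, durable)
        return self.queues[name]

    def node_down(self, node):
        self.up_nodes.discard(node)

    def node_up(self, node):
        self.up_nodes.add(node)
```

In this model, declaring a durable queue whose home node is down raises
NotFound, a non-durable one is simply recreated elsewhere, and once the
home node is back the durable queue is reachable again on its original
node.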

> 2.) Exchanges to my understanding are differing in that they are fully
> replicated through the cluster.

Correct. Exchanges and bindings are merely rows in mnesia and thus do
not belong or reside on any particular node.

> But does this extend to messages in-flight
> in the exchange? If a node dies, do the messages in-flight go poof or do
> other nodes pass them on since the exchange is replicated?

Routing a message works as follows: the channel on which the message
has been published looks up the Pids of the queues to which the message
is destined, and then sends the message to those Pids. Depending on the
publish mode (e.g. the presence or absence of flags such as immediate
or mandatory), it may or may not matter whether the processes
corresponding to those Pids are still alive.

Whilst you can use mandatory to ensure that a message makes it to at
least one queue, and whilst sending a message to an exchange that
doesn't exist will cause a channel error, you cannot assert that a
published message went to the "correct" number of queues: it always goes
to the correct number of queues as determined at the point of routing.
Thus the 0-or-some that mandatory gives you is the best you can hope
for, generally, without significant additional changes to AMQP.
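
As a rough illustration of that 0-or-some guarantee, here is a small
Python sketch. Again this is a toy model and not how the broker is
implemented: publish, ChannelError and the dict/set arguments are
hypothetical, and matching on an exact (exchange, routing key) pair is
an assumption made to keep the example short.

```python
# Toy model of routing with the mandatory flag; not RabbitMQ code.
# All names here are hypothetical and exist only for illustration.
class ChannelError(Exception):
    """Stands in for the channel error on an unknown exchange."""


def publish(exchanges, bindings, live_queues, exchange, routing_key,
            mandatory=False):
    """Return (delivered_to, returned).

    exchanges:   set of known exchange names
    bindings:    dict mapping (exchange, routing_key) -> list of queue names
    live_queues: set of queue names whose process is still alive
    """
    if exchange not in exchanges:
        raise ChannelError("no such exchange: " + exchange)
    # Routing is decided here, once, from the bindings as they are now.
    looked_up = bindings.get((exchange, routing_key), [])
    delivered_to = [q for q in looked_up if q in live_queues]
    # mandatory only distinguishes "no queue at all" from "at least one".
    returned = mandatory and not delivered_to
    return delivered_to, returned
```

Note that the caller learns whether the message reached zero queues or
some queues, but has no way to say "this should have reached exactly
three queues".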

Best wishes,

Matthew