[rabbitmq-discuss] Cluster nodes stop/start order can lead to failures

Fri Sep 21 14:13:35 BST 2012

At Thu, 20 Sep 2012 12:52:50 -0700 (PDT),
Jonathan Oliver wrote:
> Wouldn't having no less than 3 nodes help with this? I would imagine that the
> last node, even if it's a RAM node, would have the latest exchanges, bindings,
> etc. and would be able to deliver them authoritatively to any node coming
> back online.

The rule is simple: when all the nodes in a cluster go down, the last disc node
to go down must be the first one to go up.  This guarantees that the node with
the most up-to-date data is the first one to reappear.

Having a standalone RAM node is a different scenario, and one that should be
avoided (in fact we try to prevent users from doing so, and we will do more
in the future), since if the RAM node fails there will be loss of data;
moreover...

> Obviously changes to exchanges and bindings could not occur during the time
> period during which no durable nodes are available

there's nothing that prevents bindings from being created on RAM nodes (but as
said before, that's asking for loss of data).

> but I can't see why a RAM node couldn't help in this scenario.

Having 3 nodes simply defers the problem: the requirement that the last node to
go down be the first one brought back up remains.  If the last node standing is
a RAM node and that node fails, the first node to be brought up must still be
the last disc node.
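
To make the rule concrete, here is a minimal Python sketch of how one might
pick the node to start first from a record of the shutdown order.  It is only
an illustration: the helper names and the (name, type) tuples are hypothetical
and not part of RabbitMQ, and the order chosen for the remaining nodes is an
assumption, since the rule only fixes which node comes up first.

```python
from typing import List, Tuple

# Hypothetical input: (node_name, node_type) pairs listed in the order the
# nodes went down, earliest first. node_type is "disc" or "ram".
Shutdown = List[Tuple[str, str]]


def first_node_to_start(shutdown_order: Shutdown) -> str:
    """Return the last disc node to go down; it must be started first."""
    for name, node_type in reversed(shutdown_order):
        if node_type == "disc":
            return name
    raise ValueError("no disc node in the shutdown record")


def start_order(shutdown_order: Shutdown) -> List[str]:
    """Return a start order with the last disc node to go down in front."""
    first = first_node_to_start(shutdown_order)
    # Assumption: the other nodes follow in reverse shutdown order. The rule
    # itself says nothing about them.
    rest = [name for name, _ in reversed(shutdown_order) if name != first]
    return [first] + rest


if __name__ == "__main__":
    # The RAM node was the last one standing, so the last *disc* node to go
    # down (rabbit@b) still has to come up first.
    record = [("rabbit@a", "disc"), ("rabbit@b", "disc"), ("rabbit@c", "ram")]
    print(start_order(record))
```

Note that the RAM node never ends up first, even when it was the last one
standing, which is the point made above.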

> I haven't yet tested this scenario just yet, but it's simple enough to prove
> with a few AWS cloud instances.

You don't need AWS instances; everything can be done locally.  See
<https://rabbitmq.com/clustering.html#single-machine>.

--
Francesco * Often in error, never in doubt