[rabbitmq-discuss] Cluster Disk Node failure and cluster membership

Kevin Nuckolls kevin.nuckolls at gmail.com
Fri May 25 05:29:49 BST 2012


I'm hoping to automate the clustering steps with Chef and I have one 
pivotal question: If I have a cluster with three disc nodes in the cloud 
and one of them goes down and is gone forever, will the other nodes 
eventually give up on treating it as a disc node?

In that scenario, cluster_status would look something like this after 
kevin-rabbit1 terminated forever:

Cluster status of node 'rabbit@kevin-rabbit2' ...
[{nodes,[{disc,['rabbit@kevin-rabbit3','rabbit@kevin-rabbit2',
                'rabbit@kevin-rabbit1']}]},
 {running_nodes,['rabbit@kevin-rabbit3','rabbit@kevin-rabbit2']}]
...done.
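
For the automation side, here is a rough sketch of how a script could spot 
such a node from that output. It assumes the status text has exactly the 
shape shown above; the function names are made up for illustration, and it 
only parses text, it does not talk to the broker or change cluster 
membership.

```python
import re

# Sample text copied from the cluster_status output above.
SAMPLE_STATUS = """\
Cluster status of node 'rabbit@kevin-rabbit2' ...
[{nodes,[{disc,['rabbit@kevin-rabbit3','rabbit@kevin-rabbit2',
                'rabbit@kevin-rabbit1']}]},
 {running_nodes,['rabbit@kevin-rabbit3','rabbit@kevin-rabbit2']}]
...done.
"""


def quoted_atoms(text):
    """Return every single-quoted name found in text, in order."""
    return re.findall(r"'([^']+)'", text)


def missing_disc_nodes(status_text):
    """List disc nodes that are cluster members but not currently running.

    Assumption: the output contains a {disc,[...]} list and a
    {running_nodes,[...]} list formatted as in the sample.
    """
    disc = re.search(r"\{disc,\[(.*?)\]", status_text, re.S)
    running = re.search(r"\{running_nodes,\[(.*?)\]", status_text, re.S)
    disc_nodes = quoted_atoms(disc.group(1)) if disc else []
    running_nodes = set(quoted_atoms(running.group(1))) if running else set()
    return [node for node in disc_nodes if node not in running_nodes]


if __name__ == "__main__":
    print(missing_disc_nodes(SAMPLE_STATUS))  # ['rabbit@kevin-rabbit1']
```

A node showing up in that list only means it is not running right now; the 
script cannot tell "off" from "gone forever", which is exactly the 
distinction I am asking about.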

The cluster just thinks that kevin-rabbit1 is "off" and that it might come 
back someday, right? Is there any way for me to tell the cluster "sorry 
guys, that node is dead _forever_. Just forget about it and move on with 
your lives"?

I saw the following language in the cluster docs, and it makes me want to 
be sure that cluster_status stays clean and accurately reflects which nodes 
are alive and which are dead:

There are some important caveats:

   - All disk nodes must be running for certain operations, most notably 
   leaving a cluster, to succeed.
   - At least one disk node should be running at all times.
   - When all nodes in a cluster have been shut down, restarting any node 
   will suspend for up to 30 seconds and then fail if the last disk node that 
   was shut down has not been restarted yet. Since the nodes do not know what 
   happened to that last node, they have to assume that it holds a more 
   up-to-date version of the broker state. Hence, in order to preserve data 
   integrity, they cannot resume operation until that node is restarted.


That's why I want to make sure that the other nodes know that the lost disc 
node is never, ever coming back. 

Anyway thanks. Any insight will help :)

Kevin Nuckolls
Senior Software Engineer
Mosaik Solutions
