[rabbitmq-discuss] Unclean shutdown followed by upgrade causes cluster to no longer come up.

Mon Mar 26 13:11:34 BST 2012

Hi,

I have a cluster of three nodes that are auto-configured to cluster with each other.  The nodes are
node1@mq1, node2@mq2 and node3@mq3: one disc node and two RAM nodes respectively.  A few days ago,
something was powered off in our test environment, which caused all three machines hosting the nodes
to power off.  I decided (probably wrongly) that this was a good time to upgrade them to 2.8.1 (as
there was a bug fix I was waiting for).  I upgraded each node to 2.8.1, then attempted to start the
nodes.  node1 started correctly, but its cluster status does not show any of the other nodes, even
though the rabbitmq.config file has them auto-configured.  When I then try to start node2 and node3,
both write the following message to /var/log/rabbitmq/startup_log:

[root@mq2 /etc/rabbitmq]# cat /var/log/rabbitmq/startup_log
Activating RabbitMQ plugins ...
6 plugins activated:
* amqp_client-2.8.1
* mochiweb-1.3-rmq2.8.1-git
* rabbitmq_management-2.8.1
* rabbitmq_management_agent-2.8.1
* rabbitmq_mochiweb-2.8.1
* webmachine-1.7.0-rmq2.8.1-hg

****
Cluster upgrade needed but this is a ram node.
Please first start the last disc node to shut down.
****

It seems that nothing I do (no combination of starting, stopping, resetting, etc.) brings the
cluster back online.  My guess is that this is related to the unclean shutdown of the disc node,
but surely there must be a way to force it back into the cluster?

Here are the auto-configuration files from each node:

node1:
[{rabbit, [{cluster_nodes, ['node1@mq1', 'node2@mq2', 'node3@mq3']}]}].

node2:
[{rabbit, [{cluster_nodes, ['node1@mq1', 'node3@mq3']}]}].

node3:
[{rabbit, [{cluster_nodes, ['node1@mq1', 'node2@mq2']}]}].
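
As a sanity check on the three files above, here is a minimal Python sketch (not part of RabbitMQ)
that restates the cluster_nodes lists and derives each node's intended type.  It assumes the 2.x
convention that a node which names itself in its own cluster_nodes list becomes a disc node and any
other node becomes a RAM node.  The names CLUSTER_NODES, node_types and unknown_peers are
hypothetical, and node names are written in name@host form.

```python
# The three cluster_nodes lists from the config files, keyed by the node that owns each file.
CLUSTER_NODES = {
    "node1@mq1": ["node1@mq1", "node2@mq2", "node3@mq3"],
    "node2@mq2": ["node1@mq1", "node3@mq3"],
    "node3@mq3": ["node1@mq1", "node2@mq2"],
}


def node_types(cluster_nodes):
    """Map each node to 'disc' or 'ram'.

    Assumption: a node that lists itself in its own cluster_nodes
    is a disc node; any other node is a RAM node.
    """
    return {
        node: ("disc" if node in peers else "ram")
        for node, peers in cluster_nodes.items()
    }


def unknown_peers(cluster_nodes):
    """Return peers that some file mentions but that have no file of their own."""
    known = set(cluster_nodes)
    mentioned = {peer for peers in cluster_nodes.values() for peer in peers}
    return sorted(mentioned - known)


if __name__ == "__main__":
    for node, kind in sorted(node_types(CLUSTER_NODES).items()):
        print(node, kind)
    print("unknown peers:", unknown_peers(CLUSTER_NODES))
```

Under that assumption the files describe one disc node (node1@mq1) and two RAM nodes, with no
misspelled or unknown peers, which matches the intended layout.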

And here is the status and cluster status output of node1:

[root@mq1 /etc/rabbitmq]# rabbitmqctl status
Status of node node1@mq1 ...
[{pid,2581},
  {running_applications,
      [{rabbitmq_management,"RabbitMQ Management Console","2.8.1"},
       {rabbitmq_management_agent,"RabbitMQ Management Agent","2.8.1"},
       {amqp_client,"RabbitMQ AMQP Client","2.8.1"},
       {rabbit,"RabbitMQ","2.8.1"},
       {os_mon,"CPO  CXC 138 46","2.2.7"},
       {sasl,"SASL  CXC 138 11","2.1.10"},
       {rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","2.8.1"},
       {webmachine,"webmachine","1.7.0-rmq2.8.1-hg"},
       {mochiweb,"MochiMedia Web Server","1.3-rmq2.8.1-git"},
       {inets,"INETS  CXC 138 49","5.7.1"},
       {mnesia,"MNESIA  CXC 138 12","4.5"},
       {stdlib,"ERTS  CXC 138 10","1.17.5"},
       {kernel,"ERTS  CXC 138 10","2.14.5"}]},
  {os,{unix,linux}},
  {erlang_version,
      "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:30] [kernel-poll:true]\n"},
  {memory,
      [{total,30333584},
       {processes,11865208},
       {processes_used,11854376},
       {system,18468376},
       {atom,1348801},
       {atom_used,1327436},
       {binary,145744},
       {code,14619638},
       {ets,1053712}]},
  {vm_memory_high_watermark,0.39999999922162066},
  {vm_memory_limit,205555302},
  {file_descriptors,
      [{total_limit,924},{total_used,3},{sockets_limit,829},{sockets_used,1}]},
  {processes,[{limit,1048576},{used,167}]},
  {run_queue,0},
  {uptime,5}]
...done.
[root@mq1 /etc/rabbitmq]# rabbitmqctl cluster_status
Cluster status of node node1@mq1 ...
[{nodes,[{disc,[node1@mq1]}]},{running_nodes,[node1@mq1]}]
...done.
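
The cluster_status output above can be compared against the intended membership.  The following is
a small Python sketch that assumes the output format shown here; the helper names and
EXPECTED_NODES are hypothetical, and the regular expression is not a general Erlang term parser.

```python
import re

# Intended cluster membership, written in name@host form.
EXPECTED_NODES = {"node1@mq1", "node2@mq2", "node3@mq3"}


def running_nodes(cluster_status_output):
    """Extract the node names inside {running_nodes,[...]} from rabbitmqctl output."""
    match = re.search(r"\{running_nodes,\[([^\]]*)\]\}", cluster_status_output)
    if match is None:
        return set()
    # Names may or may not be quoted as Erlang atoms; strip quotes if present.
    return {
        name.strip().strip("'")
        for name in match.group(1).split(",")
        if name.strip()
    }


def missing_nodes(cluster_status_output, expected=EXPECTED_NODES):
    """Return the expected nodes that are not reported as running."""
    return sorted(set(expected) - running_nodes(cluster_status_output))


if __name__ == "__main__":
    output = "[{nodes,[{disc,[node1@mq1]}]},{running_nodes,[node1@mq1]}]"
    print(missing_nodes(output))
```

For the output above this reports node2@mq2 and node3@mq3 as missing, i.e. node1 is running as a
standalone disc node.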

The OS is CentOS release 6.2 (Final). The RabbitMQ upgrade was from 2.7.1 to 2.8.1.

-- 
Kind regards,

Adam Pollock.
Lead Software Engineer.

Encoded, Ltd.
T: +44 (0)845 120 9790
F: +44 (0)870 830 1945
E: a.pollock at encoded.co.uk
W: http://www.encoded.co.uk