<div dir="ltr"><div>So now the fun part.  I decided to try to rebuild the middle node (I have boxes 10, 11 and 12).  However, I can't get the middle node to reconnect to the cluster.  Removing its mnesia directory allowed it to start, but it can't rejoin the cluster.  So I tried removing the node from the cluster, e.g.:<br>
<br>rabbitmqctl -n cluster@rabbitmqm10 forget_cluster_node cluster@rabbitmqm11<br><br></div>But the above never responds; it just sits there hanging.  <br><br>Running rabbitmqctl -n cluster@rabbitmqm11 status from the other nodes works fine. I'm at a loss as to how the heck to repair things.  I can't remove the node from the cluster, I can't start it with the mnesia directory in its current state, and removing the mnesia directory and trying to add the node back in fails with "....done (already_member).".  Running rabbitmqctl update_cluster_nodes cluster@rabbitmqm10 also just sits there without responding.<br>
<div><br><br></div><div> I'm starting to really worry I'm going to have to completely rebuild my cluster...<br>Jason<br><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Apr 10, 2014 at 2:55 PM, Jason McIntosh <span dir="ltr"><<a href="mailto:mcintoshj@gmail.com" target="_blank">mcintoshj@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Not sure what's going on here.  I just upgraded my cluster from 3.2.3 to 3.2.4 (including a restart of the machine).  On startup, two of my initial nodes started fine, but when the third node in the cluster started, "/etc/init.d/rabbitmq-server start" just sat at "Starting rabbitmq-server: " without ever finishing.  Running rabbitmqctl status shows:<br>

Status of node cluster@rabbitmqm11p ...<br>[{pid,62505},<br> {running_applications,[{os_mon,"CPO  CXC 138 46","2.2.14"},<br>                        {inets,"INETS  CXC 138 49","5.9.8"},<br>

                        {mnesia,"MNESIA  CXC 138 12","4.11"},<br>                        {amqp_client,"RabbitMQ AMQP Client","3.2.4"},<br>                        {xmerl,"XML parser","1.3.6"},<br>

                        {eldap,"Ldap api","1.0.2"},<br>                        {sasl,"SASL  CXC 138 11","2.3.4"},<br>                        {stdlib,"ERTS  CXC 138 10","1.19.4"},<br>

                        {kernel,"ERTS  CXC 138 10","2.16.4"}]},<br> {os,{unix,linux}},<br> {erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:24:24] [async-threads:30] [hipe] [kernel-poll:true]\n"},<br>

 {memory,[{total,48504352},<br>          {connection_procs,2808},<br>          {queue_procs,0},<br>          {plugins,0},<br>          {other_proc,16290632},<br>          {mnesia,1783536},<br>          {mgmt_db,0},<br>          {msg_index,0},<br>

          {other_ets,1120896},<br>          {binary,725448},<br>          {code,19691642},<br>          {atom,703377},<br>          {other_system,8186013}]},<br> {file_descriptors,[{total_limit,12188},<br>                    {total_used,0},<br>

                    {sockets_limit,10967},<br>                    {sockets_used,0}]},<br> {processes,[{limit,1048576},{used,117}]},<br> {run_queue,0},<br> {uptime,83}]<br>...done.<br><br><br></div>In the web management interface, I see this:<br>

Node statistics not available<br><h2>Memory details</h2>

  
<div>
<table>
  <tbody><tr>
    <th>Connections</th>
    <td>2.7kB</td>
  </tr>
  <tr>
    <th>Queues</th>
    <td>0B</td>
  </tr>
  <tr>
    <th>Plugins</th>
    <td>0B</td>
  </tr>
  <tr>
    <th>Other process memory</th>
    <td>16MB</td>
  </tr>
</tbody></table>
<table>
  <tbody><tr>
    <th>Mnesia</th>
    <td>1.7MB</td>
  </tr>
  <tr>
    <th>Message store index</th>
    <td>0B</td>
  </tr>
  <tr>
    <th>Management database</th>
    <td>0B</td>
  </tr>
  <tr>
    <th>Other ETS tables</th>
    <td>1.1MB</td>
  </tr>
</tbody></table>
<table>
  <tbody><tr>
    <th>Binaries</th>
    <td>708kB</td>
  </tr>
  <tr>
    <th>Code</th>
    <td>19MB</td>
  </tr>
  <tr>
    <th>Atoms</th>
    <td>687kB</td>
  </tr>
  <tr>
    <th>Other system</th>
    <td>7.8MB</td>
  </tr>
</tbody></table>
</div><br><br></div>So RabbitMQ appears to have partially started, but certain things have not started (e.g. plugins).  The plugin list is:<br>[e] amqp_client                       3.2.4<br>[ ] cowboy                            0.5.0-rmq3.2.4-git4b93c2d<br>

[ ] eldap                             3.2.4-gite309de4<br>[e] mochiweb                          2.7.0-rmq3.2.4-git680dba8<br>[ ] rabbitmq_amqp1_0                  3.2.4<br>[E] rabbitmq_auth_backend_ldap        3.2.4<br>[ ] rabbitmq_auth_mechanism_ssl       3.2.4<br>

[E] rabbitmq_consistent_hash_exchange 3.2.4<br>[E] rabbitmq_federation               3.2.4<br>[E] rabbitmq_federation_management    3.2.4<br>[ ] rabbitmq_jsonrpc                  3.2.4<br>[ ] rabbitmq_jsonrpc_channel          3.2.4<br>

[ ] rabbitmq_jsonrpc_channel_examples 3.2.4<br>[E] rabbitmq_management               3.2.4<br>[E] rabbitmq_management_agent         3.2.4<br>[E] rabbitmq_management_visualiser    3.2.4<br>[ ] rabbitmq_mqtt                     3.2.4<br>

[E] rabbitmq_shovel                   3.2.4<br>[E] rabbitmq_shovel_management        3.2.4<br>[ ] rabbitmq_stomp                    3.2.4<br>[ ] rabbitmq_tracing                  3.2.4<br>[e] rabbitmq_web_dispatch             3.2.4<br>

[ ] rabbitmq_web_stomp                3.2.4<br>[ ] rabbitmq_web_stomp_examples       3.2.4<br>[ ] rfc4627_jsonrpc                   3.2.4-git5e67120<br>[ ] sockjs                            0.3.4-rmq3.2.4-git3132eb9<br>[e] webmachine                        1.10.3-rmq3.2.4-gite9359c7<br>

<br><br>Any suggestions on next steps on debugging this?  Or what I can do to get this back up and in a "healthy" state?<br><br>Thanks!<span class="HOEnZb"><font color="#888888"><br>Jason<br><div><br><br><div><br clear="all">
<div><br>-- <br><div dir="ltr">
Jason McIntosh<br><a href="https://github.com/jasonmcintosh/" target="_blank">https://github.com/jasonmcintosh/</a><br><a href="tel:573-424-7612" value="+15734247612" target="_blank">573-424-7612</a></div>
</div></div></div></font></span></div>
</blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr">Jason McIntosh<br><a href="https://github.com/jasonmcintosh/" target="_blank">https://github.com/jasonmcintosh/</a><br>573-424-7612</div>
</div>