FWIW, I ended up running kill -9 on anything that looked like a rabbit and renaming the mnesia directory.<div><br></div><div>When I first restarted (with the old mnesia db), I could run status on node "rabbit" and it would say:</div>
<div><br></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div><div> {nodes,[{disc,[rabbit@bumby2]},{ram,[bumby22@bumby2]}]},</div></div></div><div><div><div>
{running_nodes,[rabbit@bumby2]}]</div></div></div></blockquote><div><div><br></div><div>I ran stop_app and reset on that node. I should have tried force_reset, which I had forgotten about.</div><div><br></div></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><div><div>$ sudo rabbitmqctl start_app -n bumby22</div></div></div><div><div><div>Starting node bumby22@bumby2 ...</div></div></div><div><div><div>Error: unable to connect to node bumby22@bumby2: nodedown</div></div>
</div><div><div><div>diagnostics:</div></div></div><div><div><div>- nodes and their ports on bumby2: [{rabbit,38534},{rabbitmqctl4333,36621}]</div></div></div><div><div><div>- current node: rabbitmqctl4333@bumby2</div></div>
</div><div><div><div>- current node home dir: /var/lib/rabbitmq</div></div></div><div><div><div>- current node cookie hash: dtrZjBnJVTn9JkBFcRVBBA==</div></div></div><div><br></div></blockquote><div><div><div>bumby22 and bumby22@bumby2 (ya, bad name) are in /etc/hosts.</div>
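<div><br></div><div>For the record, here is a rough sketch of the force_reset sequence I should have tried. It assumes the wedged node is the one named "rabbit"; the NODE and DRY_RUN variables and the run/reset_node helpers are just my own illustration, not part of rabbitmqctl. It only prints the commands unless DRY_RUN=0, and I have not tested it against a node in this state.</div>

```shell
#!/bin/sh
# Sketch only: stop the app, force a reset, and start the app again.
# NODE and DRY_RUN are illustrative variables, not rabbitmqctl settings.
NODE="${NODE:-rabbit}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

reset_node() {
    run rabbitmqctl -n "$NODE" stop_app
    # force_reset resets unconditionally, unlike reset.
    run rabbitmqctl -n "$NODE" force_reset
    run rabbitmqctl -n "$NODE" start_app
}

reset_node
```

<div>Run as-is, it just lists the three rabbitmqctl commands; with DRY_RUN=0 (as root or via sudo) it would actually run them. As I understand it, force_reset discards the node's database, so the end result should be much the same as renaming the mnesia directory.</div>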
<div><br></div><div>It would be nice to know how to un-wedge a node when needed -- and why rabbitmqctl was hanging. Even /etc/init.d/rabbitmq stop hung.</div><div><br></div><div><br></div><div class="gmail_quote">On Fri, Jan 14, 2011 at 7:01 AM, Bill Moseley <span dir="ltr">&lt;<a href="mailto:moseley@hank.org">moseley@hank.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">I've been working with clustering (on a single test machine). I stopped the 2nd RAM node w/o problem, but now the initial disk node seems unresponsive. All of the rabbitmqctl commands hang. Initially, connecting to the node would also hang, but now Rabbit doesn't seem to be listening on the port any more.<div>
<br></div><div><div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div>$ sudo rabbitmqctl stop_app</div></div><div><div>Stopping node rabbit@bumby2 ...</div>
</div></blockquote><div><div><br></div><div>After a few minutes I abort with ^C, press "k" for kill, and get the following:</div><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div><div><div>^C</div></div></div><div><div><div>BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded</div></div></div><div><div><div> (v)ersion (k)ill (D)b-tables (d)istribution</div></div></div><div><div><br></div>
</div><div><div><div>k </div></div></div><div><div><div><br></div></div></div><div><div><div>Process Information</div></div></div><div><div><div><br></div></div></div><div><div><div>--------------------------------------------------</div>
</div></div><div><div><div>=proc:<0.38.0></div></div></div><div><div><div>State: Waiting</div></div></div><div><div><div>Name: inet_gethost_native</div></div></div><div><div><div>Spawned as: inet_gethost_native:server_init/2</div>
</div></div><div><div><div>Spawned by: <0.37.0></div></div></div><div><div><div>Started: Fri Jan 14 06:38:10 2011</div></div></div><div><div><div>Message queue length: 0</div></div></div><div><div><div>Number of heap fragments: 0</div>
</div></div><div><div><div>Heap fragment data: 0</div></div></div><div><div><div>Link list: [#Port<0.288>, <0.37.0>]</div></div></div><div><div><div>Dictionary: [{rid,1},{num_requests,0}]</div></div></div><div>
<div><div>Reductions: 64</div></div></div><div><div><div>Stack+heap: 233</div></div></div><div><div><div>OldHeap: 0</div></div></div><div><div><div>Heap unused: 190</div></div></div><div><div><div>OldHeap unused: 0</div>
</div>
</div><div><div><div>Stack dump:</div></div></div><div><div><div>Program counter: 0xb771c718 (inet_gethost_native:main_loop/1 + 20)</div></div></div><div><div><div>CP: 0x00000000 (invalid)</div></div></div><div><div><div>
arity = 0</div></div></div><div><div><div><br></div></div></div><div><div><div>0xb74eeecc Return addr 0x08201594 (&lt;terminate process normally&gt;)</div></div></div><div><div><div>y(0) {state,#Port<0.288>,8000,12302,16399,<0.37.0>,4,{statistics,0,0,0,0,0,0,0,0}}</div>
</div></div><div><div><div>(k)ill (n)ext (r)eturn:</div></div></div><div><div><br></div></div></blockquote><div><div><div>Seems to be a lot of these hanging around:</div><div><br></div></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div><div><div><br></div></div></div></blockquote><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div><div>$ ps auxf | grep gethost</div></div></div><div><div><div>
rabbitmq 3700 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 3701 0.0 0.0 1916 540 ? S Jan13 0:00 | \_ inet_gethost 4</div></div></div>
<div><div><div>rabbitmq 3995 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 3996 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4</div>
</div></div><div><div><div>rabbitmq 4370 0.0 0.0 1868 432 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 4371 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4</div>
</div></div></blockquote><div><div><br></div><div>And logs:</div><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div><div><div><div>=WARNING REPORT==== 13-Jan-2011::21:39:43 ===</div>
</div></div></div></div><div><div><div><div><div>exception on TCP connection <0.30329.54> from <a href="http://127.0.0.1:50222" target="_blank">127.0.0.1:50222</a></div></div></div></div></div><div><div><div><div><div>
connection_closed_abruptly</div>
</div></div></div></div><div><div><div><div><div><br></div></div></div></div></div><div><div><div><div><div>=INFO REPORT==== 13-Jan-2011::21:39:43 ===</div></div></div></div></div><div><div><div><div><div>closing TCP connection <0.30329.54> from <a href="http://127.0.0.1:50222" target="_blank">127.0.0.1:50222</a></div>
</div></div></div></div><div><div><div><div><div><br></div></div></div></div></div><div><div><div><div><div>=ERROR REPORT==== 14-Jan-2011::06:39:16 ===</div></div></div></div></div><div><div><div><div><div>** Node rabbitmqctl7173@bumby2 not responding **</div>
</div></div></div></div><div><div><div><div><div>** Removing (timedout) connection **</div></div></div></div></div></blockquote><div><div><div><div><div><br></div></div></div></div><div><br></div><div>Ubuntu, Erlang 1:13.b.1-dfsg-2ubuntu1.1 / RabbitMQ 2.2.0-1</div>
<div> </div><div>Are there any docs that describe debugging techniques for those of us who have no Erlang experience? I'm not even sure what I would kill -9 if I had to. ;)</div><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>
</div></div>
</blockquote></div><br><br clear="all"><br>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>
</div></div>