I've been working with clustering (on a single test machine). I stopped the second RAM node without a problem, but now the initial disk node seems unresponsive. All of the rabbitmqctl commands hang. Initially, connecting to the node would also hang, but now Rabbit doesn't seem to be listening on the port any more.<div>
<br></div><div><div><br></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div>$ sudo rabbitmqctl stop_app</div></div><div><div>Stopping node rabbit@bumby2 ...</div>
</div></blockquote><div><div><br></div><div>After a few minutes I abort with ^C, press "k" for kill, and get the following:</div><div><br></div></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><div><div>^C</div></div></div><div><div><div>BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded</div></div></div><div><div><div> (v)ersion (k)ill (D)b-tables (d)istribution</div></div></div><div><div><br></div>
</div><div><div><div>k </div></div></div><div><div><div><br></div></div></div><div><div><div>Process Information</div></div></div><div><div><div><br></div></div></div><div><div><div>--------------------------------------------------</div>
</div></div><div><div><div>=proc:<0.38.0></div></div></div><div><div><div>State: Waiting</div></div></div><div><div><div>Name: inet_gethost_native</div></div></div><div><div><div>Spawned as: inet_gethost_native:server_init/2</div>
</div></div><div><div><div>Spawned by: <0.37.0></div></div></div><div><div><div>Started: Fri Jan 14 06:38:10 2011</div></div></div><div><div><div>Message queue length: 0</div></div></div><div><div><div>Number of heap fragments: 0</div>
</div></div><div><div><div>Heap fragment data: 0</div></div></div><div><div><div>Link list: [#Port<0.288>, <0.37.0>]</div></div></div><div><div><div>Dictionary: [{rid,1},{num_requests,0}]</div></div></div><div>
<div><div>Reductions: 64</div></div></div><div><div><div>Stack+heap: 233</div></div></div><div><div><div>OldHeap: 0</div></div></div><div><div><div>Heap unused: 190</div></div></div><div><div><div>OldHeap unused: 0</div>
</div>
</div><div><div><div>Stack dump:</div></div></div><div><div><div>Program counter: 0xb771c718 (inet_gethost_native:main_loop/1 + 20)</div></div></div><div><div><div>CP: 0x00000000 (invalid)</div></div></div><div><div><div>
arity = 0</div></div></div><div><div><div><br></div></div></div><div><div><div>0xb74eeecc Return addr 0x08201594 (<terminate process normally>)</div></div></div><div><div><div>y(0) {state,#Port<0.288>,8000,12302,16399,<0.37.0>,4,{statistics,0,0,0,0,0,0,0,0}}</div>
</div></div><div><div><div>(k)ill (n)ext (r)eturn:</div></div></div><div><div><br></div></div></blockquote><div><div><div>There seem to be a lot of these hanging around:</div><div><br></div></div></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><div><div>$ ps auxf | grep gethost</div></div></div><div><div><div>
rabbitmq 3700 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 3701 0.0 0.0 1916 540 ? S Jan13 0:00 | \_ inet_gethost 4</div></div></div>
<div><div><div>rabbitmq 3995 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 3996 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4</div>
</div></div><div><div><div>rabbitmq 4370 0.0 0.0 1868 432 ? Ss Jan13 0:00 \_ inet_gethost 4</div></div></div><div><div><div>rabbitmq 4371 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4</div>
</div></div></blockquote><div><div><br></div><div>And logs:</div><div><br></div></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div><div><div><div>=WARNING REPORT==== 13-Jan-2011::21:39:43 ===</div>
</div></div></div></div><div><div><div><div><div>exception on TCP connection <0.30329.54> from 127.0.0.1:50222</div></div></div></div></div><div><div><div><div><div>connection_closed_abruptly</div>
</div></div></div></div><div><div><div><div><div><br></div></div></div></div></div><div><div><div><div><div>=INFO REPORT==== 13-Jan-2011::21:39:43 ===</div></div></div></div></div><div><div><div><div><div>closing TCP connection <0.30329.54> from 127.0.0.1:50222</div>
</div></div></div></div><div><div><div><div><div><br></div></div></div></div></div><div><div><div><div><div>=ERROR REPORT==== 14-Jan-2011::06:39:16 ===</div></div></div></div></div><div><div><div><div><div>** Node rabbitmqctl7173@bumby2 not responding **</div>
</div></div></div></div><div><div><div><div><div>** Removing (timedout) connection **</div></div></div></div></div></blockquote><div><div><div><div><div><br></div></div></div></div><div><br></div><div>Ubuntu, Erlang 1:13.b.1-dfsg-2ubuntu1.1 / RabbitMQ 2.2.0-1</div>
<div> </div><div>Are there any docs that describe debugging techniques for those of us who have no Erlang experience? I'm not even sure which process to kill -9 if I had to. ;)</div><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>
</div></div>