[rabbitmq-discuss] Node hung and not responding to rabbitmqctl
Bill Moseley
moseley at hank.org
Fri Jan 14 23:29:56 GMT 2011
FWIW, I ended up running kill -9 on anything that looked like a rabbit and
renaming the mnesia directory.
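For anyone finding this later, that amounted to roughly the following. This is only a sketch: the "rabbitmq" user and the /var/lib/rabbitmq/mnesia path are assumptions based on the Ubuntu package defaults (the diagnostics further down do show /var/lib/rabbitmq as the node home dir), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumed defaults: processes run as user "rabbitmq",
# node data lives in /var/lib/rabbitmq/mnesia.

# List processes owned by the given user (beam, epmd, inet_gethost, ...)
# so you can see what you are about to kill before doing it.
list_rabbit_procs() {
    user="${1:-rabbitmq}"
    ps -u "$user" -o pid=,args= 2>/dev/null
}

# Rename the mnesia directory instead of deleting it, so the old state
# can still be inspected or put back later. Prints the new path.
set_aside_mnesia() {
    dir="${1:-/var/lib/rabbitmq/mnesia}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="$dir.wedged.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}
```

Usage would be something like running list_rabbit_procs, doing sudo kill -9 on each PID it shows, and then calling set_aside_mnesia as root before starting the server again.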
When I first restarted (with the old mnesia db) I could run a status on node
"rabbit" and it would say:
{nodes,[{disc,[rabbit at bumby2]},{ram,[bumby22 at bumby2]}]},
{running_nodes,[rabbit at bumby2]}]
I ran stop_app and reset on that node. I should have tried force_reset,
which I had forgotten about.
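The sequence I mean is sketched below. As I understand it, reset tries to talk to the rest of the cluster in order to leave it cleanly, while force_reset wipes the node's state unconditionally, which is why it may help when the other node is gone. The helper only prints the commands rather than running them; reset_plan is a made-up name, and the node names are just the ones from my setup.

```shell
#!/bin/sh
# Sketch only: print (do not run) the rabbitmqctl sequence for wiping a
# node's state. Pass "force" as the second argument to use force_reset.
reset_plan() {
    node="${1:-rabbit}"
    reset_cmd="reset"
    if [ "${2:-}" = "force" ]; then
        reset_cmd="force_reset"
    fi
    for cmd in stop_app "$reset_cmd" start_app; do
        echo "rabbitmqctl -n $node $cmd"
    done
}
```

For example, reset_plan bumby22 force prints the three commands for the RAM node; review them and run each with sudo.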
$ sudo rabbitmqctl start_app -n bumby22
Starting node bumby22 at bumby2 ...
Error: unable to connect to node bumby22 at bumby2: nodedown
diagnostics:
- nodes and their ports on bumby2: [{rabbit,38534},{rabbitmqctl4333,36621}]
- current node: rabbitmqctl4333 at bumby2
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: dtrZjBnJVTn9JkBFcRVBBA==
bumby22 and bumby22 at bumby2 (ya, bad name) are in /etc/hosts.
It would be nice to know how to un-wedge when needed -- and why rabbitmqctl
was hanging. /etc/init.d/rabbitmq stop even hung.
On Fri, Jan 14, 2011 at 7:01 AM, Bill Moseley <moseley at hank.org> wrote:
> I've been working with clustering (on a single test machine) and I stopped
> the 2nd RAM node w/o problem but now the initial disk node seems
> unresponsive. All of the rabbitmqctl commands hang. Initially, connecting
> to the node would also hang, but now Rabbit doesn't seem to be listening on
> the port any more.
>
>
> $ sudo rabbitmqctl stop_app
> Stopping node rabbit at bumby2 ...
>
>
> After a few minutes I abort with ^C, press "k" for kill, and get the following:
>
> ^C
> BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
> (v)ersion (k)ill (D)b-tables (d)istribution
>
> k
>
> Process Information
>
> --------------------------------------------------
> =proc:<0.38.0>
> State: Waiting
> Name: inet_gethost_native
> Spawned as: inet_gethost_native:server_init/2
> Spawned by: <0.37.0>
> Started: Fri Jan 14 06:38:10 2011
> Message queue length: 0
> Number of heap fragments: 0
> Heap fragment data: 0
> Link list: [#Port<0.288>, <0.37.0>]
> Dictionary: [{rid,1},{num_requests,0}]
> Reductions: 64
> Stack+heap: 233
> OldHeap: 0
> Heap unused: 190
> OldHeap unused: 0
> Stack dump:
> Program counter: 0xb771c718 (inet_gethost_native:main_loop/1 + 20)
> CP: 0x00000000 (invalid)
> arity = 0
>
> 0xb74eeecc Return addr 0x08201594 (<terminate process normally>)
> y(0) {state,#Port<0.288>,8000,12302,16399,<0.37.0>,4,{statistics,0,0,0,0,0,0,0,0}}
> (k)ill (n)ext (r)eturn:
>
> Seems to be a lot of these hanging around:
>
>
> $ ps auxf | grep gethost
> rabbitmq 3700 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4
> rabbitmq 3701 0.0 0.0 1916 540 ? S Jan13 0:00 | \_ inet_gethost 4
> rabbitmq 3995 0.0 0.0 1868 436 ? Ss Jan13 0:00 \_ inet_gethost 4
> rabbitmq 3996 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4
> rabbitmq 4370 0.0 0.0 1868 432 ? Ss Jan13 0:00 \_ inet_gethost 4
> rabbitmq 4371 0.0 0.0 1916 536 ? S Jan13 0:00 \_ inet_gethost 4
>
>
> And logs:
>
> =WARNING REPORT==== 13-Jan-2011::21:39:43 ===
> exception on TCP connection <0.30329.54> from 127.0.0.1:50222
> connection_closed_abruptly
>
> =INFO REPORT==== 13-Jan-2011::21:39:43 ===
> closing TCP connection <0.30329.54> from 127.0.0.1:50222
>
> =ERROR REPORT==== 14-Jan-2011::06:39:16 ===
> ** Node rabbitmqctl7173 at bumby2 not responding **
> ** Removing (timedout) connection **
>
>
>
> Ubuntu Erlang 1:13.b.1-dfsg-2ubuntu1.1 / RabbitMQ 2.2.0-1
>
> Are there any docs that describe debugging techniques for those of us who
> have no Erlang experience? I'm not even sure what to kill -9 if I had to.
> ;)
>
> --
> Bill Moseley
> moseley at hank.org
>
--
Bill Moseley
moseley at hank.org