[rabbitmq-discuss] Someone else with a nodedown error

Eric Berg ehberg at gmail.com
Wed May 8 18:22:12 BST 2013


I have read through many of these nodedown error emails, and of course none
of them seems to be exactly what I am experiencing.

I have a 4-node cluster, and one of the nodes went offline according to the
rest of the cluster. That box has the following in the sasl log:

=SUPERVISOR REPORT==== 7-May-2013::14:37:22 ===
     Supervisor: {<0.11197.1096>,
                                           rabbit_channel_sup_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{pid,<0.11199.1096>},
                  {name,channel_sup},
                  {mfa,{rabbit_channel_sup,start_link,[]}},
                  {restart_type,temporary},
                  {shutdown,infinity},
                  {child_type,supervisor}]

*Yet in the regular rabbit log I can see that it was still accepting
connections up until 2:22 AM the next day:*

(last log entry)
=INFO REPORT==== 8-May-2013::02:22:26 ===
closing AMQP connection <0.18267.1145> (IPADDRESS:PORT -> IPADDRESS:PORT)

*Running rabbitmqctl status returns:*

[root@rabbit-box rabbitmq]# rabbitmqctl status
Status of node 'rabbit@rabbit-box' ...
Error: unable to connect to node 'rabbit@rabbit-box': nodedown

DIAGNOSTICS
===========

nodes in question: ['rabbit@rabbit-box']

hosts, their running nodes and ports:
- rabbit-box: [{rabbit,13957},{rabbitmqctl2301,16508}]

current node details:
- node name: 'rabbitmqctl2301@rabbit-box'
- home dir: /var/lib/rabbitmq
- cookie hash: qQwyFW90ZNbbrFvX1AtrxQ==
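
For anyone comparing cookie hashes across nodes by hand, here is a minimal sketch in Python. It assumes the "cookie hash" printed above is the base64 encoding of the MD5 digest of the cookie text. The 24-character value is consistent with a 16-byte digest, but treat the scheme as an assumption and check it against your own output. The cookie path is also an assumption, built from the home dir shown above plus the usual .erlang.cookie file name; the function names are made up for illustration.

```python
import base64
import hashlib


def cookie_hash(cookie: str) -> str:
    """Return base64(MD5(cookie)); the hashing scheme is an assumption."""
    # Strip surrounding whitespace so a stray trailing newline in the
    # cookie file does not change the result.
    digest = hashlib.md5(cookie.strip().encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def read_cookie_hash(path: str = "/var/lib/rabbitmq/.erlang.cookie") -> str:
    """Hash the cookie stored at `path` (assumed default location)."""
    with open(path, "r", encoding="utf-8") as fh:
        return cookie_hash(fh.read())
```

Running read_cookie_hash() as the rabbitmq user on each node and comparing the strings should show whether the cookies really match, without printing the cookie itself.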


A couple of notes:
- Looking for a process run by rabbit shows that it appears to still be
running
- The Erlang cookie is the same on all nodes of the cluster; the cookie hash
is the same as well
- A traffic spike occurred right around the time of the last entry in the
rabbit log
- I can find no other errors in any logs that relate to rabbit or Erlang
- Up until this point the cluster has been running fine for over 40 days.
- telnet IP_ADDRESS 5672 times out
- I have not restarted the box, the Erlang node, or the entire rabbitmq-server
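
The telnet test above can be repeated against the other ports involved with a small script. This is only a sketch: 5672 is the AMQP port from the telnet test and 13957 is the port listed for the rabbit node in the diagnostics, while 4369 is the usual epmd port and is an assumption here. The helper names port_open and check_node are made up for illustration.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_node(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Probe each named port on host and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


# Example call (host name and port numbers taken from the output above,
# except 4369, which is assumed to be the epmd port):
# check_node("rabbit-box", {"amqp": 5672, "epmd": 4369, "distribution": 13957})
```

If the epmd port answers while 5672 and the distribution port do not, that would match the picture above of an OS process that is still alive but no longer accepting connections.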

Is there anywhere else I can go looking for errors? I am about to start
killing processes, but I'm not sure that will solve anything.

Thanks!

Eric Berg