[rabbitmq-discuss] Unexplained shutdown of RabbitMQ

Jason Zaugg jzaugg at gmail.com
Fri May 14 12:44:43 BST 2010


=System Details=

Erlang .572
RabbitMQ 1.6.0
Windows Server 2003 64Bit
8 core machine.
Lots of free disk space.

=Application Details=

The RabbitMQ Broker and the message consumers/publishers are running
on the same machine. All the queues in this application are
non-persistent. There are some other very low traffic applications
with small persistent queues on the same broker.

Five Java processes are running on this machine consuming and
publishing messages through one VHost and one Exchange. There are
desktop applications

I've sent the architecture diagram separately to info at rabbitmq.com,
but I can't share it with this list right now. I can send by private
email if required.

The applications and broker have been running smoothly in production
for the last few weeks.

=Symptoms=

Today, a problem is occurring. After 5-60 minutes of uptime, the
clients report that the broker is shutting down, then the Erlang
process starts spinning on one CPU. I suspect it is writing the large
error log during this time. We killed erl.exe and restarted.

The shutdown produces the following log output. For more context, see the attachment.

  =ERROR REPORT==== 14-May-2010::10:27:29 ===
  ** Generic server <0.1962.0> terminating
  ** Last message in was {'EXIT',<0.1960.0>,{writer,send_failed,badarg}}
  ** When Server state == {ch,running,1,<0.1957.0>,<0.1960.0>,undefined,none,
  [snip reams of output with the queued messages]
  ** Reason for termination ==
  ** {writer,send_failed,badarg}

=Questions=

1. Can we configure RabbitMQ to suppress logging of the message queue
when this error occurs? The log files are growing to 4GB+, and it's
tricky to follow them.
2. What might "writer,send_failed,badarg" as the termination reason
suggest as the root cause?
3. Prior to the shutdown, what is the meaning of:

  exception on TCP connection <0.2021.0> from 10.30.33.169:3251
  {timeout,running}
  exception on TCP connection <0.1707.0> from 10.30.32.44:2692
  connection_closed_abruptly

4. When I try to run rabbitmqctl, it fails with:

  C:\Documents and Settings\sophis_server>rabbitmqctl list_connections
  Listing connections ...
  Error: {badrpc,nodedown}

and rabbit.log shows:

  =ERROR REPORT==== 14-May-2010::10:36:28 ===
  ** Connection attempt from disallowed node rabbitmqctl at CHGVASSOPP105 **
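
Regarding question 1, until the logging itself can be suppressed, a small
filter along these lines might make an oversized log easier to follow after
the fact. This is only a sketch: it assumes every report starts with a
header line of the form "=ERROR REPORT==== ... ===" as in the excerpts
above, and the function name condense and the max_body_lines limit are
made up for illustration. It does not change what RabbitMQ writes.

```python
import re
import sys

# Matches report headers such as
#   =ERROR REPORT==== 14-May-2010::10:27:29 ===
# Leading whitespace is tolerated in case the log has been indented.
HEADER = re.compile(r"^\s*=[A-Z ]+REPORT==== .* ===\s*$")


def condense(lines, max_body_lines=5):
    """Yield each report header plus at most max_body_lines of its body.

    Lines beyond the limit are dropped and replaced by a one-line
    summary, so a report carrying a huge state dump shrinks to a few lines.
    """
    kept = 0
    skipped = 0
    for line in lines:
        if HEADER.match(line):
            # A new report begins: summarise what was dropped from the last one.
            if skipped:
                yield "  [... %d lines omitted ...]\n" % skipped
            kept = 0
            skipped = 0
            yield line
        elif kept < max_body_lines:
            kept += 1
            yield line
        else:
            skipped += 1
    if skipped:
        yield "  [... %d lines omitted ...]\n" % skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python condense_log.py rabbit.log
    with open(sys.argv[1], errors="replace") as log:
        for out in condense(log):
            sys.stdout.write(out)
```

The file is read line by line, so a 4GB log does not need to fit in memory.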


Let me know if I can provide more information, and thanks in advance
for your help!

Jason Zaugg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit-restart-loop.log
Type: application/octet-stream
Size: 13531 bytes
Desc: not available
Url : http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20100514/79b55dfe/attachment.obj 

