[rabbitmq-discuss] rabbitmq dying

Ben Hood 0x6e6562 at gmail.com
Wed Jun 18 15:49:12 BST 2008


Dave,

I've had a quick look at the log file, and it seems that your clients
are dying and that the broker is handling this badly.

This error

Error in process <0.7321.2> on node 'rabbit@vsdlbblue01' with exit
value: {badarg,[{erlang,port_command,[#Port<0.27825>,[<<7 bytes>>,<<36 bytes>>,<<1 byte>>]]},{rabbit_writer,internal_send_command_async,3},{rabbit_writer,handle_message,2},{rabbit_writer,mainloop,1}]}


suggests that the peer at the other end of the connection is no longer
there, and it causes a follow-on error message:

Error in process <0.30205.1> on node 'rabbit@vsdlbblue01' with exit
value: {{badmatch,{error,[{exit,{timeout,{gen_server,call,[<0.30206.1>,{notify_down,<0.30204.1>}]}}}]}},[{rabbit_channel,terminate,2},{buffering_proxy,mainloop,4}]}

which is a symptom of the first error not being handled correctly.

There is already a bug filed for this, and it will be fixed very soon.
Please let us know how urgent this is for you, because we could get a
patch out sooner if necessary.

This is not the complete answer, though. We'll look into the rest, but
I wanted to give you some feedback as soon as possible.

A few questions to help us diagnose this:

- What version of Rabbit are you using?
- Does the Rabbit process actually die or just the TCP listener?
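
For the second question, a rough check along these lines, run from a
client machine, may help tell the two cases apart. It is only a sketch:
the function name is made up for illustration, and it assumes the
broker listens on the default AMQP port, 5672.

```python
import socket


def tcp_listener_accepts(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection or a timeout returns False. That only says that
    nothing accepted the connection, not why.
    """
    try:
        # create_connection completes the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name lookup failures.
        return False
```

Calling tcp_listener_accepts("vsdlbblue01", 5672) and getting False
while the 'beam' process is still running on the host would point at
the TCP listener rather than the whole node.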

Thanks,

Ben

On Wed, Jun 18, 2008 at 3:23 PM,  <David.Corcoran at edftrading.com> wrote:
>
> Hi,
>
> Recently rabbitmq has been dying on us and I think I've found the problem.
> What usually happens is that the clients time out and disconnect (they have
> a 3 second heartbeat) and reconnecting doesn't work. We get a
> "java.net.ConnectException: Connection refused" exception. The 'beam' task
> is also currently using 6% CPU and about 2GB of RAM.
>
> The errors look like:
> Mnesia(rabbit@vsdlbblue01): ** WARNING ** Mnesia is overloaded: {dump_log, time_threshold}
>
> Mnesia(rabbit@vsdlbblue01): ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len, [705,850]}
>
> error on TCP connection from 10.80.12.26:47327
> {timeout,{frame_payload,3,1,29421}}
>
> etc.
>
> It looks like we might be leaving messages lying around. If I'm correct, is
> there a way of seeing the queues and which ones hold lots of messages? I've
> attached the last few hours of the log file in case that helps.
>
> Thanks,
>
> Dave
>
> (See attached file: rabbit.zip)