[rabbitmq-discuss] Client connection timeout

Brendan Doyle brendan.doyle at epicadvertising.com
Fri Mar 26 14:33:25 GMT 2010


Hey all.  Had an issue this morning with a client ( producer + consumer ) unable to connect to the server.

- 1.7.0 on R13B02
- ulimit -n = 1024
- rabbitmq process was up and not using very much CPU/memory
- rabbitmqctl could connect and report queue status
- a number of consumers that run as daemons were still connected, receiving messages, and processing normally

The particular client that could not connect runs as a cron and periodically checks for new messages on a queue.

In the sasl log there are a large number of entries like this block:

=INFO REPORT==== 26-Mar-2010::06:18:02 ===
starting TCP connection <0.22422.218> from 10.10.8.105:46037

=INFO REPORT==== 26-Mar-2010::06:18:03 ===
accepted TCP connection on 0.0.0.0:5672 from 10.10.8.105:46038

=WARNING REPORT==== 26-Mar-2010::06:18:04 ===
exception on TCP connection <0.22422.218> from 10.10.8.105:46037
connection_closed_abruptly

Which seems to be the client connecting and not disconnecting cleanly.  Finally followed by:

=ERROR REPORT==== 26-Mar-2010::06:18:24 ===
** Generic server <0.169.0> terminating
** Last message in was {inet_async,#Port<0.973>,15606,{ok,#Port<0.23580885>}}
** When Server state == none
** Reason for termination ==
** {cannot_accept,{error,emfile}}

It was at this time that the server started refusing connections.

So it appears we ran out of file descriptors which crashed the gen_server owning the listening socket.  My questions are:

1. If the gen_server was started by a supervisor, why would it not come back up and start accepting connections once again?
2. Given that the client is not closing connections cleanly ( I am looking into this ) but rabbit is successfully detecting that the connection HAS closed, why would we get an out of file descriptors error?  I would hope that file descriptors are released properly and re-used

This server was running for a few months before this issue cropped up so it may be a 'slow' bug and is not super critical for us as a restart fixed things.  However hopefully someone can shed some light

Brendan Doyle 
Manager, Application Development 
Epic Advertising - New York, Toronto, San Francisco, London 
www.EpicAdvertising.com 
60 Columbia Way, Suite 310 
Markham, ON L3R 0C9 
(905) 946-0300 x.2358 work 
(647) 885-7159 mobile
(888) 666-3120 fax 
brendan.doyle at epicadvertising.com





More information about the rabbitmq-discuss mailing list