[rabbitmq-discuss] Client connection timeout

Matthew Sackman matthew at lshift.net
Fri Mar 26 18:18:48 GMT 2010


On Fri, Mar 26, 2010 at 09:33:25AM -0500, Brendan Doyle wrote:
> 1. If the gen_server was started by a supervisor, why would it not come back up and start accepting connections once again?

I would expect it did. However, supervisors have a restart limit: a
maximum number of restarts within a given period of time. My guess is
that it came back up, immediately died again with the same error,
cycled like this for a while, and then the supervisor gave up, exited,
and took down the rest of rabbit. If this was the case,
you should have entries in your logs about "Reached maximum restart
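To make the restart limit concrete, here is a small Python model of the rule. It is not OTP's code, and the class and function names are made up for illustration. An Erlang supervisor is configured with a maximum number of restarts (MaxR) within a period (MaxT seconds). If more restarts than that happen inside the window, it gives up and exits. A child that crashes immediately on every start, as it would when every accept hits emfile, uses up that allowance almost instantly:

```python
from collections import deque


class RestartIntensity:
    """Illustrative model of a supervisor restart limit: allow at most
    max_restarts restarts within any window of `period` seconds."""

    def __init__(self, max_restarts, period):
        self.max_restarts = max_restarts
        self.period = period
        self._times = deque()  # timestamps of recent restarts, oldest first

    def record_restart(self, now):
        """Record a restart at time `now`. Return True if it is allowed,
        False if the supervisor should give up and exit."""
        self._times.append(now)
        # Forget restarts that have fallen out of the window.
        while self._times and now - self._times[0] > self.period:
            self._times.popleft()
        return len(self._times) <= self.max_restarts


def simulate_crash_loop(limiter, crash_interval=0.01):
    """Hypothetical child that dies `crash_interval` seconds after every
    start; return how many restarts happen before the limit is hit."""
    t = 0.0
    restarts = 0
    while limiter.record_restart(t):
        restarts += 1
        t += crash_interval
    return restarts
```

With a limit of 3 restarts in 5 seconds, `simulate_crash_loop(RestartIntensity(3, 5))` returns 3. The supervisor gives up on the fourth crash, a fraction of a second after the first, which fits the "came back up and immediately died again" pattern.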

Alternatively, Erlang may have tried to reread the module's .beam file
(I have no idea when it decides to do that), hit the same emfile error,
and died.

> 2. Given that the client is not closing connections cleanly ( I am looking into this ) but rabbit is successfully detecting that the connection HAS closed, why would we get an out of file descriptors error?  I would hope that file descriptors are released properly and re-used

Should be. lsof, or rummaging around in /proc/$PID/fd, will show you
what is and isn't open, and you may be able to spot a leak there. The
output of netstat -anp | grep beam may also be illuminating.
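If you'd rather script the /proc check than eyeball it, something along these lines might help. It is a Linux-only Python sketch, and the function names are made up. It counts a process's open descriptors by kind and reports its soft "Max open files" limit. Run it against the beam process's PID as the same user or as root, and run it a few times. A socket count that keeps climbing while the number of live clients stays flat points at a leak.

```python
import os
import sys
from collections import Counter


def classify_fds(pid):
    """Count the open file descriptors of `pid`, grouped by kind.

    Reads the symlinks in /proc/<pid>/fd (Linux only). Sockets link to
    'socket:[inode]', pipes to 'pipe:[inode]', anonymous inodes such as
    eventpoll to 'anon_inode:...', and regular files to their path.
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("anon_inode:"):
            counts["anon_inode"] += 1
        else:
            counts["file"] += 1
    return counts


def fd_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: name (3 words), soft limit, hard limit, units.
                return line.split()[3]
    return None


if __name__ == "__main__":
    # Inspect the PID given as the first argument, or this process itself.
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    pid = int(arg) if arg.isdigit() else os.getpid()
    counts = classify_fds(pid)
    print(f"pid {pid}: {sum(counts.values())} fds open, soft limit {fd_limit(pid)}")
    for kind, n in counts.most_common():
        print(f"  {kind}: {n}")
```

Comparing the "socket" count with the number of ESTABLISHED connections that netstat shows for beam is a quick way to tell whether closed connections are really giving their descriptors back.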

> This server was running for a few months before this issue cropped up so it may be a 'slow' bug and is not super critical for us as a restart fixed things.  However hopefully someone can shed some light

Sure. Whilst I'm sure I sound like a stuck record these days, this bug
is fixed in the new persister branch, where we track file descriptors
including sockets and stop accepting new network connections when we get
near the limit. The goal here is really to protect the Erlang VM so that
we don't die horribly when we run out of file descriptors. My recent
post on the LShift blog explains in some detail how the general
mechanism works.
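The general idea can be sketched in a few lines of Python. This is not the persister branch's code, which is Erlang (the blog post describes the real mechanism). `FdBudget` and `serve` are hypothetical names. The point is that the listener asks a counter for permission before calling accept(), and it holds back a reserve of descriptors for everything else the process has open, such as files, pipes and the listening socket itself:

```python
import resource
import threading
import time


class FdBudget:
    """Hypothetical descriptor budget: admits new connections only while
    the number of admitted ones stays below (limit - reserve)."""

    def __init__(self, limit=None, reserve=32):
        if limit is None:
            soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            if soft == resource.RLIM_INFINITY:
                raise ValueError("no finite RLIMIT_NOFILE; pass limit explicitly")
            limit = soft
        self.limit = limit
        self.reserve = reserve  # headroom for non-connection descriptors
        self.in_use = 0
        self._lock = threading.Lock()

    def try_obtain(self):
        """Claim one descriptor slot; return False if we are near the limit."""
        with self._lock:
            if self.in_use >= self.limit - self.reserve:
                return False
            self.in_use += 1
            return True

    def release(self):
        """Give a slot back once its connection has been closed."""
        with self._lock:
            if self.in_use > 0:
                self.in_use -= 1


def serve(server_sock, budget, handle):
    """Accept connections only while the budget allows it.

    When the budget is exhausted we simply stop calling accept(): new
    clients wait in the kernel's listen backlog (or are refused once it
    fills) instead of the process hitting EMFILE.
    """
    while True:
        if not budget.try_obtain():
            time.sleep(0.1)  # crude back-off; poll until a slot frees up
            continue
        try:
            conn, _addr = server_sock.accept()
        except OSError:
            budget.release()
            raise

        def run(c=conn):
            try:
                handle(c)
            finally:
                c.close()
                budget.release()

        threading.Thread(target=run, daemon=True).start()
```

The size of the reserve is a guess. It should cover however many non-socket descriptors the process needs at its busiest.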



