[rabbitmq-discuss] Client connection timeout

Fri Mar 26 18:34:08 GMT 2010

Thanks for the answers, Matthew. A couple of comments inline.

On 2010-03-26, at 2:18 PM, Matthew Sackman wrote:

Howdy,

On Fri, Mar 26, 2010 at 09:33:25AM -0500, Brendan Doyle wrote:
1. If the gen_server was started by a supervisor, why would it not come back up and start accepting connections once again?

I would expect it did. However, supervisors have a restart limit in
terms of the number of restarts within a particular period of time. I
would expect that it came back up and then immediately died again with
the same error, cycled like this for a while, and then the supervisor
gave up, exited, and took down the rest of rabbit. If this was the case
you should have entries in your logs about "Reached maximum restart
intensity".
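
To make the restart limit concrete, here is a toy Python model of the check described above. It is only a sketch of the idea, not OTP's supervisor code: the class name `RestartLimiter` and its parameters are made up for illustration, loosely mirroring the maximum-restarts and period settings of a supervisor.

```python
import collections


class RestartLimiter:
    """Toy model of a supervisor's restart intensity check (hypothetical)."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self.restart_times = collections.deque()

    def record_restart(self, now):
        """Record a restart at time `now` (seconds).

        Returns True if the child may be restarted, False if more than
        `max_restarts` restarts fell within the last `period_seconds`,
        which is when a supervisor would give up and exit.
        """
        self.restart_times.append(now)
        # Forget restarts that have dropped out of the sliding window.
        while now - self.restart_times[0] > self.period_seconds:
            self.restart_times.popleft()
        return len(self.restart_times) <= self.max_restarts


# A child that dies immediately after every restart trips the limit quickly.
limiter = RestartLimiter(max_restarts=3, period_seconds=10)
fast_crashes = [limiter.record_restart(t) for t in (0, 1, 2, 3)]
```

With these numbers the fourth restart inside ten seconds is refused, which is the point where the "Reached maximum restart intensity" entry would be expected in the logs.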

Or, Erlang could have decided to try and reread the .beam file for the
module (no idea when it decides to do that), hit the same emfile error
and decided to die.

I did look for the restart intensity message originally because that's what I would have expected to see, but the emfile error was the first and only error message other than connection open/close and persister logs rolling. So it was probably the second scenario you describe. I just would have expected that to take down the whole VM if the child spec was defined with restart type permanent.

2. Given that the client is not closing connections cleanly (I am looking into this) but rabbit is successfully detecting that the connection HAS closed, why would we get an out of file descriptors error? I would hope that file descriptors are released properly and re-used.

Should be. lsof or rummaging around in /proc/$PID/fd will show you what
is open and what isn't. You may be able to detect a leak there. Also
netstat -anp | grep beam may be illuminatory.
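
For watching a possible leak over time, the /proc check can be scripted. The following is a minimal Python sketch, assuming a Linux host and permission to read the target process's /proc entry; the helper names `open_fds` and `count_sockets` are invented for this example.

```python
import os


def open_fds(pid):
    """Map each open descriptor number of `pid` to its link target (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_sockets(targets):
    """Count descriptors whose link target looks like 'socket:[inode]'."""
    return sum(1 for target in targets.values() if target.startswith("socket:"))


if __name__ == "__main__":
    # Inspect this Python process; substitute the pid of the beam process.
    fds = open_fds(os.getpid())
    print(len(fds), "descriptors open,", count_sockets(fds), "of them sockets")
```

Running this periodically against the broker's pid and comparing the counts with the number of live connections should show whether sockets are piling up after clients disconnect.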

This server was running for a few months before this issue cropped up, so it may be a 'slow' bug, and it is not super critical for us as a restart fixed things. However, hopefully someone can shed some light.

Sure. Whilst I'm sure I sound like a stuck record these days, this bug
is fixed in the new persister branch, where we track file descriptors
including sockets and stop accepting new network connections when we get
near the limit. The goal here is really to protect the erlang VM so that
we don't die horribly when we run out of file descriptors. My recent
blog post to the lshift blog explains in some detail how the general
mechanism works.

http://www.lshift.net/blog/2010/03/23/the-fine-art-of-holding-a-file-descriptor
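
The idea of counting descriptors and refusing new connections near the limit can be sketched in a few lines of Python. This is not the code from the new persister branch (which is Erlang) and is not taken from the blog post; `FdBudget`, its `reserve` parameter, and the fallback limit of 1024 are all assumptions made for illustration.

```python
import resource


class FdBudget:
    """Hypothetical descriptor budget: refuse new work near the process limit."""

    def __init__(self, limit=None, reserve=16):
        if limit is None:
            # Soft limit on open files for this process (Unix only).
            limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            if limit == resource.RLIM_INFINITY:
                limit = 1024  # arbitrary fallback chosen for this sketch
        self.limit = limit
        # Hold back `reserve` descriptors for files the VM itself must open.
        self.threshold = max(limit - reserve, 0)
        self.in_use = 0

    def try_acquire(self):
        """Claim one descriptor; return False if we are too close to the limit."""
        if self.in_use >= self.threshold:
            return False
        self.in_use += 1
        return True

    def release(self):
        """Give one descriptor back, for example when a connection closes."""
        if self.in_use > 0:
            self.in_use -= 1
```

An accept loop would call `try_acquire()` before accepting a socket and `release()` when the connection closes, so that running low on descriptors shows up as refused connections rather than an emfile crash.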

Thanks for the info. I have seen this response from you about quite a few issues ;) but that's the way things go.

Good blog post that I hadn't seen yet.

Matthew

Brendan Doyle
Manager, Application Development
Epic Advertising - New York, Toronto, San Francisco, London
www.EpicAdvertising.com
60 Columbia Way, Suite 310
Markham, ON L3R 0C9
(905) 946-0300 x.2358 work
(647) 885-7159 mobile
(888) 666-3120 fax
brendan.doyle at epicadvertising.com
