[rabbitmq-discuss] 21673 never recovers from emfile

tsuraan tsuraan at gmail.com
Fri Jan 15 15:57:00 GMT 2010

> Hmmm, interesting find. I don't think that this is related to the file
> descriptor management that's going on in the new persister, but I'll
> certainly test and see if I can reproduce.

Ok, some better test conditions:

Start with a blank rabbit, updated as of this morning (latest
changeset is 2520:4e8508283ca5). Wipe out everything in
/var/lib/rabbitmq/..., and start rabbit.  Run the following python
program (assuming that amqplib is installed):

from amqplib.client_0_8 import Connection
import sys

def main():
  conns = []
  while True:
    conns.append(Connection('localhost', 'guest', 'guest'))

if __name__ == "__main__":
  main()

That will run for a while and then hang; I get about 12.5 lines of
output.  At this point, rabbit will print "Erlang has closed", and the
bottom of the node@host.log file will say something like:

=ERROR REPORT==== 15-Jan-2010::09:43:58 ===
** Generic server <0.198.0> terminating
** Last message in was {inet_async,#Port<0.2423>,9030,{ok,#Port<0.9952>}}
** When Server state == none
** Reason for termination ==
** {cannot_accept,{error,emfile}}

At this point, you can never connect again, even if you kill the hog
process that has all of erlang's file descriptors tied up.  Actually,
the last time I ran this test, I also got the following errors:

=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
Mnesia(mqueue@master): ** ERROR ** (core dumped to file:
".../MnesiaCore.mqueue@master_1263_570368_405389")
 ** FATAL ** Cannot open log file ".../rabbit_durable_exchange.DCL":



=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.92.0>,
                        "Mnesia(~p): ** ERROR ** (core dumped to file:
~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
                        [mqueue@master,
                         ".../MnesiaCore.mqueue@master_1263_570368_405389",
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {aborted,

=INFO REPORT==== 15-Jan-2010::09:46:08 ===
    application: mnesia
    exited: shutdown
    type: temporary

So I guess running out of file descriptors is just plain bad.  I think
that the tcp acceptor needs to start rejecting connection attempts once
rabbit only has a dozen or so file descriptors remaining.  It doesn't
look like erlang has a getrlimit function, but there should be some
way to do it.  Anyhow, I hope that helps.
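
To illustrate the kind of check I mean, here is a rough sketch in Python
(not Erlang, and not anything rabbit actually does).  It reads the soft
RLIMIT_NOFILE limit with getrlimit, counts the descriptors the process
currently holds, and refuses new connections once the headroom drops to
a reserve.  The names RESERVED_FDS, open_fd_count, remaining_fds and
should_accept are made up for this example, and the count relies on
/proc/self/fd or /dev/fd being available, which is an assumption about
the platform.

```python
import os
import resource

# Hypothetical reserve: stop accepting once only "a dozen or so" remain.
RESERVED_FDS = 12


def open_fd_count():
    """Return the number of descriptors currently open in this process.

    The listing briefly opens one descriptor of its own on most systems,
    so treat the result as an approximation (usually one too high).
    """
    # /proc/self/fd exists on Linux; /dev/fd on macOS and the BSDs.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            return len(os.listdir(fd_dir))
        except OSError:
            continue
    raise RuntimeError("cannot enumerate open file descriptors")


def remaining_fds():
    """Return how many more descriptors can be opened, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fd_count()


def should_accept(reserved=RESERVED_FDS):
    """Return True while there is more headroom than the reserve."""
    left = remaining_fds()
    return left is None or left > reserved
```

An accept loop would call should_accept() before each accept and close
(or simply not accept) the incoming connection when it returns False, so
that the descriptors in the reserve stay free for log files and the like.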
