[rabbitmq-discuss] 21673 never recovers from emfile

tsuraan tsuraan at gmail.com
Fri Jan 15 15:57:00 GMT 2010


> Hmmm, interesting find. I don't think that this is related to the file
> descriptor management that's going on in the new persister, but I'll
> certainly test and see if I can reproduce.

Ok, some better test conditions:

Start with a blank rabbit, updated as of this morning (latest
changeset is 2520:4e8508283ca5). Wipe out everything in
/var/lib/rabbitmq/... and start rabbit.  Run the following Python
program (assuming that amqplib is installed):

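from amqplib.client_0_8 import Connection
import sys

def main():
  # Keep every connection referenced so none is ever closed; each one
  # ties up a file descriptor on the broker side.
  conns = []
  while True:
    conns.append(Connection('localhost', 'guest', 'guest'))
    # One dot per successful connection, flushed immediately so the
    # progress is visible right up to the hang.
    sys.stdout.write(".")
    sys.stdout.flush()

if __name__ == "__main__":
  main()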

That will run for a while and then hang; I get about 12.5 lines of
dots.  At this point, rabbit will print "Erlang has closed", and the
bottom of the node@host.log file will say something like:

=ERROR REPORT==== 15-Jan-2010::09:43:58 ===
** Generic server <0.198.0> terminating
** Last message in was {inet_async,#Port<0.2423>,9030,{ok,#Port<0.9952>}}
** When Server state == none
** Reason for termination ==
** {cannot_accept,{error,emfile}}
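
For reference, emfile here is the POSIX EMFILE errno ("too many open
files"), i.e. the process hit its per-process descriptor limit.  A
minimal sketch that reproduces the same errno inside one Python
process, without rabbit involved at all.  The function name and the
limit of 64 are made up for illustration, and this assumes a Unix
system where the resource module is available:

```python
import errno
import os
import resource

def open_until_emfile(limit=64):
    """Lower the soft fd limit, then open /dev/null until the OS refuses.

    Returns (number of descriptors opened, errno of the failure).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return len(fds), exc.errno
    finally:
        # Release everything and restore the original limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

if __name__ == "__main__":
    opened, err = open_until_emfile()
    print(opened, errno.errorcode.get(err))
```

The count it reports is a bit below the limit, since stdin, stdout,
stderr and anything else already open count against it too.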

At this point, you can never connect again, even if you kill the hog
process that has all of erlang's file descriptors tied up.  Actually,
the last time I ran this test, I also got the following errors:

=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
Mnesia(mqueue@master): ** ERROR ** (core dumped to file:
".../MnesiaCore.mqueue@master_1263_570368_405389")
 ** FATAL ** Cannot open log file ".../rabbit_durable_exchange.DCL":
{file_error,

                       ".../rabbit_durable_exchange.DCL",

                       emfile}

=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.92.0>,
                       {<0.95.0>,
                        "Mnesia(~p): ** ERROR ** (core dumped to file:
~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
                        [mqueue@master,
                         ".../MnesiaCore.mqueue@master_1263_570368_405389",
                         ".../rabbit_durable_exchange.DCL",
                         {file_error,
                             ".../rabbit_durable_exchange.DCL",
                             emfile}]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {aborted,
                 {no_exists,
                     [rabbit_exchange,
                      {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}]}}

=INFO REPORT==== 15-Jan-2010::09:46:08 ===
    application: mnesia
    exited: shutdown
    type: temporary

So I guess running out of file descriptors is just plain bad.  I think
that the tcp acceptor needs to start rejecting connection attempts once
rabbit only has a dozen or so file descriptors remaining.  It doesn't
look like erlang has a getrlimit function, but there should be some
way to do it.  Anyhow, I hope that helps.
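
To make the headroom idea concrete, here is a rough sketch of the
check in Python rather than Erlang.  It is not rabbit's code, just an
illustration: the names (RESERVED_FDS, should_accept, and so on) are
hypothetical, the reserve of 12 is only my "dozen or so" guess, and
counting open descriptors via /proc/self/fd is Linux-specific.

```python
import os
import resource

RESERVED_FDS = 12  # hypothetical headroom: "a dozen or so"

def fd_soft_limit():
    """Return the soft limit on open files, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft

def open_fd_count():
    """Approximate count of this process's open descriptors (Linux only).

    The directory listing itself briefly uses one descriptor, so the
    result can be off by one.
    """
    return len(os.listdir("/proc/self/fd"))

def should_accept(open_fds, soft_limit, reserved=RESERVED_FDS):
    """True if accepting one more connection still leaves the reserve."""
    if soft_limit is None:
        return True
    return soft_limit - open_fds > reserved

if __name__ == "__main__":
    print(should_accept(open_fd_count(), fd_soft_limit()))
```

An acceptor would run a check like should_accept before each accept
and refuse (or immediately close) the connection when it returns
False, so that the remaining descriptors stay available for things
like the Mnesia log files.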



