[rabbitmq-discuss] 21673 never recovers from emfile
tsuraan
tsuraan at gmail.com
Fri Jan 15 15:57:00 GMT 2010
> Hmmm, interesting find. I don't think that this is related to the file
> descriptor management that's going on in the new persister, but I'll
> certainly test and see if I can reproduce.
Ok, some better test conditions:
Start with a blank rabbit, updated as of this morning (latest
changeset is 2520:4e8508283ca5). Wipe out everything in
/var/lib/rabbitmq/... and start rabbit. Run the following Python
program (assuming that amqplib is installed):
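from amqplib.client_0_8 import Connection
import sys

def main():
    # Keep every connection referenced so its descriptor is never released.
    conns = []
    while True:
        conns.append(Connection('localhost', 'guest', 'guest'))
        # Print one dot per successful connection.
        sys.stdout.write(".")
        sys.stdout.flush()

if __name__ == "__main__":
    main()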
That will run for a while and then hang; I get about 12.5 lines of
output. At this point, rabbit will print "Erlang has closed", and the
bottom of the node@host.log file will say something like:
=ERROR REPORT==== 15-Jan-2010::09:43:58 ===
** Generic server <0.198.0> terminating
** Last message in was {inet_async,#Port<0.2423>,9030,{ok,#Port<0.9952>}}
** When Server state == none
** Reason for termination ==
** {cannot_accept,{error,emfile}}
At this point, you can never connect again, even if you kill the hog
process that has all of Erlang's file descriptors tied up. Actually,
the last time I ran this test, I also got the following errors:
=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
Mnesia(mqueue@master): ** ERROR ** (core dumped to file:
".../MnesiaCore.mqueue@master_1263_570368_405389")
** FATAL ** Cannot open log file ".../rabbit_durable_exchange.DCL":
{file_error,
".../rabbit_durable_exchange.DCL",
emfile}
=ERROR REPORT==== 15-Jan-2010::09:46:08 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.92.0>,
{<0.95.0>,
"Mnesia(~p): ** ERROR ** (core dumped to file:
~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
[mqueue@master,
".../MnesiaCore.mqueue@master_1263_570368_405389",
".../rabbit_durable_exchange.DCL",
{file_error,
".../rabbit_durable_exchange.DCL",
emfile}]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {aborted,
{no_exists,
[rabbit_exchange,
{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}]}}
=INFO REPORT==== 15-Jan-2010::09:46:08 ===
application: mnesia
exited: shutdown
type: temporary
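For anyone who wants to see the emfile condition in isolation, without
tying up a broker, here is a minimal single-process sketch in Python. It
assumes a Unix-like system with the standard resource module; the
function name exhaust_descriptors and the limit of 64 are made up for
illustration and have nothing to do with rabbit itself. It lowers the
soft descriptor limit, opens /dev/null until the kernel refuses, then
closes everything and restores the old limit.

```python
import errno
import os
import resource

def exhaust_descriptors(limit=64):
    """Hypothetical helper: hit the per-process descriptor limit on purpose.

    Returns (number of descriptors opened, errno of the failing open).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only ever lower the soft limit; never try to raise it.
    if soft == resource.RLIM_INFINITY:
        new_soft = limit
    else:
        new_soft = min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    fds = []
    failure = None
    try:
        while failure is None:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                # EMFILE is the "too many open files" error for this process.
                failure = exc.errno
    finally:
        # Release everything and put the original limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(fds), failure

if __name__ == "__main__":
    opened, err = exhaust_descriptors()
    print("opened %d descriptors, then failed with %s"
          % (opened, errno.errorcode.get(err, err)))
```

Once the descriptors are closed, new opens succeed again, so at the
operating system level the condition goes away as soon as descriptors
are freed.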
So I guess running out of file descriptors is just plain bad. I think
that the TCP acceptor needs to start rejecting connection attempts once
rabbit only has a dozen or so file descriptors remaining. It doesn't
look like Erlang has a getrlimit function, but there should be some
way to do it. Anyhow, I hope that helps.
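To make the reserve idea concrete, here is a rough sketch in Python
rather than Erlang, so it only illustrates the check and is not a patch
for rabbit. It assumes a Unix-like system where open descriptors can be
listed under /proc/self/fd or /dev/fd; the function names and the
default reserve of 12 (the "dozen or so" above) are hypothetical, and
the count is approximate.

```python
import os
import resource

def descriptors_in_use():
    """Approximate count of this process's open file descriptors."""
    for path in ("/proc/self/fd", "/dev/fd"):
        try:
            # The listing usually includes the descriptor used to read the
            # directory itself, so leave that one out of the count.
            return max(len(os.listdir(path)) - 1, 0)
        except OSError:
            continue
    raise RuntimeError("no descriptor directory available on this system")

def descriptors_remaining():
    """Soft RLIMIT_NOFILE minus descriptors in use, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - descriptors_in_use()

def should_accept(remaining, reserve=12):
    """Accept a new connection only while more than `reserve` descriptors
    are left for the server's own files (logs, database tables, ...)."""
    return remaining is None or remaining > reserve

if __name__ == "__main__":
    remaining = descriptors_remaining()
    print("remaining: %s, accept: %s" % (remaining, should_accept(remaining)))
```

An acceptor loop would call should_accept(descriptors_remaining())
before each accept and refuse or defer the connection when it returns
False, which keeps a few descriptors free for log and table files.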