[rabbitmq-discuss] RabbitMQ hangs, does not accept connections

Matthias Radestock matthias at rabbitmq.com
Fri Dec 30 12:11:30 GMT 2011


On 30/12/11 08:27, Dmitri Minaev wrote:
> I would be most grateful if you could have a look at our server.

I have done this now.

Rabbit indeed wasn't accepting connections - it wasn't refusing them 
either, i.e. it was behaving as if 'accept' hadn't been called.

The Erlang process tasked with accepting AMQP connections was alive and 
well. It was simply sitting there waiting for the tcp subsystem to 
notify it of new connections. Alas that never happened.

So either the acceptor process forgot to ask Erlang's tcp stack to 
notify it of new connections, or Erlang's tcp stack forgot that it was 
supposed to tell the acceptor process...

...and it looks like there is a code path in tcp_acceptor.erl that would 
trigger the former. When the tcp stack notifies the acceptor of an error 
other than 'closed', the acceptor carries on but does not invoke 
prim_inet:async_accept/2 again, so it is never notified of the next 
connection attempt.
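
To make the suspected failure mode concrete, here is a toy model in 
Python. It is only a sketch of the pattern described above, not the 
actual tcp_acceptor.erl code: the class ToyAcceptor, the function run, 
the rearm_on_error flag and the "emfile" error name are all made up for 
illustration. The one assumption it encodes is that each accept request 
is one-shot, so the acceptor must ask again after every notification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAcceptor:
    """Toy model of a one-shot accept loop (not RabbitMQ's actual code)."""
    rearm_on_error: bool          # hypothetical switch: True models the fix
    armed: bool = False           # True while an accept request is outstanding
    stopped: bool = False
    accepted: list = field(default_factory=list)

    def async_accept(self):
        # Stand-in for asking the tcp stack for ONE notification.
        self.armed = True

    def notify(self, result):
        """Deliver one notification from the simulated tcp stack."""
        if self.stopped or not self.armed:
            return False          # nobody asked: the connection just waits
        self.armed = False        # the one-shot request is now consumed
        kind, value = result
        if kind == "ok":
            self.accepted.append(value)
            self.async_accept()   # ask for the next connection
        elif value == "closed":
            self.stopped = True   # listening socket is gone, so stop
        elif self.rearm_on_error:
            self.async_accept()   # keep listening after other errors
        # else: suspected bug, carry on without asking again
        return True


def run(rearm_on_error):
    acc = ToyAcceptor(rearm_on_error)
    acc.async_accept()
    # "emfile" is an arbitrary example of an error other than "closed".
    for event in [("ok", "conn1"), ("error", "emfile"), ("ok", "conn2")]:
        acc.notify(event)
    return acc.accepted
```

With rearm_on_error=False the model accepts conn1, sees the error, and 
then never hears about conn2, which matches what the server was doing: 
neither accepting nor refusing. With rearm_on_error=True both 
connections are accepted.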

I will file a bug for this. Should be easy to fix, though we cannot be 
certain that this is definitely the problem.

Obviously if this were happening frequently we would have heard about the 
issue a long time ago - the code in question hasn't changed for more than 
3 years. So there must be some rare circumstance triggering this.

I got the acceptor process to issue another async_accept, so rabbit is 
happy for the moment. But no doubt the problem will recur.


