[rabbitmq-discuss] cluster node "stuck" during start

Fri Jul 25 20:45:31 BST 2014

On 25 July 2014 at 23:37:17, Not Drew Stevens (not.drew.stevens at gmail.com) wrote:
> When a RabbitMQ cluster node starts back up after a server reboot,  
> we have experienced (more than a few) cases where the RabbitMQ  
> server on the node does not completely start.
>  
> This condition persisted even if the rabbit processes were killed  
> and rabbit manually restarted.

What do you mean by "does not completely start"?

> {file_descriptors,
> [{total_limit,924},{total_used,0},{sockets_limit,829},{sockets_used,0}]},  

This is a really low limit. I can think of one scenario:

 * ulimit -n was set to a high value manually, but not persistently via /etc
 * You have over 1000 queues
 * The node rebooted and ulimit -n was reset
 * RabbitMQ tried to recover durable queues and persistent messages and ran out of file descriptors
   in the process

Please bump ulimit -n for the rabbitmq user to 50K and try reproducing the issue.
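
To confirm what limit a process actually inherits after a reboot, something like the sketch below may help. It only reads the open-file limit of the process that runs it, so it would need to be started as the rabbitmq user, in the same environment the broker is started from, to say anything useful. The function name nofile_report and the constant TARGET_NOFILE are made up for illustration; 50000 is simply the "50K" suggested above.

```python
import resource

# Illustrative target only: the "50K" suggested in this message.
TARGET_NOFILE = 50_000


def nofile_report(target=TARGET_NOFILE):
    """Report the open-file limits of the *current* process.

    Returns a dict with the soft and hard RLIMIT_NOFILE values and
    whether the soft limit is at least `target`.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def fmt(value):
        # RLIM_INFINITY means "no limit" for that value.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    meets_target = soft == resource.RLIM_INFINITY or soft >= target
    return {"soft": fmt(soft), "hard": fmt(hard), "meets_target": meets_target}


if __name__ == "__main__":
    report = nofile_report()
    print("soft limit:", report["soft"])
    print("hard limit:", report["hard"])
    print("meets target of", TARGET_NOFILE, ":", report["meets_target"])
```

A limit raised by hand in an interactive shell will not necessarily apply to the broker after a reboot, so the file_descriptors section of the status output quoted above remains the authoritative view of what the running node sees.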
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ