[rabbitmq-discuss] cluster node "stuck" during start
Michael Klishin
mklishin at gopivotal.com
Fri Jul 25 20:45:31 BST 2014
On 25 July 2014 at 23:37:17, Not Drew Stevens (not.drew.stevens at gmail.com) wrote:
> When a RabbitMQ cluster node starts back up after a server reboot,
> we have experienced (more than a few) cases where the RabbitMQ
> server on the node does not completely start.
>
> This condition persisted even if the rabbit processes were killed
> and rabbit manually restarted.
What do you mean by "does not completely start"?
> {file_descriptors,
> [{total_limit,924},{total_used,0},{sockets_limit,829},{sockets_used,0}]},
This is a really low limit. I can think of one scenario:
* ulimit -n was set to a high value manually but not made persistent via /etc
* You have over 1000 queues
* Node rebooted, ulimit -n was reset
* RabbitMQ tried to recover durable queues and persistent messages and ran out of file descriptors
in the process
Please bump ulimit -n for the rabbitmq user to 50K and try reproducing the issue.
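To check quickly whether the limit survived a reboot, something like the sketch below could be run as the rabbitmq user. It is only an illustration: the helper names are made up, reading "50K" as 50000 is an assumption, and Python's resource module is Unix-only. The limit a shell sees may also differ from what the running node got, so the total_limit value in the status output above remains the number to trust.

```python
import resource

# Threshold from the suggestion above ("50K"); reading it as 50000 is an assumption.
RECOMMENDED_NOFILE = 50_000


def fd_limit_ok(soft_limit, recommended=RECOMMENDED_NOFILE):
    """Return True if the soft open-file limit meets the recommended value."""
    # RLIM_INFINITY means "no limit", which always satisfies the recommendation.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended


def current_fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_fd_limits()
    print(f"soft={soft} hard={hard} ok={fd_limit_ok(soft)}")
```

With the total_limit of 924 reported above, fd_limit_ok(924) is False, which is consistent with the node running out of descriptors during recovery.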
--
MK
Staff Software Engineer, Pivotal/RabbitMQ