Hello RabbitMQ users,<div><br></div><div>While load testing my RabbitMQ-based RPC mechanism, I managed to get my RabbitMQ server into some very interesting states when it had a large number of connections. When the AMQP server exceeded ~1020 active connections, it would become unstable and eventually crash in a way that lost some data (including a few persistent messages that had been posted to durable queues).</div>
<div><br></div><div>The cause of the crash wasn't hard to discover: the parent process of RabbitMQ was restricted to 1,024 open file handles (this is apparently the default for the Linux distro I was running), and the resource limit is inherited by child processes. Simply adding a "ulimit -n 65535" to the RabbitMQ init script and a "+P 131072" to the Erlang VM command line gave the server enough file handles and Erlang processes to handle the load.</div>
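<div><br></div><div>The inheritance is easy to check. The following is a minimal Python sketch, assuming a Unix-like system and using only the standard resource and subprocess modules; it reports the open-file limit of the current process and of a child it spawns. The function names are illustrative and have nothing to do with RabbitMQ itself.</div>
<pre>
```python
import resource
import subprocess
import sys


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def child_soft_limit():
    """Spawn a child interpreter and return the soft limit it inherited."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("parent soft limit:", soft, "hard limit:", hard)
    print("child soft limit:", child_soft_limit())
```
</pre>
<div>Running this from the same shell or init script that starts the server should show the limit the server process will actually receive.</div>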
<div><br></div><div>What piqued my interest, however, was the catastrophic and data-lossy way in which the server crashed when it reached its limit. Normally, RabbitMQ is very good about avoiding data loss even when it crashes!</div>
<div><br></div><div>Some scrutiny of the logs yielded the following explanation: exhausting the file handles available to the process prevents various fault-tolerance and process control mechanisms from working, including:</div>
<div>-- a "cpu_sup" process that the VM is trying to communicate with via a port</div><div>-- writing the Mnesia tables that hold persisted queue contents</div><div>-- the "rabbitmqctl" process</div><div>
<br></div><div>The result of all these failures taken together is that the server decides to shut down but can't shut down cleanly, leading to data loss.</div><div><br></div><div>My ultimate solution is rather crude, but workable: using the iptables conntrack module, I will limit the number of inbound TCP connections to the server, so that the server always has enough free file handles to take care of "housekeeping" operations.</div>
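<div><br></div><div>One way to size such a connection cap is to subtract a housekeeping reserve from the file handle limit. The Python sketch below assumes roughly one file handle per client connection and a reserve of 128 handles; both figures, and the function name, are assumptions for illustration rather than measured values.</div>
<pre>
```python
import resource


def max_client_connections(soft_limit, reserved_handles=128):
    """Return how many connections to allow so the reserve stays free.

    Assumes each client connection consumes one file handle.
    """
    if reserved_handles >= soft_limit:
        raise ValueError("reserve must be smaller than the file handle limit")
    return soft_limit - reserved_handles


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("no soft limit on open files; choose a cap by other means")
    else:
        print("connection cap:", max_client_connections(soft))
```
</pre>
<div>With the default limit of 1,024 handles and this reserve, the cap works out to 896 connections; the right reserve depends on how many handles the server needs for its own files and ports.</div>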
<div><br></div><div>I thought I'd share my results with the group in case anyone else has encountered this problem, and also query whether anyone else has come up with a different/better solution. Has anyone run into this yet?</div>
<div><br></div><div>Cheers,</div><div> Tony</div><div><br></div><div>P.S. Here are some log excerpts from a RabbitMQ server under heavy load that exemplify the problem:</div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; "><div>
<div><span class="Apple-style-span" style="font-family: 'courier new', monospace; "><br></span></div><div><span class="Apple-style-span" style="font-family: 'courier new', monospace; ">=ERROR REPORT==== 8-Jul-2010::19:23:13 ===</span></div>
</div><div><div><font face="'courier new', monospace">Error in process <0.121.0> on node 'rabbit@TonyS' with exit value: {{badmatch,{error,emfile}},[{cpu_sup,get_uint32_measurement,2},{cpu_sup,measurement_server_loop,1}]}</font></div>
<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">=ERROR REPORT==== 8-Jul-2010::19:23:18 ===</font></div><div><font face="'courier new', monospace">Error in process <0.5307.0> on node 'rabbit@TonyS' with exit value: {emfile,[{erlang,open_port,[{spawn,"/usr/lib/erlang/lib/os_mon-2.2.5/priv/bin/cpu_sup"},[stream]]},{cpu_sup,start_portprogram,0},{cpu_sup,port_server_init,1}]}</font></div>
</div><div><font face="'courier new', monospace"><br></font></div><div><div><span class="Apple-style-span" style="font-family: 'courier new', monospace; ">=ERROR REPORT==== 8-Jul-2010::19:24:14 ===</span></div>
</div><div><div><font face="'courier new', monospace">Mnesia(rabbit@TonyS): ** ERROR ** (could not write core file: emfile)</font></div><div><font face="'courier new', monospace"> ** FATAL ** Cannot open log file "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_queue.DCL": {file_error,</font></div>
<div><font face="'courier new', monospace"> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_queue.DCL",</font></div>
<div><font face="'courier new', monospace"> emfile}</font></div><div><font face="'courier new', monospace"><br>
</font></div><div><font face="'courier new', monospace">=INFO REPORT==== 8-Jul-2010::19:24:14 ===</font></div><div><font face="'courier new', monospace"> application: mnesia</font></div><div><font face="'courier new', monospace"> exited: shutdown</font></div>
<div><font face="'courier new', monospace"> type: permanent</font></div></div><div><font face="'courier new', monospace"><br></font></div><div><div><div><font color="#CC0000"><b><font face="'courier new', monospace">-------^ someone decides to shut everything down; tries to do a clean shutdown but can't persist state ^-------</font></b></font></div>
</div></div><div><font face="'courier new', monospace"><br></font></div><div><div><font face="'courier new', monospace">=ERROR REPORT==== 8-Jul-2010::19:24:14 ===</font></div><div><font face="'courier new', monospace">** gen_event handler rabbit_error_logger crashed.</font></div>
<div><font face="'courier new', monospace">** Was installed in error_logger</font></div><div><font face="'courier new', monospace">** Last event was: {error,<0.39.0>,</font></div><div><font face="'courier new', monospace"> {<0.42.0>,</font></div>
<div><font face="'courier new', monospace"> "Mnesia(~p): ** ERROR ** (could not write core file: ~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",</font></div><div><font face="'courier new', monospace"> [rabbit@TonyS,emfile,</font></div>
<div><font face="'courier new', monospace"> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_queue.DCL",</font></div><div><font face="'courier new', monospace"> {file_error,</font></div>
<div><font face="'courier new', monospace"> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_queue.DCL",</font></div><div><font face="'courier new', monospace"> emfile}]}}</font></div>
</div><div><font face="'courier new', monospace"><br></font></div><div><div><div><div><font color="#CC0000"><b><font face="'courier new', monospace">-------^ can't even write a crash dump ^-------</font></b></font></div>
</div></div></div><div><font color="#CC0000"><b><font face="'courier new', monospace"><br></font></b></font></div></span></div>