Hello Emile, thanks for the reply, comments inline.

--
Raphael.

On Tue, Oct 25, 2011 at 1:39 AM, Emile Joubert <emile@rabbitmq.com> wrote:
> Hi Raphael,
>
> On 25/10/11 02:07, Raphael Simon wrote:
>> Hello all,
>>
>> We are seeing an issue on a production broker where the RabbitMQ process
>> writes non-stop to files in mnesia/<node>/msg_store_persistent. It keeps
>> creating new files and the problem seems to be getting worse. Listing
>> the files in that directory shows that it's creating a new 16 MB file
>> every 2 to 4 minutes [1].
>>
>> The throughput of persistent messages on this broker is orders of
>> magnitude lower (maybe 20 msg/sec at most, each in the tens of KB).
>>
>> There are about 100 messages sitting in queues on that broker, so that
>> should not cause that many writes; iostat shows about 6000 writes/s.
>
> How did you determine this number? Is it constant? I would expect the
> behaviour you describe when some queues keep growing or when the broker
> needs to free up a lot of memory.

We use collectd in combination with rabbitmqctl (not great for performance, but it lets us see what's going on in the brokers at a glance). These numbers are reported directly from rabbitmqctl and iostat, in this case over the course of days, and they are fairly constant.
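
For reference, the iostat side of that sampling is roughly the following (the interval and flags here are an approximation, not the exact collectd configuration):

# extended per-device statistics every 10 seconds; the ~6000 writes/s
# figure is the w/s column for the volume holding the mnesia directory
iostat -x 10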

> If the broker is running rabbit version 2.5.0 or later, could you please
> supply the result of "rabbitmqctl report"?

We are running rabbit 2.4.1.

> If it is an older version please run "rabbitmqctl list_queues" with all
> queueinfoitems, and supply both the query and the result. The result of
> "erlang:memory()." from the Erlang shell will also be helpful, as well
> as a copy of the rabbit configuration file, if you have made any
> relevant changes.

As mentioned above, the output of rabbitmqctl list_queues is what we graph/monitor. The box still has plenty of memory (7 GB free). Here is the output of erlang:memory():

(rabbit@broker1-1)1> erlang:memory().
[{total,3240343872},
 {processes,1558984664},
 {processes_used,1545242416},
 {system,1681359208},
 {atom,1924057},
 {atom_used,1908273},
 {binary,1456029912},
 {code,12256962},
 {ets,101781296}]

And here is our config:

[{rabbit, [{vm_memory_high_watermark, 0.5}]}].
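
For completeness, the list_queues query behind those graphs is essentially along these lines (the queueinfoitems listed here are an approximation of what we poll, not the literal collectd configuration):

rabbitmqctl list_queues name durable messages_ready messages_unacknowledged messages consumers memory pid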

> Are there any unusual entries in the broker logfile? How often does the
> memory alarm trigger? Are there any entries that appear at the onset of
> the disk activity?

Nothing unusual in the logs (the sasl log is empty and the rabbit log just shows the usual connections starting and stopping).
<font color="#888888"><br>
<br>
-Emile<br>
<br>
</font></blockquote></div><br></div><div>Something I didn't mention in my first email is that we have about 8 brokers running in production and only one is showing these symptoms, the throughput is about the same through all brokers.</div>

I've dug deeper using an Erlang shell and found that two queue processes seem to account for most of the reductions. Looking at the corresponding variable queue state, I see that the target_ram_count field of the vqstate record is 0. This is reminiscent of a couple of bugs we had identified with Matthew Sackman that he fixed in 2.4. I'm happy to provide more information on the state of these queue processes if needed.
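
Roughly, that inspection went along these lines from a shell attached to the broker node (a sketch, assuming the default "/" vhost and the default rabbit_variable_queue backing queue, not the exact expressions I ran):

%% collect the queue process pids for the default vhost
QPids = [Pid || [{pid, Pid}] <- rabbit_amqqueue:info_all(<<"/">>, [pid])].

%% rank the queue processes by reductions to find the busiest ones
Busiest = lists:reverse(lists:keysort(2,
              [{P, element(2, process_info(P, reductions))} || P <- QPids])).

%% dump the gen_server2 state of the top offender; the backing queue state
%% inside it is the #vqstate record, where target_ram_count shows up as 0
{TopPid, _Reds} = hd(Busiest).
sys:get_status(TopPid).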

--
Raphael.