Hello Emile, thanks for the reply; comments inline.<div><br></div><div>--</div><div>Raphael.<br><br><div class="gmail_quote">On Tue, Oct 25, 2011 at 1:39 AM, Emile Joubert <span dir="ltr">&lt;<a href="mailto:emile@rabbitmq.com">emile@rabbitmq.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Raphael,<br>

<div class="im"><br>

On 25/10/11 02:07, Raphael Simon wrote:<br>

&gt; Hello all,<br>

&gt;<br>

&gt; We are seeing an issue on a production broker where the RabbitMQ process<br>

&gt; writes non-stop to files in mnesia/&lt;node&gt;/msg_store_persistent. It keeps<br>

&gt; creating new files and the problem seems to be getting worse. Listing<br>

&gt; the files in that directory shows that it&#39;s creating a new 16 MB file<br>

&gt; every 2 to 4 minutes [1].<br>

&gt;<br>

&gt; The throughput of persistent messages in this broker is orders of<br>

&gt; magnitude lower (maybe 20 msg/sec at most, each in the tens of KB).<br>

<br>

</div><div class="im">&gt; There are about 100 messages sitting in queues on that broker so that<br>

&gt; should not cause that many writes, yet iostat shows about 6000 writes/s.<br>

<br>

</div>How did you determine this number? Is it constant? I would expect the<br>

behaviour you describe when some queues keep growing or when the broker<br>

needs to free up a lot of memory.<br></blockquote><div><br></div><div>We use collectd in combination with rabbitmqctl (not great for performance, but it lets us see what&#39;s going on in the brokers at a glance). These numbers come directly from rabbitmqctl and iostat, collected over several days, and they are fairly constant.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

If the broker is running rabbit version 2.5.0 or later, could you please<br>

supply the result of &quot;rabbitmqctl report&quot;?<br></blockquote><div><br></div><div>We are running rabbit 2.4.1.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

If it is an older version please run &quot;rabbitmqctl list_queues&quot; with all<br>

queueinfoitems, and supply both the query and the result. The result of<br>

&quot;erlang:memory().&quot; from the Erlang shell will also be helpful, as well<br>

as a copy of the rabbit configuration file, if you have made any<br>

relevant changes.<br></blockquote><div><br></div><div>As mentioned above, the output of rabbitmqctl list_queues is what we graph/monitor. The box still has plenty of memory (7 GB free). Here is the output of erlang:memory():</div>

<div><br></div><div><div>(rabbit@broker1-1)1&gt; erlang:memory().</div><div>[{total,3240343872},</div><div> {processes,1558984664},</div><div> {processes_used,1545242416},</div><div> {system,1681359208},</div><div> {atom,1924057},</div>

<div> {atom_used,1908273},</div><div> {binary,1456029912},</div><div> {code,12256962},</div><div> {ets,101781296}]</div></div><div><br></div><div>And here is our config:</div><div><br></div><div>[{rabbit, [{vm_memory_high_watermark, 0.5}]}].</div>
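<div><br></div><div>For reference, here is a small Python sketch (a reader&#39;s aid only, not part of our monitoring) that turns the erlang:memory() figures above into GiB and shares of the total. It assumes, as the Erlang documentation describes, that the values are in bytes and that total = processes + system, with binary counted inside system:</div>

```python
# Figures copied verbatim from the erlang:memory() output above (values are bytes).
memory = {
    "total": 3240343872,
    "processes": 1558984664,
    "processes_used": 1545242416,
    "system": 1681359208,
    "atom": 1924057,
    "atom_used": 1908273,
    "binary": 1456029912,
    "code": 12256962,
    "ets": 101781296,
}

GIB = 1024 ** 3


def summarize(mem):
    """Return {category: (size_in_gib, share_of_total)} for each entry."""
    total = mem["total"]
    return {name: (value / GIB, value / total) for name, value in mem.items()}


# Sanity check: erlang:memory() reports total as processes + system.
assert memory["processes"] + memory["system"] == memory["total"]

for name, (gib, share) in summarize(memory).items():
    print(f"{name:15} {gib:6.2f} GiB {share:6.1%}")
```

<div>On these numbers, binary accounts for roughly 1.36 GiB (about 45% of the total) and processes for about 48%.</div>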

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

Are there any unusual entries in the broker logfile? How often does the<br>

memory alarm trigger? Are there any entries that appear at the onset of<br>

the disk activity?<br></blockquote><div><br></div><div>Nothing unusual in the logs (the SASL log is empty and the rabbit log only has the usual connection start/stop entries).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<font color="#888888"><br>

<br>

-Emile<br>

<br>

</font></blockquote></div><br></div><div>Something I didn&#39;t mention in my first email is that we have about 8 brokers running in production and only one of them shows these symptoms, even though the throughput is about the same across all of them.</div>

<div><br></div><div>I&#39;ve dug deeper using an Erlang shell and found that two queue processes seem to be causing most of the reductions. Looking at the corresponding variable queue state, I see that the target_ram_count field of the vqstate record is 0. This is reminiscent of a couple of bugs we identified with Matthew Sackman, which he fixed in 2.4. I&#39;m happy to provide more information on the state of those queue processes if needed.</div>

<div><br></div><div>--</div><div>Raphael.</div>