<div dir="ltr">Hello Simon,<div><br></div><div>Thanks for the response and yeah, my gut feeling went as well toward a leak in that rewrite because in fact this issue started showing up when we upgraded to 3.1.x. We ended up disabling the fine-grained stats last friday as you mentioned in your last suggestion because people were getting paged a bit too often :) The current config is below</div>
[
  {rabbit, [
    {vm_memory_high_watermark, 0.75},
    {cluster_nodes, [
      'rabbit@xyz1',
      'rabbit@xyz2',
      'rabbit@xyz3'
    ]},
    {collect_statistics, coarse},
    {collect_statistics_interval, 10000}
  ]},
  {rabbitmq_management_agent, [
    {force_fine_statistics, false}
  ]}
].

I will give it a few more days with this config and then maybe revert it to help you get to the bottom of this issue. A related question: is there a way to programmatically figure out which node in the cluster is the stats node? I could not find it in the HTTP API.
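The closest thing I could come up with myself -- just a guess, building on the eval you show below, and assuming the stats DB process is still registered globally as rabbit_mgmt_db -- is to ask which node hosts that process:

rabbitmqctl eval 'node(global:whereis_name(rabbit_mgmt_db)).'

That should print the stats node's name, but it would be nice to know whether there is a supported way to get the same information from the HTTP API.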
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Nov 11, 2013 at 1:52 AM, Simon MacMullen <span dir="ltr"><<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 11/11/2013 9:45AM, Simon MacMullen wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hmm, the stats DB was more or less rewritten between 3.0.x and 3.1.0 (to<br>
keep stats histories). If there's a memory leak in there I'd very much<br>
like to get to the bottom of it.<br>
</blockquote>
<br></div>
> Of course the other possibility is that the stats DB is simply overwhelmed
> with work and unable to keep up. It's supposed to start dropping incoming
> stats messages in this situation, but maybe it isn't. To determine if this
> is the case, look at:
>
> rabbitmqctl eval 'process_info(global:whereis_name(rabbit_mgmt_db), memory).'
>
> - and if the number that comes back looks like it would account for most of
> the memory used, then that is likely to be the problem. In that case you can
> slow down stats event emission by changing collect_statistics_interval (see
> http://www.rabbitmq.com/configure.html) and / or disable fine-grained stats
> as I mentioned in the previous message.
<div class="h5"><br>
<br>
Cheers, Simon<br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
</div></div></blockquote></div><br></div>