[rabbitmq-discuss] Stats node getting slow
Simon MacMullen
simon at rabbitmq.com
Mon Nov 18 11:23:31 GMT 2013
Hi Pierpaolo, thanks for that. So we can see from those outputs that the
vast majority of memory used is process memory rather than ETS. So it
looks like the management database is simply failing to keep up with
inbound stats messages. This is slightly disappointing, because the
database contains an emergency mechanism that is supposed to drop
messages when it gets overwhelmed. So I'd like to find out more about
your workload.
That's what I was hoping to get from "rabbitmqctl report". If you can't
send that to me directly then I'd like to know how many connections,
channels and queues you have, as well as the approximate churn rate for
each of those, and the degree of fanout (i.e. for each channel, how many
queues does it publish to / consume from?).
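
For reference, something along these lines should give the raw counts
(a rough sketch; I'm assuming the -q flag to suppress the "Listing ..."
header lines, and you may need a -p <vhost> per vhost for the queues):

# count connections, channels and queues (run on any cluster node)
rabbitmqctl -q list_connections | wc -l
rabbitmqctl -q list_channels | wc -l
rabbitmqctl -q list_queues name | wc -l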
Also: have you tried increasing collect_statistics_interval? I suspect
that would improve things for you.
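
For example, something like this in rabbitmq.config would emit stats
events every 30 seconds instead of the default 5 (just a sketch; merge
it into your existing rabbit section and pick whatever interval your
monitoring can live with):

[
  {rabbit, [
    %% milliseconds between stats emissions; the default is 5000
    {collect_statistics_interval, 30000}
  ]}
].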
Cheers, Simon
On 16/11/2013 04:24, Pierpaolo Baccichet wrote:
> Hello Simon, as promised we did an experiment today reverting to the
> old config that enables fine-grained stats, and we reproduced the
> slowdown very quickly. I dumped the output of the commands you asked us
> to run into the attached files:
>
> log_rabbit_1.txt: rabbitmqctl eval '[{T, [KV || KV = {K, _} <- I,
> lists:member(K, [size, memory])]} || T <- ets:all(), I <- [ets:info(T)],
> proplists:get_value(name, I) =:= rabbit_mgmt_db].'
>
> log_rabbit2.txt: rabbitmqctl eval
> 'process_info(global:whereis_name(rabbit_mgmt_db), memory).'
>
> I also have the output of the rabbitmqctl report command, but it
> contains a lot of information that would leak internal details, so I
> can't really forward it as a whole. Is there something specific you'd
> like to see from it?
>
>
> On Mon, Nov 11, 2013 at 6:24 AM, Pierpaolo Baccichet
> <pierpaolo at dropbox.com> wrote:
>
> Hello Simon,
>
> Thanks for the response. Yeah, my gut feeling also pointed toward a
> leak in that rewrite, because this issue started showing up when we
> upgraded to 3.1.x. We ended up disabling the fine-grained stats last
> Friday, as you suggested, because people were getting paged a bit too
> often :) The current config is below.
>
> [
> {rabbit, [
> {vm_memory_high_watermark, 0.75},
> {cluster_nodes, [
> 'rabbit at xyz1',
> 'rabbit at xyz2',
> 'rabbit at xyz3'
> ]},
> {collect_statistics, coarse},
> {collect_statistics_interval, 10000}
> ]},
> {rabbitmq_management_agent, [
> {force_fine_statistics, false}
> ]}
> ].
>
> I will give it a few more days with this config and then maybe revert
> to help you figure out this issue. A related question: is there a way
> to programmatically figure out which node is the stats node in the
> cluster? I could not find it in the HTTP API.
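>
> My best guess, reusing the lookup from your earlier command, would be
> something like the following, but an HTTP API field would be nicer:
>
> rabbitmqctl eval 'node(global:whereis_name(rabbit_mgmt_db)).'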
>
>
> On Mon, Nov 11, 2013 at 1:52 AM, Simon MacMullen
> <simon at rabbitmq.com> wrote:
>
> On 11/11/2013 9:45AM, Simon MacMullen wrote:
>
> Hmm, the stats DB was more or less rewritten between 3.0.x and 3.1.0
> (to keep stats histories). If there's a memory leak in there I'd very
> much like to get to the bottom of it.
>
>
> Of course the other possibility is that the stats DB is simply
> overwhelmed with work and unable to keep up. It's supposed to
> start dropping incoming stats messages in this situation, but
> maybe it isn't. To determine if this is the case, look at:
>
> rabbitmqctl eval
> 'process_info(global:whereis_name(rabbit_mgmt_db), memory).'
>
> - and if the number that comes back looks like it would account
> for most of the memory used, then that is likely to be the
> problem. In that case you can slow down stats event emission by
> changing collect_statistics_interval (see
> http://www.rabbitmq.com/configure.html) and / or disable
> fine-grained stats as I mentioned in the previous message.
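>
> To make that comparison in one go, you could also eval something like
> this (a rough sketch that just pairs the stats DB process's memory
> with the node's total allocation):
>
> rabbitmqctl eval
> '{process_info(global:whereis_name(rabbit_mgmt_db), memory),
> erlang:memory(total)}.'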
>
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, Pivotal
>
>
>