[rabbitmq-discuss] Almost all statistics drop to 0 every hour

Irmo Manie irmo.manie at gmail.com
Tue Nov 13 10:35:15 GMT 2012


I'm still seeing this behavior, and it seems to be getting worse.
Every hour the delivery rate drops to 0, and now the publish rate drops
to 0 as well, sometimes for as long as 5 minutes.

I'm quite sure that messaging continues and that the statistics are wrong.
We use RabbitMQ as our main communication hub, and a drop in real-time
market data from ~500 msg/s to 0 for about 5 minutes would certainly be
noticed somewhere if it were real.

I'm wondering whether the problem might be in the part that writes the
statistics to the database. Could that time out?

I don't really want to restart to see if that fixes the problem.
It's been running for 177 days now and apart from the weird statistics we
don't have any problems.

Are there any additional checks I could run to figure out where the
problem might be?
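
One check, following Simon's suggestion quoted below to compare
/api/overview with /api/queues, could be sketched roughly like this. It is
only a sketch under assumptions: it assumes both endpoints report rates as
"<key>_details": {"rate": ...} inside "message_stats", and the helper
names, the key list, and the tolerance are made up for illustration.

```python
import base64
import json
import urllib.request

# Delivery-side counters to compare (assumed key names; adjust to what
# your broker's JSON actually contains).
RATE_KEYS = ("deliver_no_ack", "deliver", "ack")


def fetch_json(base_url, path, user, password):
    """GET a management API path with basic auth and return parsed JSON."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def rate(message_stats, key):
    """Read '<key>_details.rate', treating anything missing as 0."""
    details = (message_stats or {}).get(key + "_details") or {}
    return float(details.get("rate", 0.0))


def overview_rates(overview):
    """Rates as reported by /api/overview."""
    stats = overview.get("message_stats")
    return {k: rate(stats, k) for k in RATE_KEYS}


def summed_queue_rates(queues):
    """Per-key sum of the rates reported for each queue by /api/queues."""
    totals = {k: 0.0 for k in RATE_KEYS}
    for q in queues:
        stats = q.get("message_stats")  # absent for queues without stats
        for k in RATE_KEYS:
            totals[k] += rate(stats, k)
    return totals


def contradictions(overview, queues, tolerance=1.0):
    """Keys where one view reports 0 while the other reports a real rate."""
    o = overview_rates(overview)
    q = summed_queue_rates(queues)
    return sorted(
        k for k in RATE_KEYS
        if (o[k] <= 0.0) != (q[k] <= 0.0) and abs(o[k] - q[k]) > tolerance
    )
```

Calling fetch_json(base_url, "/api/overview", user, password) and
fetch_json(base_url, "/api/queues", user, password) every few seconds
around the top of the hour and logging contradictions(...) with a
timestamp would show whether the two views disagree during a drop, or
whether both go to 0 together. The base URL and credentials are
placeholders for whatever the management plugin uses on your broker.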

- Irmo


On Tue, Nov 6, 2012 at 12:24 PM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 06/11/12 08:48, Irmo Manie wrote:
>
>> As you can see, the delivery (no-ack) rate drops to 0 every hour for a
>> couple of minutes. Other statistics on message processing indicate that
>> actual delivery/consuming is continuing, so it looks like it's really
>> only the statistics that are wrong.
>>
>> If I look at the json output during such a 'downtime' I can see that the
>> publish rate is always available, but the rest of the rates
>> (acknowledge, delivery, etc) are mostly 0.
>>
>
> Just the rates in /api/overview or the ones in /api/queues as well?
>
>
>> I understand a bit from the sources that in this /api/overview call from
>> the management api, these statistics are gathered from different places,
>> both memory and database (with additional calculations executed on them
>> while fetching), correct?
>>
>
> Yes. We get basic queue information from the Mnesia database, and then
> augment it with statistics information (including rates) from the
> (in-memory) management database.
>
>
>> Could it be that some of these values are 0 because of certain time-outs
>> while getting the data?
>>
>
> No, the requests for stats don't time out.
>
>
>> In other words: do I have to start searching for
>> the problem at the database disk/IO level?
>>
>
> No.
>
>
>> I couldn't see any IO waits on the machine indicating something else is
>> happening at the time. Also CPU load is normal.
>> Still the weird thing is that this also happens on a test machine at
>> roughly the same times.
>> Since both machines are VMs, this might indicate that it could be an
>> infrastructural problem, but I'd like to be sure before accusing 'others'.
>>
>
> If /api/overview and /api/queues contradict each other I'd like to know
> about it.
>
> But if the rates drop to 0 in /api/queues as well, Occam's Razor suggests
> that maybe your consumers are pausing for some reason :)
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>