[rabbitmq-discuss] Lower delivery rate than publish rate - why?
Zhibo Wei
uglytrollx at gmail.com
Fri Sep 27 08:49:46 BST 2013
I'm seeing the same issue. Has anyone found out the reason why?
On Sunday, September 1, 2013 1:11:57 PM UTC-7, Tyson Stewart wrote:
>
> I have yet more details to add in case they help.
>
> - Technically, it's a 3-node cluster, but we took one of the nodes
> down last week and have not added it back in because we've had some
> problems with RabbitMQ becoming unresponsive when making those kinds of
> changes to an active cluster. So we have two reporting nodes and one down
> node.
>    - This morning, all 15 consumers maintained 30 messages per second
>    pretty constantly, but then we hit some delivery threshold (I'm not
>    exactly sure where), and they started the sawtooth behavior again; it
>    has been that way ever since.
>    - We see publish spikes of 2-3x the normal rate every other minute,
>    but the consumers bounce from 40 messages/second to 0 four to five
>    times per minute, so the delivery drops don't correlate directly with
>    the publish spikes. (A small script for watching these rates follows
>    this list, in case it's useful.)
>
>
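> In case it helps, the rates above can also be watched outside the
> management UI; something like the script below will pull them from the
> management HTTP API. This is just a sketch: the host, credentials, and
> queue name are placeholders for our real ones, and the rate fields only
> appear once there is activity, hence the .get() guards.
>
> # Poll publish/deliver rates and backlog for one queue via the management API.
> # Host, credentials, vhost (%2F = default), and queue name are placeholders.
> import time
> import requests
>
> URL = "http://rabbit-host:15672/api/queues/%2F/main-queue"
>
> while True:
>     q = requests.get(URL, auth=("guest", "guest")).json()
>     stats = q.get("message_stats", {})
>     publish_rate = stats.get("publish_details", {}).get("rate", 0.0)
>     deliver_rate = stats.get("deliver_get_details", {}).get("rate", 0.0)
>     print("publish %.1f/s  deliver %.1f/s  backlog %d"
>           % (publish_rate, deliver_rate, q.get("messages", 0)))
>     time.sleep(5)
>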
> On Saturday, August 31, 2013 6:30:24 PM UTC-5, Tyson Stewart wrote:
>>
>> Hello everyone!
>>
>> We've been experiencing some behavior that I don't understand, and none
>> of my searching or documentation-reading has been fruitful, so I'm here to
>> ask you all for expert knowledge.
>>
>> Broadly, we're seeing a lower delivery rate than publish rate. I've
>> attached an image to this message that shows how we're able to keep up when
>> the publish rate is less than 600 messages/second, but above that,
>> consumption falls behind publication. Around 16:00 on that chart, we
>> doubled the number of consumers, and it made no difference that we could
>> tell. The erratic publish rate is from us turning off publishing to the
>> most active queue whenever we fell far enough behind to get worried. When
>> the backlog got low enough, we turned publishing back on; we did that a
>> few times.
>>
>> Here are some vitals for our cluster:
>>
>> - 2 nodes
>> - Each node is a m1.xlarge instance hosted in EC2
>> - We have 133 queues in the cluster (see note below)
>> - All queues are mirrored (they all use a policy that makes them
>> highly available)
>> - All queues are durable; we use AWS provisioned IOPS to guarantee
>> enough throughput
>>    - We only use the direct exchange (declaration sketch after this list)
>>
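>> For reference, the declarations themselves are unremarkable: each queue is
>> declared durable and bound to the direct exchange, roughly like the sketch
>> below (recent-pika style; the exchange, queue, and routing key names are
>> placeholders). The mirroring comes from the ha-mode policy mentioned above,
>> not from anything in the declaration itself.
>>
>> # Sketch: durable queue bound to a direct exchange.
>> # Names are placeholders; mirroring is applied via a separate ha-mode policy.
>> import pika
>>
>> conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbit-host"))
>> ch = conn.channel()
>>
>> ch.exchange_declare(exchange="events", exchange_type="direct", durable=True)
>> ch.queue_declare(queue="main-queue", durable=True)
>> ch.queue_bind(queue="main-queue", exchange="events", routing_key="main")
>>
>> conn.close()
>>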
>> Regarding the number of queues, there are four kinds: the "main" queues,
>> retry-a queues, retry-b queues, and poison queues. Messages that fail for
>> whatever reason during consumption get put into the retry queues, and if
>> they keep failing, they eventually wind up in the poison queue, where they
>> stay until we deal with them manually much later. The main queues see the
>> majority of the activity.
>>
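>> For anyone not familiar with the pattern, a typical way to wire up that
>> kind of retry chain is per-queue dead-lettering plus a message TTL, along
>> the lines of the sketch below. This illustrates the pattern rather than
>> our exact declarations; the queue names and the TTL are made up.
>>
>> # Sketch of a retry chain built from dead-letter routing and per-queue TTLs.
>> # Queue names and TTL values are made up for illustration.
>> import pika
>>
>> conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbit-host"))
>> ch = conn.channel()
>>
>> # Main queue: messages rejected by a consumer are dead-lettered to retry-a.
>> ch.queue_declare(queue="orders", durable=True, arguments={
>>     "x-dead-letter-exchange": "",                # default exchange
>>     "x-dead-letter-routing-key": "orders.retry-a",
>> })
>>
>> # Retry queue: messages wait out the TTL, then dead-letter back to the main queue.
>> ch.queue_declare(queue="orders.retry-a", durable=True, arguments={
>>     "x-message-ttl": 30000,
>>     "x-dead-letter-exchange": "",
>>     "x-dead-letter-routing-key": "orders",
>> })
>>
>> # Poison queue: the consumer publishes here once a message has failed too often.
>> ch.queue_declare(queue="orders.poison", durable=True)
>>
>> conn.close()
>>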
>> The average message size is less than 1MB. At nearly one million
>> messages, we were still under 1GB of memory usage, and our high watermark
>> is 5.9GB.
>>
>> Disk IOPS don't appear to be the problem. Metrics indicated we still had
>> plenty of headroom. Furthermore, if IOPS were the limitation, I would have
>> expected the delivery rate to increase as the publish rate decreased while
>> the consumers worked through the queue. It did not, however, as shown on
>> the chart.
>>
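>> For reference, each consumer is doing roughly the equivalent of the sketch
>> below (heavily simplified, recent-pika style; the queue name and prefetch
>> value are placeholders, and do_work stands in for the real handler). I
>> mention the prefetch because, as I understand it, with no basic_qos the
>> broker pushes messages as fast as it can, while a small prefetch caps how
>> many unacked messages each consumer can have in flight.
>>
>> # Simplified consumer: explicit prefetch, manual acks, reject on failure.
>> # Queue name, prefetch value, and do_work() are placeholders.
>> import pika
>>
>> def do_work(body):
>>     pass  # stand-in for the real message handler
>>
>> def handle(channel, method, properties, body):
>>     try:
>>         do_work(body)
>>         channel.basic_ack(delivery_tag=method.delivery_tag)
>>     except Exception:
>>         # requeue=False lets the message head to the retry path
>>         # instead of being redelivered immediately
>>         channel.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
>>
>> conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbit-host"))
>> ch = conn.channel()
>> ch.basic_qos(prefetch_count=50)
>> ch.basic_consume(queue="main-queue", on_message_callback=handle)
>> ch.start_consuming()
>>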
>> My primary question is: *What do you think is limiting our consumption
>> rate?* I'm also curious about what affects consumption rate in general.
>> Any advice would be appreciated at this point. Questions for
>> clarification are also welcome!
>>
>