[rabbitmq-discuss] Lower delivery rate than publish rate - why?

Zhibo Wei uglytrollx at gmail.com
Fri Sep 27 08:49:46 BST 2013


I'm seeing the same issue. Has anyone figured out the cause?

On Sunday, September 1, 2013 1:11:57 PM UTC-7, Tyson Stewart wrote:
>
> I have yet more details to add in case they help.
>
>    - Technically, it's a 3-node cluster, but we took one of the nodes 
>    down last week and have not added it back in because we've had some 
>    problems with RabbitMQ becoming unresponsive when making those kinds of 
>    changes to an active cluster. So we have two reporting nodes and one down 
>    node.
>    - This morning, all 15 consumers maintained 30 messages per second 
>    fairly constantly, but then we hit some delivery threshold (I'm not 
>    exactly sure where), they started the sawtooth behavior again, and it 
>    has been that way since. 
>    - We see publish spikes of 2-3x the normal rate every other minute, 
>    but the consumers bounce from 40 messages/second to 0 four to five times 
>    per minute, so there is no direct correlation between the publish spikes 
>    and the delivery drops.
>
>
> On Saturday, August 31, 2013 6:30:24 PM UTC-5, Tyson Stewart wrote:
>>
>> Hello everyone!
>>
>> We've been experiencing some behavior that I don't understand, and none 
>> of my searching or documentation-reading has been fruitful, so I'm here to 
>> ask you all for expert knowledge.
>>
>> Broadly, we're seeing a lower delivery rate than publish rate. I've 
>> attached an image to this message that shows how we're able to keep up when 
>> the publish rate is less than 600 messages/second, but above that, 
>> consumption falls behind publication. Around 16:00 on that chart, we 
>> doubled the number of consumers, and it made no difference that we could 
>> tell. The erratic publish rate is the result of us turning off publishes 
>> to the most active queue because we were falling far enough behind to 
>> worry. Whenever the backlog got low enough, we turned publishing back on, 
>> and we did that a few times.
>>
>> Here are some vitals to our cluster:
>>
>>    - 2 nodes
>>    - Each node is a m1.xlarge instance hosted in EC2
>>    - We have 133 queues in the cluster (see note below)
>>    - All queues are mirrored (they all use a policy that makes them 
>>    highly available; see the policy sketch after this list)
>>    - All queues are durable; we use AWS provisioned IOPS to guarantee 
>>    enough throughput
>>    - We only use the direct exchange
>>
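>> The policy itself isn't shown in the thread. For illustration, queue 
>> mirroring is usually enabled with an "ha-mode" policy; the sketch below 
>> sets one through the management HTTP API in Python. The host, 
>> credentials, and policy name are assumptions, not values from this 
>> thread:
>>
>>     import requests
>>
>>     # Mirror all queues on the default vhost ("%2f" is the URL-encoded
>>     # "/"). Assumes the management plugin listens on localhost:15672
>>     # with the default guest/guest credentials.
>>     resp = requests.put(
>>         "http://localhost:15672/api/policies/%2f/ha-all",
>>         auth=("guest", "guest"),
>>         json={
>>             "pattern": ".*",
>>             "definition": {"ha-mode": "all"},
>>             "apply-to": "queues",
>>         },
>>     )
>>     resp.raise_for_status()
>>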
>> Regarding the number of queues, there are four kinds: the "main" queues, 
>> retry-a queues, retry-b queues, and poison queues. Messages that fail for 
>> whatever reason during consumption get put into the retry queues, and if 
>> they keep failing for long enough, they wind up in the poison queue, where 
>> they stay until we do something with them manually much later. The main 
>> queues therefore see the majority of the activity.
>>
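>> The thread doesn't say how messages move between these tiers; a common 
>> implementation is per-queue dead-lettering with a TTL-based back-off. A 
>> minimal pika sketch under that assumption (the queue names, the 
>> 30-second TTL, and the routing are illustrative, not details from this 
>> thread):
>>
>>     import pika
>>
>>     conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
>>     ch = conn.channel()
>>
>>     # Main queue: rejected messages dead-letter into the retry tier.
>>     ch.queue_declare(queue="orders.main", durable=True, arguments={
>>         "x-dead-letter-exchange": "",  # default direct exchange
>>         "x-dead-letter-routing-key": "orders.retry-a",
>>     })
>>
>>     # Retry queue: hold messages for 30s, then dead-letter them back
>>     # to the main queue for another attempt.
>>     ch.queue_declare(queue="orders.retry-a", durable=True, arguments={
>>         "x-message-ttl": 30000,
>>         "x-dead-letter-exchange": "",
>>         "x-dead-letter-routing-key": "orders.main",
>>     })
>>
>>     # Poison queue: terminally failed messages parked for manual review.
>>     ch.queue_declare(queue="orders.poison", durable=True)
>>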
>> The average message size is less than 1MB. At nearly one million 
>> messages, we were still under 1GB of memory usage, and our high watermark 
>> is 5.9GB. 
>>
>> Disk IOPS don't appear to be the problem. Metrics indicated we still had 
>> plenty of headroom. Furthermore, if IOPS were the limitation, I would have 
>> expected the delivery rate to increase as the publish rate decreased while 
>> the consumers worked through the queue. It did not, however, as shown on 
>> the chart.
>>
>> My primary question is: *What do you think is limiting our consumption 
>> rate?* I'm also curious about what affects consumption rate in general. 
>> Any advice would be appreciated at this point. Questions for 
>> clarification are also welcome!
>>
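>> On the general question, one knob that often bounds delivery rate is 
>> the consumer prefetch count (basic.qos): left unset, the broker can 
>> flood a slow consumer; set very low, per-message round trips dominate. 
>> A pika sketch, with the queue name and prefetch value as illustrative 
>> assumptions:
>>
>>     import pika
>>
>>     conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
>>     ch = conn.channel()
>>     ch.basic_qos(prefetch_count=50)  # at most 50 unacked deliveries in flight
>>
>>     def handle(ch, method, properties, body):
>>         # ... process the message, then acknowledge it ...
>>         ch.basic_ack(delivery_tag=method.delivery_tag)
>>
>>     ch.basic_consume(queue="orders.main", on_message_callback=handle)
>>     ch.start_consuming()
>>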
>