[rabbitmq-discuss] Lower delivery rate than publish rate - why?

Wed Sep 4 19:21:35 BST 2013

I would also like to hear the answer to this question, because my situation 
is almost identical. We assumed it was a network problem and are about to 
move to a bigger instance type, just as a test.

On Saturday, August 31, 2013 7:30:24 PM UTC-4, Tyson Stewart wrote:
>
> Hello everyone!
>
> We've been experiencing some behavior that I don't understand, and none of 
> my searching or documentation-reading has been fruitful, so I'm here to ask 
> you all for expert knowledge.
>
> Broadly, we're seeing a lower delivery rate than publish rate. I've 
> attached an image to this message that shows how we're able to keep up when 
> the publish rate is less than 600 messages/second, but above that, 
> consumption falls behind publication. Around 16:00 on that chart, we 
> doubled the number of consumers, and it made no difference that we could 
> tell. The erratic behavior of the publish rate is us turning off publishes 
> of the most active queue because we were falling far enough behind that we 
> became worried. When the backlog would get low enough, we would turn it 
> back on, and we did that a few times.
>
> Here are some vitals to our cluster:
>
>    - 2 nodes
>    - Each node is an m1.xlarge instance hosted in EC2
>    - We have 133 queues in the cluster (see note below)
>    - All queues are mirrored (they all use a policy that makes them 
>    highly available)
>    - All queues are durable; we use AWS provisioned IOPS to guarantee 
>    enough throughput
>    - We only use the direct exchange
>
> Regarding the number of queues, there are four kinds: the "main" queues, 
> retry-a queues, retry-b queues, and poison queues. Messages that fail for 
> whatever reason during consumption will get put into the retry queues, and 
> if they fail long enough, they'll wind up in the poison queue where they 
> will stay until we do something with them manually much later. The main 
> queues then see the majority of activity.
>
> The average message size is less than 1MB. At nearly one million messages, 
> we were still under 1GB of memory usage, and our high watermark is 5.9GB. 
>
> Disk IOPS don't appear to be the problem. Metrics indicated we still had 
> plenty of headroom. Furthermore, if IOPS were the limitation, I would have 
> expected the delivery rate to increase as the publish rate decreased while 
> the consumers worked through the queue. It did not, however, as shown on 
> the chart.
>
> My question primarily is: *What do you think is limiting our consumption 
> rate?* I'm curious about what affects consumption rate in general, 
> though. Any advice would be appreciated at this point. Questions for 
> clarification are also welcome!
>
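
For anyone following the thread, here is a rough way to reason about the 
backlog described above. It is only a sketch: it assumes the delivery rate 
is capped at a fixed ceiling (about 600 messages/second, per the original 
post) regardless of publish rate or consumer count. The function names and 
the 800 messages/second publish rate are made up for illustration.

```python
def backlog_after(publish_rate, delivery_rate, seconds, start=0):
    """Queue depth after `seconds`, given steady rates in messages/second.

    The depth never drops below zero: consumers cannot deliver
    messages that have not been published.
    """
    growth = (publish_rate - delivery_rate) * seconds
    return max(0, start + growth)


def seconds_to_drain(backlog, publish_rate, delivery_rate):
    """Seconds for consumers to clear `backlog`.

    Returns None when the publish rate meets or exceeds the delivery
    rate, because the backlog then never shrinks.
    """
    net = delivery_rate - publish_rate
    if net <= 0:
        return None
    return backlog / net


if __name__ == "__main__":
    ceiling = 600    # observed delivery ceiling from the original post
    publishing = 800  # hypothetical publish rate, not from the post

    depth = backlog_after(publishing, ceiling, seconds=3600)
    print("backlog after one hour:", depth)

    # Publishing paused on the busy queue, as described in the post.
    print("seconds to drain:", seconds_to_drain(depth, 0, ceiling))
```

Under this model, pausing publishes does not make delivery any faster. It 
only stops the backlog from growing while consumers work through it at the 
same ceiling, which matches the flat delivery rate on the chart.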