[rabbitmq-discuss] Lower delivery rate than publish rate - why?

Sun Sep 1 00:30:24 BST 2013

Hello everyone!

We've been experiencing some behavior that I don't understand, and none of 
my searching or documentation-reading has been fruitful, so I'm here to ask 
you all for expert knowledge.

Broadly, we're seeing a lower delivery rate than publish rate. I've 
attached an image to this message that shows how we keep up while the 
publish rate stays below 600 messages/second, but above that, consumption 
falls behind publication. Around 16:00 on that chart, we doubled the number 
of consumers, and as far as we could tell it made no difference. The 
erratic publish rate comes from us turning off publishing to our most 
active queue whenever we fell far enough behind to become worried. Once the 
backlog was low enough, we turned it back on; we did that a few times.
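To make the backlog dynamics above concrete, here is a toy simulation. It 
assumes delivery is capped at roughly 600 messages/second (the point on the 
chart where we stop keeping up); the function name, the split of publish 
traffic, and the pause/resume thresholds are all hypothetical. Whenever 
publishing exceeds the cap, the backlog grows by the difference every 
second, and pausing the hot queue produces an on/off sawtooth.

```python
def simulate_backlog(seconds, base_publish, hot_publish, delivery_cap,
                     pause_at, resume_at):
    """Return (backlog, hot_queue_on) for each simulated second.

    Toy model: deliveries are capped at delivery_cap msgs/s; publishes to the
    hot queue are paused when the backlog reaches pause_at and resumed once it
    drains to resume_at (hysteresis, so publishing does not flap on and off).
    """
    backlog = 0
    hot_on = True
    history = []
    for _ in range(seconds):
        publish = base_publish + (hot_publish if hot_on else 0)
        # Consumers cannot take more than is queued or more than the cap.
        delivered = min(delivery_cap, backlog + publish)
        backlog += publish - delivered
        if hot_on and backlog >= pause_at:
            hot_on = False  # far enough behind: stop publishing to the hot queue
        elif not hot_on and backlog <= resume_at:
            hot_on = True   # backlog low enough: turn it back on
        history.append((backlog, hot_on))
    return history


# Hypothetical numbers: 400 msg/s of other traffic plus 400 msg/s on the hot queue.
history = simulate_backlog(2000, base_publish=400, hot_publish=400,
                           delivery_cap=600, pause_at=100_000, resume_at=10_000)
print(max(b for b, _ in history))
```

With these numbers the backlog climbs by 200 messages/second while the hot 
queue is on and drains by 200 messages/second while it is off, so it 
oscillates between the two thresholds rather than ever catching up.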

Here are some vitals of our cluster:

   - 2 nodes
   - Each node is an m1.xlarge instance hosted in EC2
   - We have 133 queues in the cluster (see note below)
   - All queues are mirrored (they all use a policy that makes them highly 
   available)
   - All queues are durable; we use AWS provisioned IOPS to guarantee 
   enough throughput
   - We only use the direct exchange
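For reference, a mirroring policy like ours can be defined through the 
management plugin's HTTP API with a PUT to /api/policies/{vhost}/{name}. 
The sketch below only builds that request without sending it. It assumes a 
RabbitMQ 3.x broker with the management plugin on its default port 15672; 
the helper name, host, credentials, policy name "ha-all", and the 
catch-all pattern "^" are hypothetical placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_ha_policy_request(host, vhost, name, pattern, user, password, port=15672):
    """Build (but do not send) a PUT request defining a mirroring policy.

    Hypothetical helper: every argument is a placeholder, not our real setup.
    """
    # The vhost is part of the URL path, so "/" must be percent-encoded as %2F.
    path = "/api/policies/{}/{}".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(name, safe=""))
    body = json.dumps({
        "pattern": pattern,                # regex matched against queue names
        "definition": {"ha-mode": "all"},  # mirror matching queues on every node
        "priority": 0,
    }).encode("utf-8")
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        "http://{}:{}{}".format(host, port, path),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
    )


req = build_ha_policy_request("rabbit-1.example", "/", "ha-all", "^", "guest", "guest")
print(req.get_method(), req.full_url)
# To actually apply it: urllib.request.urlopen(req)
```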

Regarding the number of queues, there are four kinds: the "main" queues, 
retry-a queues, retry-b queues, and poison queues. Messages that fail for 
whatever reason during consumption are put into the retry queues, and if 
they keep failing long enough, they end up in a poison queue, where they 
stay until we deal with them manually much later. The main queues 
therefore see most of the activity.
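The retry flow above can be sketched as a small routing function. This is 
only an illustration under assumptions: it supposes retry-a is tried 
before retry-b, and the failure thresholds, the function name, and the 
"<main>.retry-a" style queue names are hypothetical, since the post does 
not give them.

```python
# Hypothetical thresholds: how many failures send a message from one retry
# tier to the next is an assumption, as is the queue naming scheme.
RETRY_A_MAX_FAILURES = 3
RETRY_B_MAX_FAILURES = 6


def route_failed_message(main_queue: str, failures: int) -> str:
    """Pick the queue a failed message is republished to, given its failure count."""
    if failures < 1:
        raise ValueError("only failed messages are rerouted")
    if failures <= RETRY_A_MAX_FAILURES:
        return f"{main_queue}.retry-a"
    if failures <= RETRY_B_MAX_FAILURES:
        return f"{main_queue}.retry-b"
    # Past both retry tiers, the message parks in the poison queue for manual handling.
    return f"{main_queue}.poison"


for n in (1, 4, 7):
    print(n, route_failed_message("orders", n))
```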

The average message size is less than 1MB. At nearly one million messages, 
we were still under 1GB of memory usage, and our memory high watermark is 
5.9GB.
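A quick sanity check on those figures, assuming "1GB" means 1024^3 bytes 
and rounding "nearly one million" up to 1,000,000: memory works out to 
roughly 1KB per queued message, and usage stayed under about 17% of the 
watermark. If the average body were anywhere near 1MB, one million 
messages would need close to a terabyte, so either typical messages are 
far smaller than 1MB or most bodies were not held in RAM.

```python
# Figures from the post; treating "1GB" as 1024**3 bytes is an assumption.
messages = 1_000_000            # "nearly one million messages" (rounded up)
memory_used = 1 * 1024**3       # "still under 1GB of memory usage"
high_watermark = 5.9 * 1024**3  # memory high watermark

per_message = memory_used / messages
print(f"memory per queued message: about {per_message:.0f} bytes")    # about 1074 bytes
print(f"share of high watermark: under {memory_used / high_watermark:.0%}")  # under 17%

# If every body really were ~1MB, the same backlog would need this much RAM:
print(f"1M messages x 1MB = {messages * 1024**2 / 1024**4:.2f} TiB")
```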

Disk IOPS don't appear to be the problem. Metrics indicated we still had 
plenty of headroom. Furthermore, if IOPS were the limitation, I would have 
expected the delivery rate to increase as the publish rate decreased while 
the consumers worked through the queue. It did not, however, as shown on 
the chart.
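The reasoning in the paragraph above can be written as a toy model. It 
assumes, hypothetically, that publishes and deliveries draw on one fixed 
disk-operations budget with a fixed cost per message; the budget and 
per-message costs below are made-up numbers. Under that assumption the 
delivery ceiling has to rise as the publish rate falls, which is not what 
the chart shows.

```python
def delivery_ceiling(iops_budget, publish_rate, ops_per_publish, ops_per_delivery):
    """Deliveries/second left over after publishes take their share of a fixed IOPS budget.

    Toy model: every argument is a hypothetical assumption, not a measurement.
    """
    spare = max(iops_budget - publish_rate * ops_per_publish, 0.0)
    return spare / ops_per_delivery


for publish_rate in (900, 600, 300):
    ceiling = delivery_ceiling(2000, publish_rate, ops_per_publish=1.0, ops_per_delivery=2.0)
    print(publish_rate, ceiling)
```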

My question primarily is: *What do you think is limiting our consumption 
rate?* More generally, I'm also curious about what affects consumption 
rate. Any advice would be appreciated at this point, and questions for 
clarification are welcome too!
-------------- next part --------------
Attachment: RabbitMQ_Management.png (PNG image, 37799 bytes)
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130831/2347f7e6/attachment.png>