[rabbitmq-discuss] Durability and consumer acknowledgement extremely slow

Simon MacMullen
Thu Apr 25 11:59:34 BST 2013

Hi Karl. I suspect you are not really seeing a bottleneck with 
acknowledgements, but rather an optimisation in autoack mode. When you 
publish a persistent message to an empty queue with a non-blocked 
autoack consumer, RabbitMQ will not persist the message to disc - there's 
no point. The message can go straight to the consumer, and then it's 
gone; it can never be requeued.

So I suspect that's the difference you're seeing. And I'm afraid 5-8k 
msg/s is roughly what I would expect for persistent messages on a 
reasonable machine.
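
On the Multiple flag mentioned in the quoted message below: here is a minimal sketch of batched acknowledgement, assuming pika's callback signature and basic_ack(delivery_tag, multiple) as used in the quoted consumer. The BatchAcker class, its batch_size parameter and the flush helper are hypothetical names for illustration only. It reduces the number of ack frames the consumer sends; it does not avoid the cost of writing persistent messages to disc.

```python
class BatchAcker(object):
    """Hypothetical helper: ack every `batch_size` deliveries with one ack."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pending = 0       # deliveries received since the last ack
        self.last_tag = None   # delivery tag of the most recent message

    def on_message(self, chan, method, properties, body):
        # Same signature as the callback in the quoted consumer.
        self.pending += 1
        self.last_tag = method.delivery_tag
        if self.pending >= self.batch_size:
            self.flush(chan)

    def flush(self, chan):
        # multiple=True acknowledges every outstanding delivery on this
        # channel up to and including last_tag.
        if self.pending:
            chan.basic_ack(delivery_tag=self.last_tag, multiple=True)
            self.pending = 0
```

To use it, pass acker.on_message as the consumer callback in place of callback, and keep prefetch_count at or above batch_size; otherwise the broker stops delivering before a batch ever fills. Call flush when the consumer shuts down so the tail of the last batch is not left unacknowledged.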

Cheers, Simon

On 24/04/13 17:26, Karl Rieb wrote:
> Hi,
> I am trying to improve the message throughput for a RabbitMQ queue in an
> Amazon cloud instance and am noticing a *significant* drop in
> performance when enabling acknowledgements for consumers of a durable
> queue (with persisted messages).  The real problem is that the
> bottleneck appears to be on the rabbit node and not with the consumers,
> so adding more consumers does not improve the throughput (or help drain
> the queue any quicker).  As a matter of fact, adding new consumers will
> just slow down existing consumers so everyone ends up consuming at a
> slower rate, preventing overall throughput from changing.
> Trying to do batch acknowledgements using the Multiple flag helps a bit
> (8k msgs/s vs 5.5k msgs/s) but not much compared to the initial drop.
> It is only when I turn on *auto_ack* for the consumers that I see the
> performance shoot *way* back up and start seeing a linear
> increase in throughput as I add more consumers.
> Is this expected behavior?  Is there a way to configure the rabbit node
> so it doesn't hit this bottleneck with acknowledgements?
> Here is the sample code I'm using to test the throughput:
> Publisher:
>     #!/usr/bin/python
>     import pika
>     creds = pika.PlainCredentials('guest','guest')
>     conn  = pika.BlockingConnection(
>         pika.ConnectionParameters(host='', credentials=creds))
>     chan  = conn.channel()
>     while True:
>         chan.basic_publish(exchange='simple_exchange',
>                            routing_key='simple_queue', body='',
>                            properties=pika.BasicProperties(delivery_mode=2))
> Consumer:
>     #!/usr/bin/python
>     import pika
>     def callback(chan, method, properties, body):
>         chan.basic_ack(delivery_tag=method.delivery_tag, multiple=False)
>     creds = pika.PlainCredentials('guest','guest')
>     conn  = pika.BlockingConnection(
>         pika.ConnectionParameters(host='', credentials=creds))
>     chan  = conn.channel()
>     chan.basic_consume(callback, queue='simple_queue', no_ack=False)
>     chan.basic_qos(prefetch_count=1000)
>     chan.start_consuming()
> I spawn multiple processes for the producers and multiple for the
> consumers (so there are no Python interpreter locking issues, since each
> runs in its own interpreter instance).  I'm using an Amazon
> *c1.xlarge* (8 virtual cores and "high" IO) Ubuntu 12.04 LTS instance
> with RabbitMQ version 3.0.4-1 and an Amazon ephemeral disk (in
> production we would use an EBS volume instead).  The queue is marked
> *Durable* and my messages all use *delivery_mode* 2 (persist).
> Below are the performance numbers.  For each test I use 2 publisher
> processes and 6 consumer processes (where 3 different machines host 2
> consumers each).  The producers and consumers are all on *separate*
> machines from the rabbit node.  Throughput measurements were done using
> the RabbitMQ management UI and the Linux utility top.  Python was compiled
> to pyc files before running.
> *no_ack = True:*
>      rate = 24,000/s
>      single consumer CPU   =  65%
>      single publisher CPU  =  80% (flow control enabled and being enforced)
>      (beam.smp) rabbit CPU = 400% (of 800%, 8 cores) -> 0.0%wa 11.5%sy
> *no_ack = False (manual acks per message):*
>      rate =  5,500/s
>      single consumer CPU   =  20%
>      single publisher CPU  =  20% (flow control enabled and being enforced)
>      (beam.smp) rabbit CPU = 300% (of 800%, 8 cores) -> 4.5%wa 10.0%sy
> The most notable difference besides the throughput is the I/O wait
> when ACKs are enabled (4.5% vs 0.0%).  This leads me to believe that the
> rabbit node is being bottlenecked by performing I/O operations for ACK
> bookkeeping.  The I/O doesn't appear to be a problem for persisting the
> published messages since I'm *guessing* that rabbit is buffering those
> and syncing them to disk in batches.  Does this mean the
> acknowledgements are not also being buffered before being synced to disk?
> Can I configure the rabbit node to change this behavior to help speed
> up the acknowledgements?   I'm not using transactions in the example
> code above, so I don't need any strict guarantees that ACKs were written
> to disk before returning.
> Thanks,
> Karl
> P.S. I wrote the same sample consumer code in Ruby to see if there was a
> difference (in case there was a Python issue), but the numbers were
> about the same.
Simon MacMullen
RabbitMQ, VMware

