[rabbitmq-discuss] Durability and consumer acknowledgement extremely slow
Karl Rieb
karl.rieb at gmail.com
Wed Apr 24 17:26:42 BST 2013
Hi,
I am trying to improve the message throughput for a RabbitMQ queue on an
Amazon cloud instance and am noticing a *significant* drop in performance
when enabling acknowledgements for consumers of a durable queue (with
persisted messages). The real problem is that the bottleneck appears to be
on the rabbit node and not with the consumers, so adding more consumers
does not improve the throughput (or help drain the queue any quicker). In
fact, adding new consumers just slows down the existing ones, so everyone
ends up consuming at a lower rate and the overall throughput stays flat.
Batching acknowledgements with the *multiple* flag helps a bit (8k msgs/s
vs 5.5k msgs/s), but not much compared to the initial drop. Only when I
turn on *auto_ack* for the consumers does the performance shoot *way* back
up, and only then do I see a linear increase in throughput as I add more
consumers.
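(For reference, the batched-ack variant just counts deliveries in the
callback and acks every Nth one with multiple=True; the batch size below
is arbitrary. Roughly:)

BATCH_SIZE = 100
unacked = 0

def callback(chan, method, properties, body):
    global unacked
    unacked += 1
    if unacked >= BATCH_SIZE:
        # Acks this delivery and every earlier unacked delivery on the channel.
        chan.basic_ack(delivery_tag=method.delivery_tag, multiple=True)
        unacked = 0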
Is this expected behavior? Is there a way to configure the rabbit node so
it doesn't hit this bottleneck with acknowledgements?
Here is the sample code I'm using to test the throughput:
Publisher:
#!/usr/bin/python
import pika

creds = pika.PlainCredentials('guest', 'guest')
conn = pika.BlockingConnection(
    pika.ConnectionParameters(host='10.10.1.123', credentials=creds))
chan = conn.channel()

# Publish empty persistent messages (delivery_mode=2) as fast as possible.
while True:
    chan.basic_publish(exchange='simple_exchange',
                       routing_key='simple_queue',
                       body='',
                       properties=pika.BasicProperties(delivery_mode=2))
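(The exchange, queue, and binding are created once ahead of the timed
test, not inside the loop; roughly like this one-off setup:)

chan.exchange_declare(exchange='simple_exchange', exchange_type='direct',
                      durable=True)  # older pika spells this kwarg 'type'
chan.queue_declare(queue='simple_queue', durable=True)
chan.queue_bind(queue='simple_queue', exchange='simple_exchange',
                routing_key='simple_queue')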
Consumer:
#!/usr/bin/python
import pika

def callback(chan, method, properties, body):
    # Ack each delivery individually (multiple=False).
    chan.basic_ack(delivery_tag=method.delivery_tag, multiple=False)

creds = pika.PlainCredentials('guest', 'guest')
conn = pika.BlockingConnection(
    pika.ConnectionParameters(host='10.10.1.123', credentials=creds))
chan = conn.channel()

# Set the prefetch limit before registering the consumer so it applies to it.
chan.basic_qos(prefetch_count=1000)
chan.basic_consume(callback, queue='simple_queue', no_ack=False)
chan.start_consuming()
I spawn multiple processes for the producers and multiple for the
consumers (so there are no Python interpreter locking issues, since each
runs in its own interpreter instance); see the launcher sketch below. I'm
using an Amazon *c1.xlarge* (8 virtual cores and "high" I/O) Ubuntu 12.04
LTS instance with RabbitMQ version 3.0.4-1 and an Amazon ephemeral disk
(in production we would use an EBS volume instead). The queue is marked
*Durable* and my messages all use *delivery_mode* 2 (persistent).
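(The launcher is trivial: each machine just starts its consumers as
separate OS processes, roughly like this, assuming the consumer script
above is saved as consumer.py:)

import subprocess

NUM_CONSUMERS = 2  # two consumer processes per machine in these tests
procs = [subprocess.Popen(['python', 'consumer.py'])
         for _ in range(NUM_CONSUMERS)]
for p in procs:
    p.wait()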
Below are the performance numbers. For each test I use 2 publisher
processes and 6 consumer processes (3 different machines hosting 2
consumers each). The producers and consumers are all on *separate*
machines from the rabbit node. Throughput measurements were taken from the
RabbitMQ management UI and the Linux utility top. The Python was compiled
to .pyc files before running.
*no_ack = True:*
rate = 24,000/s
single consumer CPU = 65%
single publisher CPU = 80% (flow control enabled and being enforced)
(beam.smp) rabbit CPU = 400% (of 800%, 8 cores) -> 0.0%wa 11.5%sy
*no_ack = False (manual acks per message):*
rate = 5,500/s
single consumer CPU = 20%
single publisher CPU = 20% (flow control enabled and being enforced)
(beam.smp) rabbit CPU = 300% (of 800%, 8 cores) -> 4.5%wa 10.0%sy
The most notable difference besides the throughput is the I/O wait when
ACKs are enabled (4.5% vs 0.0%). This leads me to believe that the rabbit
node is bottlenecked by I/O operations for ACK bookkeeping. The I/O
doesn't appear to be a problem for persisting the published messages,
since I'm *guessing* that rabbit buffers those and syncs them to disk in
batches. Does this mean the acknowledgements are not also being buffered
before being synced to disk? Can I configure the rabbit node to change
this behavior and speed up the acknowledgements? I'm not using
transactions in the example code above, so I don't need strict guarantees
that ACKs were written to disk before returning.
Thanks,
Karl
P.S. I wrote the same sample consumer code in Ruby to see if there was a
difference (in case there was a Python issue), but the numbers were about
the same.