[rabbitmq-discuss] Response times for publisher confirms

Tim Watson tim at rabbitmq.com
Tue Oct 16 12:28:33 BST 2012


On 15 Oct 2012, at 16:32, Aaron Pfeifer wrote:

> Hey everyone -
> We've been running some tests on a RabbitMQ stack that we have running in EC2 and encountered some unexpected behavior in the response times for publisher confirms. We wanted to see if folks here had an idea of what the root cause might be.
> First, some details on the stack:
> * EC2 m1.xlarge (15GB memory, 8 EC2 compute units, 64-bit, 1 Gigabit Ethernet)
> * Ubuntu 10.04
> * Erlang R13B03
> * RabbitMQ 2.8.5
> * Mounted to 4 RAID0 ephemeral drives (I believe 400GB each)
> ...and some details on the setup:
> * 16 durable, fanout exchanges
> * 16 durable queues
> * Each exchange mapped to a single queue
> * Each message published as persistent
> We have about 5,000 connections (over many servers) publishing a message once every 15s.  Each connection has publisher confirms enabled.  Each message is approximately 10-15KB in size (give or take).  While running this test we tracked the following data:
> * Maximum amount of time to receive a confirm: http://i.imgur.com/SrXJP.png
> * Average amount of time to receive a confirm: http://i.imgur.com/ryWmr.png
> These graphs seem to indicate that there's something going on in RabbitMQ about every 10 minutes that causes a significant increase in the amount of time it takes to receive a publisher confirm.
> To get some further data about what was happening during each of these spikes, we logged the following data through vmstat and iostat on the machine: https://gist.github.com/3874134.  You'll notice that during this spike we experience a significant decrease in the amount of data being written to disk and a significant increase in the await time on each disk.

I note from vmstat that there are quite a few other processes running - are you sure that there is no contention for the disk between rabbit and another application running on the instance? If another application is performing synchronous I/O then that might account for some of what you're seeing.

Are you publishing and/or expecting confirms in batches? Can you see a constant (flat) throughput in the management UI as well?

> And just a little background: we were planning on using durable queues and publisher confirms to verify that the message was actually received by RabbitMQ.  I believe I've also read that publisher confirms are typically preferred to transactions.

Yes, they're generally much faster than transactions. Bear in mind, though, that if you're asking for confirms and the messages are persistent, then you're asking rabbit to fsync to disk repeatedly, which is very expensive. Whether or not this is related to what you're seeing every 10 minutes is a little unclear to me, though some of the more knowledgeable rabbits may spot something I've missed here.

If you're interested in amortizing the round-trip cost, then you could consider dealing with confirms asynchronously as well. But either way, it's probably worth making sure there's no I/O contention from outside sources at regular intervals first.
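
To illustrate what dealing with confirms asynchronously might look like on the publisher side, here is a minimal sketch in Python. It is not tied to any particular client library: ConfirmTracker and its method names are hypothetical, and the injectable clock exists only to make the sketch easy to exercise. It relies on two aspects of the confirm protocol: publishes on a channel in confirm mode are numbered from 1, and the broker may acknowledge several of them at once by setting the "multiple" flag on an ack or nack.

```python
import time
from collections import OrderedDict


class ConfirmTracker:
    """Tracks unconfirmed publishes by sequence number (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._next_seq = 1              # confirm numbering starts at 1 per channel
        self._pending = OrderedDict()   # sequence number -> publish timestamp
        self.latencies = []             # seconds from publish to ack
        self.nacked = []                # sequence numbers the broker rejected

    def on_publish(self):
        """Call right after each publish; returns the sequence number used."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = self._clock()
        return seq

    def _resolve(self, delivery_tag, multiple):
        # With multiple=True the broker settles every outstanding publish
        # up to and including delivery_tag; otherwise just that one.
        if multiple:
            tags = [t for t in self._pending if t <= delivery_tag]
        else:
            tags = [delivery_tag] if delivery_tag in self._pending else []
        return [(t, self._pending.pop(t)) for t in tags]

    def on_ack(self, delivery_tag, multiple=False):
        """Call from the client's ack callback; records confirm latency."""
        now = self._clock()
        for _, sent_at in self._resolve(delivery_tag, multiple):
            self.latencies.append(now - sent_at)

    def on_nack(self, delivery_tag, multiple=False):
        """Call from the client's nack callback; collects tags to republish."""
        for tag, _ in self._resolve(delivery_tag, multiple):
            self.nacked.append(tag)

    @property
    def outstanding(self):
        """Number of publishes still waiting for a confirm."""
        return len(self._pending)
```

The publisher keeps sending without blocking on each confirm, and the ack/nack callbacks of whatever client you use would call on_ack and on_nack. The recorded latencies could also feed the kind of max/average graphs mentioned above, and capping "outstanding" gives a simple way to apply back-pressure.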
