[rabbitmq-discuss] Looking for guidance on R14B04 vs. R13B03 performance
simon at rabbitmq.com
Wed Feb 15 17:36:41 GMT 2012
On 15/02/12 17:15, Matt Pietrek wrote:
> Thanks Simon. The 5% figure is useful for me.
Cheers. Bear in mind that's just my guess, and it's also dependent on
your situation being CPU-bound, which it sounds like you aren't.
> Let me give you a more precise description of what I'm doing to get the
> 36 message/sec.
> * RabbitMQ 2.7.1 on a 3-node cluster with mirrored queues, durable on
> all nodes.
> * Client is Python 2.6/Pika 0.9.5.
> * Each message publish occurs in a transaction so that we can be
> sure it's safely in RabbitMQ.
> * All nodes are Ubuntu 10.04 VMs with 4GB RAM and 2 or 4 vCPUs.
> At the heart of things, we're driving a highly complex state machine
> that manages thousands of VMs and their associated state. Losing track
> of any state is prohibitively expensive to clean up manually. As such,
> all state is modeled in clustered databases and/or persistent messages
> in the message queue. We have to assume that a given client app instance
> (our management code) may be ungracefully terminated at any moment, so
> enough state must be modeled to let a new instance pick up and recover.
> If our database record indicates that a message has been sent, it better
> darn well be in the hands of the RabbitMQ broker, and not sitting in
> some Pika client-side queue.
> For these reasons, publisher confirms are not particularly helpful: they
> assume that the client app will be around to resend the message if the
> message doesn't get confirmed. Similar story for batching messages. We
> have to know they've been sent, and we can't stall our state machine
> waiting for enough messages to accumulate to publish multiple messages at
> once.
So I think this is really the key point. OK, I accept your use case :)
> My goal in my latest round of experiments is to see what the maximum
> throughput of a highly available system is in optimal circumstances.
> We're perfectly willing to spend the money on high end SSDs and
> networking equipment as necessary.
> To prototype what this perf level is, I've configured RabbitMQ with the
> MNESIA directory pointing to a ramdisk (/tmp).
I assume that this is not how you plan to go into production, you are
just testing how fast you can go if disk speed is not an issue?
> I've configured all the
> VMs with VMXNET3 networking, and with 16K blocks I'm seeing bandwidth of
> 130MB/sec between VMs in the cluster.
The real issue though is going to be latency from client to server. What
does that look like?
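One rough way to put a number on that latency is to time TCP handshakes from the client machine to a broker node, since a handshake costs about one network round trip. The sketch below is written for Python 3 (the client in this thread is Python 2.6, so treat it as illustrative), assumes the broker listens on the default AMQP port 5672, and uses a hypothetical helper name, `tcp_connect_rtt_ms`. It measures only the network, not AMQP or broker-side processing time.

```python
import socket
import time


def tcp_connect_rtt_ms(host, port=5672, samples=20, timeout=2.0):
    """Return a list of TCP connect times, in milliseconds, to host:port.

    Hypothetical helper: each connect completes after roughly one round
    trip (SYN / SYN-ACK), so the values approximate client -> broker
    latency. Broker-side and AMQP processing time are not included.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Returns once the kernel has completed the TCP handshake.
        sock = socket.create_connection((host, port), timeout=timeout)
        elapsed = time.perf_counter() - start
        sock.close()
        times.append(elapsed * 1000.0)
    return times
```

For example, passing `tcp_connect_rtt_ms("10.0.0.5")` to `statistics.median` gives a median handshake time in milliseconds; the address is a placeholder for one of the cluster nodes.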
> My test app simply writes 6-byte messages, one at a time, as quickly as
> it can. In monitoring the cluster nodes, I'm seeing very low CPU usage,
> very few writes to the physical disk, and network operation rates of
> about 700/sec for the master node and 350/sec for the client node.
> In short, there's a bottleneck somewhere and it's not obvious where.
> I'll try your suggestion about replacing tx.commit. Any other insight or
> guidance would of course be very much appreciated. :-)
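The figures quoted above make it easy to check whether raw bandwidth could be the limit. A minimal Python sketch, using a hypothetical helper `link_utilisation`, the thread's numbers (36 msg/s, 6-byte payloads, about 130MB/sec between VMs) and the assumption that 1 MB is 10**6 bytes. The 1 KB-per-message case is a hypothetical, deliberately generous allowance for AMQP framing, the tx.commit exchange and mirror replication traffic.

```python
def link_utilisation(msgs_per_sec, bytes_per_msg, link_bytes_per_sec):
    """Fraction of a link's measured bandwidth used by a message stream."""
    return msgs_per_sec * bytes_per_msg / link_bytes_per_sec


# Assumption: 130MB/sec means 130 * 10**6 bytes per second.
LINK = 130 * 10**6

# Payload bytes only: 36 msg/s * 6 bytes = 216 bytes/sec.
payload_only = link_utilisation(36, 6, LINK)
# Hypothetical 1 KB per message on the wire, to cover protocol overhead.
generous = link_utilisation(36, 1024, LINK)

print("payload only: %.2e of link" % payload_only)  # payload only: 1.66e-06 of link
print("1 KB/message: %.2e of link" % generous)      # 1 KB/message: 2.84e-04 of link
```

Even under the generous assumption the stream uses well under 0.1% of the measured bandwidth, which fits the observation that the bottleneck is not network throughput.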
It sounds like you have already almost eliminated disk writes in
practice, so my working assumption would be that it's client -> server
latency that is your issue.
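The reasoning can be made concrete with a simple serial model: if each message must be acknowledged (for example by tx.commit-ok) before the next is sent, nothing overlaps, so throughput is capped by the per-message round-trip time. The sketch below assumes one client round trip per message plus a lump of broker-side time (`broker_s`, which with mirrored queues may include waiting on the other nodes; that is an assumption, not a measured figure). Both helper names are hypothetical.

```python
def max_serial_rate(rtt_s, round_trips_per_msg=1, broker_s=0.0):
    """Upper bound on messages/sec when each message waits for a reply.

    Model (an assumption): every message costs round_trips_per_msg
    network round trips of rtt_s seconds, plus broker_s seconds of
    broker-side work, with no overlap between messages.
    """
    per_msg = round_trips_per_msg * rtt_s + broker_s
    return 1.0 / per_msg


def implied_time_per_msg_ms(observed_rate):
    """Wall-clock milliseconds per message at an observed serial rate."""
    return 1000.0 / observed_rate


# At the observed 36 msg/s, each message takes about 27.8 ms end to end.
print("%.1f ms per message" % implied_time_per_msg_ms(36))  # 27.8 ms per message
# With a hypothetical 0.5 ms RTT and no broker time, the ceiling is 2000 msg/s.
print("%.0f msg/s ceiling" % max_serial_rate(0.0005))       # 2000 msg/s ceiling
```

Comparing the measured round-trip time against the roughly 27.8 ms per message shows how much of the gap latency can explain; whatever is left over points at time spent inside the broker.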