| So I think this is really the key point. OK, I accept your use case :)<br><br>Yeah, I figured I would have to go into some detail to get past the (usually reasonable) &quot;don&#39;t do that&quot; round of responses. :-)<br>

<br>Regardless, thank you much for accepting the use case and plowing forward with the perf question.<br><br><br>| I assume that this is not how you plan to go into production, you are 

just testing how fast you can go if disk speed is not an issue?<br><br>Correct. :-)<br><br><br>| The real issue though is going to be latency from client to server. What does that look like?<br><br>I&#39;m still poking at this, trying to get a concrete answer from the maze of available tools. All the VMs are using VMXNET3 and exhibit amazing perf between each other using netperf.<br>

<br>HOWEVER... one data point tends to take client/broker latency out of the question:<br><br>If I remove the clustering, so as to just hit a single broker, the rate goes up to 900 messages/sec. Thus, I&#39;m directed back to looking for latency between the master/slaves.<br>

<br>One other potentially interesting data point: With 2 slaves, the message rate is only very slightly degraded from 1 slave, i.e. 37/second with 1 slave, and 36/sec with two. I assume the slave synchronization is done in parallel, so this isn&#39;t totally surprising.<br>
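<br>As a rough sanity check on these rates, here is a small sketch that converts them into time per message. It assumes the publisher is strictly serial (one publish, one commit, wait, repeat), so the time per message is simply 1/rate. The constant and function names are made up for illustration.<br>

```python
# Rates quoted in this thread, in messages per second.
SINGLE_BROKER_RATE = 900.0  # clustering removed, single broker
ONE_SLAVE_RATE = 37.0       # mirrored queue with 1 slave
TWO_SLAVE_RATE = 36.0       # mirrored queue with 2 slaves


def ms_per_message(rate_per_sec):
    """Assumes a strictly serial publish-and-commit loop: time = 1 / rate."""
    return 1000.0 / rate_per_sec


single_ms = ms_per_message(SINGLE_BROKER_RATE)
one_slave_ms = ms_per_message(ONE_SLAVE_RATE)
two_slave_ms = ms_per_message(TWO_SLAVE_RATE)

# Extra time per commit that appears once mirroring is switched on.
mirroring_overhead_ms = one_slave_ms - single_ms
# Additional cost of the second slave on top of the first.
second_slave_extra_ms = two_slave_ms - one_slave_ms

print("single broker:      %.2f ms/msg" % single_ms)
print("1 slave:            %.2f ms/msg" % one_slave_ms)
print("2 slaves:           %.2f ms/msg" % two_slave_ms)
print("mirroring overhead: %.2f ms/msg" % mirroring_overhead_ms)
print("second slave adds:  %.2f ms/msg" % second_slave_extra_ms)
```

Under that assumption a commit takes roughly 1 ms against a single broker and roughly 27 ms with mirroring, so nearly all of the per-message time is spent in whatever happens between master and slaves.<br>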

<br>Matt<br><br><br><div class="gmail_quote">On Wed, Feb 15, 2012 at 9:36 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com">simon@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On 15/02/12 17:15, Matt Pietrek wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks Simon. The 5% figure is useful for me.<br>

</blockquote>

<br></div>

Cheers. Bear in mind that&#39;s just my guess, and it&#39;s also dependent on your situation being CPU-bound - which it sounds like you aren&#39;t.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

Let me give you a more precise description of what I&#39;m doing to get the<br>

36 messages/sec.<br>

<br></div>

    * RabbitMQ 2.7.1 on a 3-node cluster with mirrored queues, durable on<br>

      all nodes.<br>

    * Client is Python 2.6/Pika 0.9.5.<br>

    * Each message publish occurs in a transaction so that we can be<div class="im"><br>

      sure it&#39;s safely in RabbitMQ.<br></div>

    * All nodes are Ubuntu 10.04 VMs with 4GB RAM and 2 or 4 vCPUs.<div class="im"><br>

<br>

<br>

At the heart of things, we&#39;re driving a highly complex state machine<br>

that manages thousands of VMs and their associated state. Losing track<br>

of any state is prohibitively expensive to clean up manually. As such,<br>

all state is modeled in clustered databases and/or persistent messages<br>

in the message queue. We have to assume that a given client app instance<br>

(our management code) may be ungracefully terminated at any moment, so<br>

enough state must be modeled to let a new instance pick up and recover.<br>

If our database record indicates that a message has been sent, it better<br>

darn well be in the hands of the RabbitMQ broker, and not sitting in<br>

some Pika client-side queue.<br>

<br>

For these reasons, publisher confirms are not particularly helpful: they<br>

assume that the client app will be around to resend the message if the<br>

message doesn&#39;t get confirmed. Similar story for batching messages. We<br>

have to know they&#39;ve been sent, and<br>

</div></blockquote>

<br>

OK.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

we can&#39;t stall our state machine<br>

waiting for enough messages to accumulate to publish multiple messages at<br>

once.<br>

</blockquote>

<br></div>

So I think this is really the key point. OK, I accept your use case :)<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My goal in my latest round of experiments is to see what the maximum<br>

throughput of a highly available system is in optimal circumstances.<br>

We&#39;re perfectly willing to spend the money on high end SSDs and<br>

networking equipment as necessary.<br>

<br>

To prototype what this perf level is, I&#39;ve configured RabbitMQ with the<br>

MNESIA directory pointing to a ramdisk (/tmp).<br>

</blockquote>

<br></div>

I assume that this is not how you plan to go into production, you are just testing how fast you can go if disk speed is not an issue?<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I&#39;ve configured all the<br>

VMs with VMXNET3 networking, and with 16K blocks, am seeing bandwidth of<br>

130MB/sec between VMs in the cluster.<br>

</blockquote>

<br></div>

The real issue though is going to be latency from client to server. What does that look like?<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My test app simply writes 6-byte messages, one at a time, as quickly as<br>

it can. In monitoring the cluster nodes, I&#39;m seeing very low CPU usage,<br>

very few writes to the physical disk, and network operation rates of<br>

about 700/sec for the master node and 350/sec for the client node.<br>

<br>

In short, there&#39;s a bottleneck somewhere and it&#39;s not obvious where.<br>

I&#39;ll try your suggestion about replacing tx.commit. Any other insight or<br>

guidance would of course be very much appreciated. :-)<br>

</blockquote>

<br></div>

It sounds like you have already almost eliminated disk writes in practice, so my working assumption would be that it&#39;s client -&gt; server latency that is your issue.<div class="HOEnZb"><div class="h5"><br>

<br>

Cheers, Simon<br>

<br>

-- <br>

Simon MacMullen<br>

RabbitMQ, VMware<br>

</div></div></blockquote></div><br>