[rabbitmq-discuss] Looking for guidance on R14B04 vs. R13B03 performance

Wed Feb 15 19:18:54 GMT 2012

| So I think this is really the key point. OK, I accept your use case :)

Yeah, I figured I would have to go into some detail to get past the
(usually reasonable) "don't do that" round of responses. :-)

Regardless, thanks very much for accepting the use case and plowing ahead
with the perf question.

| I assume that this is not how you plan to go into production, you are
| just testing how fast you can go if disk speed is not an issue?

Correct. :-)

| The real issue though is going to be latency from client to server. What
| does that look like?

I'm still poking at this, trying to get a concrete answer from the maze of
available tools. All the VMs are using VMXNET3 and show very good
performance between each other when tested with netperf.

HOWEVER... one data point seems to take client/broker latency out of the
question:

If I remove the clustering, so as to just hit a single broker, the rate
goes up to 900 messages/sec. So I'm back to looking for latency between
the master and the slaves.
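A quick back-of-the-envelope check, assuming a single client that publishes and commits one message at a time (so each message costs one full publish-plus-commit cycle and the rate is simply the reciprocal of that time). The helper name `ms_per_message` is just for illustration.

```python
def ms_per_message(rate_per_sec: float) -> float:
    """Time per publish+commit cycle, assuming strictly serial publishing."""
    return 1000.0 / rate_per_sec


single_broker = ms_per_message(900)  # unclustered, single broker
clustered = ms_per_message(36)       # 3-node cluster, mirrored queue

print(f"single broker: {single_broker:.1f} ms/message")
print(f"clustered:     {clustered:.1f} ms/message")
print(f"extra time per commit from clustering: {clustered - single_broker:.1f} ms")
```

So clustering adds roughly 26.7 ms to every commit, while the entire publish-plus-commit cycle against a single broker takes only about 1.1 ms. That gap is why the master/slave path looks like the place to dig.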

One other potentially interesting data point: with 2 slaves, the message
rate is only very slightly lower than with 1 slave, i.e. 37/sec with one
slave and 36/sec with two. I assume the slave synchronization is done in
parallel, so this isn't totally surprising.
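Converting those two rates to time per commit, under the same assumption of one serial publisher waiting on each commit, shows how little the second slave adds. The helper name is again only illustrative.

```python
def ms_per_message(rate_per_sec: float) -> float:
    """Per-message time for a publisher that waits for each commit before sending the next."""
    return 1000.0 / rate_per_sec


one_slave = ms_per_message(37)
two_slaves = ms_per_message(36)

print(f"1 slave:  {one_slave:.2f} ms/commit")
print(f"2 slaves: {two_slaves:.2f} ms/commit")
print(f"cost of the second slave: {two_slaves - one_slave:.2f} ms/commit")
```

The second slave costs about 0.75 ms on top of roughly 27 ms per commit. That fits the guess that the slaves are updated in parallel, though these two figures alone can't show how the mirroring is actually scheduled.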

Matt

On Wed, Feb 15, 2012 at 9:36 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 15/02/12 17:15, Matt Pietrek wrote:
>
>> Thanks Simon. The 5% figure is useful for me.
>>
>
> Cheers. Bear in mind that's just my guess, and it's also dependent on your
> situation being CPU-bound - which it sounds like you aren't.
>
>> Let me give you a more precise description of what I'm doing to get the
>> 36 messages/sec.
>>
>>    * RabbitMQ 2.7.1 on a 3-node cluster with mirrored queues, durable on
>>      all nodes.
>>    * Client is Python 2.6/Pika 0.9.5.
>>    * Each message is published in its own transaction so that we can be
>>      sure it's safely in RabbitMQ.
>>    * All nodes are Ubuntu 10.04 VMs with 4GB RAM and 2 or 4 vCPUs.
>>
>> At the heart of things, we're driving a highly complex state machine
>> that manages thousands of VMs and their associated state. Losing track
>> of any state is prohibitively expensive to clean up manually. As such,
>> all state is modeled in clustered databases and/or persistent messages
>> in the message queue. We have to assume that a given client app instance
>> (our management code) may be ungracefully terminated at any moment, so
>> enough state must be modeled to let a new instance pick up and recover.
>> If our database record indicates that a message has been sent, it better
>> darn well be in the hands of the RabbitMQ broker, and not sitting in
>> some Pika client-side queue.
>>
>> For these reasons, publisher confirms are not particularly helpful: they
>> assume that the client app will still be around to resend the message if
>> it doesn't get confirmed. The same goes for batching messages. We have to
>> know they've been sent, and
>
> OK.
>
>
>> we can't stall our state machine
>> waiting for enough messages to accumulate to publish multiple messages at
>> once.
>>
>
> So I think this is really the key point. OK, I accept your use case :)
>
>
>> My goal in my latest round of experiments is to see what the maximum
>> throughput of a highly available system is in optimal circumstances.
>> We're perfectly willing to spend the money on high end SSDs and
>> networking equipment as necessary.
>>
>> To prototype what this perf level is, I've configured RabbitMQ with the
>> MNESIA directory pointing to a ramdisk (/tmp).
>>
>
> I assume that this is not how you plan to go into production, you are just
> testing how fast you can go if disk speed is not an issue?
>
>
>> I've configured all the
>> VMs with VMXNET3 networking, and with 16K blocks I'm seeing bandwidth of
>> 130MB/sec between VMs in the cluster.
>>
>
> The real issue though is going to be latency from client to server. What
> does that look like?
>
>
>> My test app simply writes 6-byte messages, one at a time, as quickly as
>> it can. In monitoring the cluster nodes, I'm seeing very low CPU usage,
>> very few writes to the physical disk, and network operation rates of
>> about 700/sec for the master node and 350/sec for the client node.
>>
>> In short, there's a bottleneck somewhere and it's not obvious where.
>> I'll try your suggestion about replacing tx.commit. Any other insight or
>> guidance would of course be very much appreciated. :-)
>>
>
> It sounds like you have already almost eliminated disk writes in practice,
> so my working assumption would be that it's client -> server latency that
> is your issue.
>
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
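For reference, the publish pattern from the setup list above (Python/Pika, every message published in its own transaction) can be sketched roughly as follows. This is a sketch under assumptions: it expects a Pika `BlockingChannel` and uses method names from current Pika releases (whether Pika 0.9.5 matches in every detail is an assumption), and the function names `publish_durably` and `publish_all` are hypothetical. `tx_select()` puts the channel into transactional mode once; after that, each `tx_commit()` blocks until the broker replies with Tx.CommitOk.

```python
def publish_durably(channel, exchange, routing_key, body, properties):
    """Publish one message and block until the broker commits it.

    Assumes `channel` is a Pika BlockingChannel already in transactional
    mode (channel.tx_select() was called), and that `properties` marks the
    message persistent, e.g. pika.BasicProperties(delivery_mode=2).
    """
    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=body,
        properties=properties,
    )
    # Blocks until Tx.CommitOk arrives; only after this returns should the
    # database record that the message has been handed to the broker.
    channel.tx_commit()


def publish_all(channel, exchange, routing_key, bodies, properties):
    """Hypothetical driver: one transaction per message, strictly serial."""
    channel.tx_select()  # switch the channel into transactional mode once
    for body in bodies:
        publish_durably(channel, exchange, routing_key, body, properties)
```

Against a real broker the channel would come from something like `pika.BlockingConnection(pika.ConnectionParameters('localhost')).channel()`. Because every iteration waits on `tx_commit()`, a single publisher can go no faster than one message per commit round trip.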