[rabbitmq-discuss] performance

Edwin Fine rabbitmq-discuss_efine at usa.net
Mon Sep 1 23:23:07 BST 2008

If you run with such short messages, you are also indirectly measuring the
maximum efficiency of the AMQP protocol itself, because almost all of the
data on the wire will be AMQP framing (PDU) overhead rather than payload.

Another consideration is that when you run using basic.consume, RabbitMQ
will push messages to the consumer as fast as possible. If the consumer is
an Erlang process, all that achieves is to fill up the consumer's Erlang
process message queue as fast as Erlang can take it (which is pretty fast).
This only reflects the speed at which RabbitMQ can deliver messages, not the
end-to-end throughput of the system as such. However, I can think of a
possible way around this (described later).

I think we need a number of different approaches taken together.

The first approach, which we could call the "pedal-to-the-metal" approach,
would be to measure the theoretical maximum performance envelope for a
specific hardware/OS platform, by running Erlang-only clients in-process
with the Rabbit server (i.e. in the same VM). This would eliminate all
networking because data would be passed between RabbitMQ and clients
internally via Erlang. This probably represents the fastest possible model.
At the same time, we would use the 1-byte data suggested by Martin, because
that represents the smallest possible packets. The result of this benchmark
would be a messages/second number that would represent the theoretical peak
one could approach (for one producer and consumer).
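To make this concrete, here is a rough sketch in Python (not Erlang, and not
real RabbitMQ code; the deque merely stands in for the broker's queue) of what
the in-process measurement boils down to: time a tight publish/consume loop
over tiny payloads and report messages per second.

```python
# Sketch (illustrative only): an in-process "pedal-to-the-metal" measurement.
# With no networking and a 1-byte payload, the msgs/sec figure approximates a
# best-case ceiling for one producer/consumer pair on this hardware.
import time
from collections import deque

def bench_in_process(n_messages=1_000_000, payload=b"x"):
    queue = deque()                      # stands in for the broker's queue
    start = time.perf_counter()
    for _ in range(n_messages):          # producer: enqueue as fast as possible
        queue.append(payload)
    while queue:                         # consumer: drain as fast as possible
        queue.popleft()
    elapsed = time.perf_counter() - start
    return n_messages / elapsed          # messages per second

if __name__ == "__main__":
    print(f"{bench_in_process():,.0f} msgs/sec (in-process ceiling)")
```

The absolute number is meaningless in itself; the point is that every later
test (TCP loopback, separate hosts, clustering) can be expressed as a fraction
of this ceiling.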

The above very synthetic benchmark could be used to probe system (i.e.
hardware + OS + Erlang) specific parameters to push performance. Nobody
would consider it to be realistic, but it's still useful. For example, using
+S 1 with multiple RMQ + client nodes; showing the effects of using kernel
poll (should be nil because there are no sockets being used); pre-allocated
Erlang process heaps; SMP Erlang vs single-CPU Erlang with and without
processor affinity; and so on.

Of course, what if you have multiple producers and consumers? How many
channels do you use? How do you configure the exchange(s)? It rapidly
becomes very complicated.

The next approach could be to move the Erlang clients to a TCP/IP networking
model (different VM, but on the same physical host, so hopefully it's only
the TCP/IP stack at play and no Ethernet hardware) and see the difference.
Of course, this is tricky because the clients and servers are both on the
same machine and may contend for resources.

Then we could move the clients to a different host altogether and see whether
Ethernet speed becomes a significant limitation. At this point the LAN
hardware is the most likely bottleneck.

Then we could start trying clustered RabbitMQ servers on multiple hosts, and
this is where it stops for most people (including me) because they don't
have the resources.

For each test, we could vary the message sizes and see the effect on message
rate and throughput. We could also try persistent vs. non-persistent,
basic.get vs basic.consume, and so on. There are so many possible variations
that this could become a full-time job!
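Just to show how the combinations multiply, the test matrix could be generated
along these lines (Python sketch; run_test is a hypothetical hook into
whatever benchmark harness ends up being used, not an existing tool):

```python
# Sketch of the test matrix: sweep message size and delivery options and
# record rate/throughput for each combination. All names are illustrative.
from itertools import product

sizes = [1, 64, 256, 1024, 65536]          # payload bytes
persistence = [False, True]                # non-persistent vs persistent
modes = ["basic.get", "basic.consume"]

def run_test(size, persistent, mode):
    # placeholder: would publish/consume for a fixed interval and
    # return (messages_per_second, bytes_per_second)
    raise NotImplementedError

matrix = list(product(sizes, persistence, modes))
print(f"{len(matrix)} test combinations")   # 5 * 2 * 2 = 20
```

And that is before multiplying by the number of producers, consumers, channels
and exchange configurations.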

As Martin points out, measuring throughput can be tricky. I suggest some
sort of feedback mechanism to adjust the speed at which messages are sent.
For example, the Erlang client could write its message consumption rate each
second into an ets table, which the producer could read. As long as the rate
goes up, the producer keeps pumping more through. As soon as the producer
sees the rate plateau or even drop, averaged over a few measurements to
smooth out oscillation, it cuts back. When the Erlang client is in a
different VM, this is more complicated, but you get the idea. One problem is
that the feedback mechanism itself has CPU and networking overhead, which
would have to be taken into account.
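In pseudo-Python (the real thing would be Erlang polling the ets table; every
name here is purely illustrative), the producer-side feedback logic might look
like this:

```python
# Sketch of the feedback idea: the producer watches the consumer's measured
# rate and backs off when the rate plateaus or drops. Averaging over a small
# window of samples smooths out noise. All names are illustrative.
from collections import deque

class RateController:
    def __init__(self, initial_rate=1000.0, window=3, step=1.1):
        self.send_rate = initial_rate     # messages/sec the producer targets
        self.samples = deque(maxlen=window)
        self.step = step                  # multiplicative increase/decrease

    def observe(self, consumed_rate):
        # Feed in the consumer's per-second rate (e.g. read from the ets table).
        prev_avg = (sum(self.samples) / len(self.samples)) if self.samples else 0.0
        self.samples.append(consumed_rate)
        avg = sum(self.samples) / len(self.samples)
        if avg > prev_avg:                # throughput still rising: push harder
            self.send_rate *= self.step
        else:                             # plateau or drop: cut back
            self.send_rate /= self.step
        return self.send_rate
```

A multiplicative increase/decrease like this converges on the plateau without
the producer having to know the ceiling in advance.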

If the benchmarks are run for long enough (hours or even days), I think we
might overcome some of the statistical issues that Martin mentions.

My 2c worth.

On Mon, Sep 1, 2008 at 4:09 PM, Martin Sustrik <sustrik at imatix.com> wrote:

> Just a few hints:
> 1. Measuring maximal throughput is tricky for various reasons, queueing
> being only one of them. There are statistical and methodological
> problems like hidden averages, non-stability of the metric, etc. However,
> a simple way to get *some* results is to run the tests with the publisher
> publishing messages at various rates. The maximal rate at which latency
> doesn't increase ad infinitum can be considered the "maximal throughput".
> 2. 380,000 256-byte messages a second is a nice result; however, it
> doesn't tell us much about the messaging system. It simply means that the
> system is able to exhaust 1GbE with messages 256 bytes long. To get more
> interesting results, the test should be run with smaller messages
> (preferably 0 or 1 bytes long), where processing power will be the
> bottleneck rather than the networking infrastructure.
> HTH.
> Martin
> Matthias Radestock wrote:
> > Michael,
> >
> > Mayne, Michael wrote:
> >
> >> Red Hat has produced a paper recently (June 2008) describing the
> >> performance testing lab it set up to show how optimised Red Hat on Intel
> >> Xeon hardware can process very high message rates:
> >>
> >> http://www.redhat.com/f/pdf/mrg/Reference_Architecture_MRG_Messaging_Throughput.pdf
> >>
> >> This paper was presented at an Intel FasterCITY - fasterMESSAGING event
> >> in London on 23 June.
> >> http://www.intelfasterfs.com/fastermessaging/
> >> It contains a description of the test bench it used to generate its
> >> figures - which were a repeatable ingress rate of 380,000 (256 byte)
> >> messages per second. There is obviously a lot more to it - see the paper
> >> for details.
> >>
> >> That could be a starter for ten.
> >
> > I am familiar with that report. The tests conducted are pretty similar
> > to the ones we did with Intel last year and earlier this year (see
> > http://www.rabbitmq.com/resources/AMQP_Solution_Brief_final.pdf).
> >
> > Unfortunately the test setup - "producers reliably en-queues messages
> > onto the broker as fast as they can, consumers reliably de-queue
> > messages from the broker as fast as they can" - isn't addressing the key
> > problem I described, namely controlling and adapting the ingress rate in
> > order to maximise throughput.
> >
> >
> > Matthias.
> >
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-discuss at lists.rabbitmq.com
> > http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

For every expert there is an equal and opposite expert - Arthur C. Clarke