[rabbitmq-discuss] performance

Mon Sep 1 13:01:57 BST 2008

Edwin,

Edwin Fine wrote:
> 
>     Some testing we did in the past indicates that generally a clustered
>     broker - with one node per core and SMP disabled for the Erlang VM -
>     performs significantly better than a single SMP-enabled node.
> 
> 
> Now /that/ is very interesting. I have seen the same kind of thing in 
> some ad-hoc (non-RabbitMQ related) experiments I did some time back 
> (better performance from multiple VMs, each in single-CPU configuration, 
> on a multi-core system) and thought it was my imagination because SMP 
> was supposed to be the way to go. I haven't seen much discussion on this 
> on the erlang-questions mailing list, and quite frankly, I'm not going 
> to start one without some solid, repeatable evidence. If you have seen 
> this behavior, have you brought it up with the Erlang gurus, and if so, 
> have they said anything enlightening about it?

I have mentioned our observations to a few folks, but, as you say, there 
is no point in pursuing this further until we have solid, repeatable 
evidence. Now, our results *are* repeatable, but they are all in the 
context of RabbitMQ. To start a fruitful discussion on the Erlang list or 
with the Erlang gurus, we'd need to construct a simpler, standalone test 
exhibiting the same behaviour. In the process we may well discover the 
root cause of the problem ourselves.

Btw, one issue with performance testing of RabbitMQ is that it is really 
difficult to measure the maximum throughput. RabbitMQ is a message 
*queuing* system, and any test setup will have several message buffers 
at various levels - the OS's network stack at the test client and 
RabbitMQ server, various process message queues at the server and 
buffers in the test client, and the queue processes at the server. 
Optimum throughput is achieved when all these buffers contain just the 
right amount of data so that the processing hanging off them never has 
to wait for data and yet no data is buffered unnecessarily. There are 
lots of tweakable parameters that affect buffering in the OS, the 
Erlang/Java VM, and the client/server apps. Furthermore, due to JIT 
compilation and variations in scheduling decisions (by the VMs and the 
OS), the optimal settings shift over time.

As others have discovered, if a test just blasts messages at RabbitMQ, 
the broker will likely start queuing up most of them, consume increasing 
amounts of memory, and eventually grind to a halt. To get a sensible 
maximum throughput measurement, a more sophisticated approach is 
required, one that controls and adapts the sending rate to the 
prevailing conditions.
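
To make "controls and adapts the sending rate" a bit more concrete, here 
is a minimal sketch of one possible approach: additive-increase / 
multiplicative-decrease (AIMD), borrowed from TCP congestion control. It 
is not something we have implemented or tested against RabbitMQ. The 
class name, the parameter values, and the toy broker that drains a fixed 
number of messages per second are all assumptions made for illustration; 
a real test would derive the backlog from the messages sent minus the 
messages received by the consumer.

```python
from dataclasses import dataclass


@dataclass
class AimdRateController:
    """Additive-increase / multiplicative-decrease send rate (msgs/sec).

    All defaults are illustrative assumptions, not tuned values.
    """
    rate: float = 1000.0     # current target send rate
    increase: float = 200.0  # additive step while the backlog stays small
    decrease: float = 0.7    # multiplicative back-off factor
    max_backlog: int = 500   # tolerated backlog (sent minus received)

    def update(self, backlog: int) -> float:
        """Adjust the rate from the latest backlog observation."""
        if backlog > self.max_backlog:
            # Messages are piling up somewhere: back off sharply.
            self.rate = max(1.0, self.rate * self.decrease)
        else:
            # Buffers look healthy: probe for more throughput.
            self.rate += self.increase
        return self.rate


def simulate(capacity: float, steps: int = 500, dt: float = 0.1):
    """Drive the controller against a toy broker draining `capacity` msgs/sec.

    Returns (rates, backlogs), one sample per simulated step of `dt` seconds.
    """
    ctl = AimdRateController()
    backlog = 0.0
    rates, backlogs = [], []
    for _ in range(steps):
        sent = ctl.rate * dt
        # The toy broker can deliver at most capacity * dt messages per step.
        drained = min(backlog + sent, capacity * dt)
        backlog = backlog + sent - drained
        ctl.update(int(backlog))
        rates.append(ctl.rate)
        backlogs.append(backlog)
    return rates, backlogs


if __name__ == "__main__":
    rates, backlogs = simulate(capacity=5000.0)
    tail = rates[-200:]
    print("mean send rate over the last 200 steps:", sum(tail) / len(tail))
    print("peak backlog:", max(backlogs))
```

In this toy model the send rate oscillates around the broker's capacity 
while the backlog stays bounded, which is the property a real throughput 
test would want: the broker is kept busy without being flooded.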

We'd love to get the help of the community to put together a really 
simple "run this and it will report the maximum throughput" test 
program. Initially this can be for just the simplest (and fastest) 
routing scenario - single producer, single consumer (running in the same 
OS process if that is convenient), single queue, direct exchange, 
auto-ack basic.consume.

Note that this test app would work against all AMQP brokers, not just 
RabbitMQ, so could be used for performance comparison.
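
As a starting point for discussion, the overall shape of such a test 
might look like the sketch below: one producer and one consumer in the 
same OS process, one queue, and auto-ack delivery. The InMemoryBroker 
class is a hypothetical stand-in so that the sketch runs on its own; it 
is not an AMQP client, and a real test would replace it with calls into 
an actual client library. Note also that this producer sends as fast as 
it can, which is exactly what causes trouble against a real broker, so a 
real version would need to pace the producer.

```python
import queue
import threading
import time


class InMemoryBroker:
    """Hypothetical stand-in for a broker: one queue, auto-ack delivery."""

    def __init__(self):
        self._q = queue.Queue()

    def publish(self, body: bytes) -> None:
        self._q.put(body)

    def consume(self, on_message, stop: threading.Event) -> None:
        # Deliver until asked to stop and the queue has been drained.
        while not (stop.is_set() and self._q.empty()):
            try:
                body = self._q.get(timeout=0.05)
            except queue.Empty:
                continue
            on_message(body)


def run_test(broker, n_messages: int = 10000, size: int = 256) -> dict:
    """Send n_messages through the broker and report the observed rate."""
    received = 0
    done = threading.Event()
    stop = threading.Event()

    def on_message(_body):
        nonlocal received
        received += 1
        if received == n_messages:
            done.set()

    consumer = threading.Thread(target=broker.consume, args=(on_message, stop))
    consumer.start()

    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(n_messages):
        broker.publish(payload)
    done.wait(timeout=30)  # wait for the consumer to see every message
    elapsed = time.perf_counter() - start

    stop.set()
    consumer.join()
    return {
        "sent": n_messages,
        "received": received,
        "seconds": elapsed,
        "msgs_per_sec": received / elapsed if elapsed > 0 else 0.0,
    }


if __name__ == "__main__":
    print(run_test(InMemoryBroker()))
```

The clock runs from the first publish until the consumer has received 
the last message, so the reported figure is end-to-end throughput rather 
than the publish rate alone.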

Any takers?

Matthias.