[rabbitmq-discuss] Performance on ec2

Mon Jan 23 17:45:46 GMT 2012

Hi Srdan,

In short, "yes, that's about it", especially on EC2. It would be worth
checking your results with the hipe_compile option turned on with the
latest Rabbit (2.7.1) and the latest Erlang (R15B). For details, search
for hipe_compile on http://www.rabbitmq.com/configure.html

Over loopback, with very small message payloads and hipe turned on,
Rabbit can read off a single socket and throw the data away (i.e. no
bindings from an exchange) at somewhere around 100kHz (100,000 msgs per
second). That obviously depends on hardware, and the figure might be a
little old these days. Once you increase the payload size, go off
loopback, run without hipe, add EC2 issues, and actually do something
with the messages, such as sending them into queues, buffering them,
and then sending them out to consumers, an overall throughput of around
20kHz is not unexpected.

When we do a simple one-publisher, one-queue, one-consumer benchmark,
until recently, the figures were something like 16kHz with 1-ack per
message (no qos) and 25kHz with noAck turned on (autoack). Those figures
are both in and out - i.e. 25kHz msgs in, and 25kHz msgs out. It's only
due to a bunch of quite recent optimisations and getting hipe to work
for us that those figures have improved a bit.

If increasing the publishing rate decreases the consuming rate then
you've hit a bottleneck somewhere. Either your CPUs are maxed out (and
if they're not, then something else must be stopping Rabbit from using
more CPU), or you've hit a network bottleneck, a disk bottleneck, or
something else.
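
One rough way to spot that point in your own numbers is to step the
publishing rate up and look for the first step where the consuming rate
goes down rather than up. The sketch below assumes you have already
collected (publish rate, consume rate) pairs yourself; the function name
and the sample figures are made up for illustration and are not
measurements from Rabbit.

```python
def find_saturation_point(samples):
    """Return the first publish rate at which the consume rate drops.

    samples: iterable of (publish_rate, consume_rate) pairs in msgs/sec.
    Returns None if the consume rate never decreases.
    """
    # Compare each sample with the one before it, in order of publish rate.
    ordered = sorted(samples)
    for (_, prev_consume), (publish, consume) in zip(ordered, ordered[1:]):
        if consume < prev_consume:
            return publish
    return None


# Hypothetical measurements, for illustration only.
measurements = [
    (5000, 5000),
    (10000, 10000),
    (15000, 14000),
    (20000, 11000),  # consuming got slower as publishing got faster
]

print(find_saturation_point(measurements))  # prints 20000
```

This only tells you where the bottleneck shows up, not what it is. You
would still need to look at CPU, network and disk at that rate to find
out which one it is.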

On a single machine, I'd be surprised if you can get close to 100kHz.
Across a cluster it should be possible provided the load is spread
across all the machines - don't forget a queue is a single process on a
single node within the cluster so the maximum speed of an individual
queue is determined by the speed of a single core on one machine. Thus
if you need to go faster than a single core, then you have to spread the
load across multiple queues. Once you max out the cores on a single
machine, you'll need to go to multiple nodes.
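
As a back-of-the-envelope sketch of that arithmetic: divide the rate you
want by what one queue can do to get the number of queues, then divide
the queues by the cores per machine to get the number of nodes. The
function name, the 20kHz-per-queue figure and the 4 cores per node are
assumptions for illustration. Measure your own per-queue rate, and bear
in mind that connections and channels need CPU too, so treat the result
as a lower bound.

```python
import math


def plan_capacity(target_rate, per_queue_rate, cores_per_node):
    """Estimate the minimum queues and nodes needed for a target rate.

    Assumes each queue is limited by one core and that load spreads
    evenly across queues. Rates are in msgs/sec.
    """
    if target_rate <= 0 or per_queue_rate <= 0 or cores_per_node <= 0:
        raise ValueError("all arguments must be positive")
    # One queue is one process, so it cannot go faster than one core.
    queues = math.ceil(target_rate / per_queue_rate)
    # Once the cores on one machine are used up, more nodes are needed.
    nodes = math.ceil(queues / cores_per_node)
    return queues, nodes


# Assumed figures: 100kHz target, 20kHz per queue, 4 cores per node.
queues, nodes = plan_capacity(100_000, 20_000, 4)
print(queues, nodes)  # prints 5 2
```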

Matthew