[rabbitmq-discuss] RabbitMQ performance testing & troubles

Matthias Radestock matthias at lshift.net
Wed Apr 9 22:50:02 BST 2008


Alexey Slynko wrote:
> we have done some stress tests of rabbitmq. Our main goal was the ~10K
> messages per second and maximum cpu & ram usage. Two servers with equal
> configuration listed below were used for this task.

> Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz CPU1
> Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz CPU2

You shouldn't have any trouble hitting ~10k mps on these boxes. We have 
achieved that on machines with much lower specs.

> These servers were connected with network link Ethernet 100Mb.

That's rather slow, though you probably won't be hitting the limits in 
your tests. Check the stats on the network interface to get an idea how 
much headroom you have.
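
A back-of-envelope check makes the point. Assuming an average wire size of 1 KiB per message (payload plus AMQP framing - an assumption; substitute your real message sizes), 10k msgs/sec already eats most of a 100Mb link:

```python
# Back-of-envelope bandwidth check for a 100 Mbit/s link.
# MSG_BYTES = 1024 is an assumed average wire size per message
# (payload plus AMQP framing) -- adjust to your actual messages.

LINK_MBPS = 100          # nominal Ethernet link speed, Mbit/s
MSG_RATE = 10_000        # target messages per second
MSG_BYTES = 1024         # assumed bytes on the wire per message

used_mbps = MSG_RATE * MSG_BYTES * 8 / 1_000_000
headroom = LINK_MBPS - used_mbps

print(f"{used_mbps:.1f} Mbit/s used, {headroom:.1f} Mbit/s headroom")
# 81.9 Mbit/s used, 18.1 Mbit/s headroom
```

So with kilobyte-sized messages you would be running close to the wire speed, which is why the interface stats are worth watching.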

> Erlang version (stable debian distribution) :
> # dpkg -l erlang\* | grep ii
> ii  erlang-base      11.b.2-4       Concurrent, real-time, distributed
> functiona
> ii  erlang-nox       11.b.2-4       Concurrent, real-time, distributed
> functiona

Any chance you could try the tests with R11B-5? Most of our testing has 
been done on that. I doubt it will make much difference but at the very 
least it will make it easier for us to compare results.

> Other modifications were made to get mnesia working normally (errors listed below)
> ERROR REPORT==== 4-Apr-2008::18:29:14 ===
> Error in process <0.20713.0> on node 'rabbit at altair' with exit value:
> {{case_clause,{error,{system_limit,"Cannot create an ets table for
> the local transaction
> store",{system_limit,[{ets,new,[mnesia_trans_store,[bag,public]]},{mnesia_tm,doit_loop,1},{mnesia_sp,init_proc,4},{ 
> proc_lib...

We have never seen this before. At what point during your tests did you 
get this error?

> and for increasing common throughput
> # diff -u rabbitmq-server.orig /usr/sbin/rabbitmq-server
> --- rabbitmq-server.orig        2008-04-07 22:27:26.000000000 +0400
> +++ /usr/sbin/rabbitmq-server   2008-04-07 22:28:40.000000000 +0400
> @@ -28,7 +28,7 @@
>   [ "x" = "x$NODE_IP_ADDRESS" ] && NODE_IP_ADDRESS=
>   [ "x" = "x$NODE_PORT" ] && NODE_PORT=5672
> -ERL_ARGS="+K true +A30 -kernel inet_default_listen_options
> [{sndbuf,16384},{recbuf,4096}]"
> +ERL_ARGS="-env ERL_MAX_ETS_TABLES 10240 -env ERL_MAX_PORTS 10240 +K
> true +A300 +P512000 -kernel inet_default_listen_options
> [{sndbuf,65535},{recbuf,65535},{reuseaddr,true}] -smp auto"
>   CLUSTER_CONFIG_FILE=/etc/default/rabbitmq_cluster.config
>   CONFIG_FILE=/etc/default/rabbitmq

These should all be fine, except that I have some concerns about the 
"-smp auto", given how old a version of Erlang/OTP you are running. Did 
you test whether adding that flag actually improves performance?

Note that RabbitMQ, when running with Erlang/OTP R11B (rather than the 
more recent R12), will not benefit much (if at all) from the multiple 
cores when running with -smp. In our experiments we have been getting 
much better results by configuring a local cluster of one non-smp node 
per core. So I'd recommend you do that.

> Client source code attached to this message.

The following will all impact performance:

- The queues are not auto-deleting and aren't being deleted explicitly, 
and your queue/exchange naming scheme is deterministic. So unless you 
restart the server between tests, subsequent tests will be affected by 
the queues left over from earlier tests. I'd also recommend clearing the 
mnesia dir prior to starting rabbit, just to make sure that there aren't 
any stray exchanges, queues or persistent messages.

- The consumer explicitly acknowledges each message. Would 
bulk-acknowledgment or auto-acknowledgment be an option?
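
For illustration, the batching logic behind bulk acknowledgment can be sketched as below. The broker call is stubbed out, the batch size of 100 is an arbitrary assumption, and in a real consumer `basic_ack` would be the client library's basic.ack method with the `multiple` flag set:

```python
# Sketch of bulk acknowledgment: rather than acking every delivery,
# send one basic.ack with multiple=True per batch, covering all
# deliveries up to and including the given tag. The broker call is
# stubbed out here and only records what it would have sent.

BATCH_SIZE = 100  # assumed batch size; tune against your latency needs

acks_sent = []    # records (delivery_tag, multiple) pairs the stub "sends"

def basic_ack(delivery_tag, multiple):
    """Stub standing in for the AMQP basic.ack method."""
    acks_sent.append((delivery_tag, multiple))

def consume(delivery_tags):
    unacked = 0
    last_tag = None
    for tag in delivery_tags:
        # ... process the message here ...
        unacked += 1
        last_tag = tag
        if unacked >= BATCH_SIZE:
            basic_ack(last_tag, multiple=True)  # one ack covers the batch
            unacked = 0
    if unacked:                                 # flush the final partial batch
        basic_ack(last_tag, multiple=True)

consume(range(1, 251))       # 250 deliveries -> 3 acks instead of 250
print(acks_sent)             # [(100, True), (200, True), (250, True)]
```

The trade-off is that an unacknowledged batch is redelivered wholesale if the consumer dies mid-batch, so the batch size bounds the redelivery window.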

- The tests use a single topic exchange, with a distinct routing/binding 
key for each producer/consumer pair. That is a rather peculiar setup 
and may well run into performance problems for high numbers of queues. 
The reason is that currently topic exchange routing simply pattern 
matches the routing key against each binding key in turn, i.e. matching 
time is linear in the number of queues. Changing that is on our todo 
list - we used to have some caching logic in pre-1.3.0 versions but 
removed it because it exhibited unbounded memory growth when people 
used UIDs for routing keys. Anyway, I'd suggest you use a direct 
exchange. All you should have to do is change the type in the exchange 
declaration; the rest of your code ought to work unchanged.
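
To make the linear-scan point concrete, here is a sketch of topic matching over a binding list. The matcher follows the usual AMQP topic rules ('*' matches exactly one word, '#' matches zero or more); the broker's internal representation differs, but the per-message cost shape is the same:

```python
# Sketch of why topic routing is linear in the number of bindings:
# the routing key is pattern-matched against every binding key in turn.
# '*' matches exactly one dot-separated word, '#' matches zero or more.

def topic_match(binding_key, routing_key):
    def match(pat, words):
        if not pat:
            return not words
        if pat[0] == '#':
            # '#' can absorb zero or more words
            return any(match(pat[1:], words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        if pat[0] == '*' or pat[0] == words[0]:
            return match(pat[1:], words[1:])
        return False
    return match(binding_key.split('.'), routing_key.split('.'))

def route(bindings, routing_key):
    # Linear scan: cost grows with the number of bindings/queues.
    return [q for key, q in bindings if topic_match(key, routing_key)]

bindings = [("stock.usd.*", "q1"), ("stock.#", "q2"), ("fx.*.usd", "q3")]
print(route(bindings, "stock.usd.ibm"))   # ['q1', 'q2']
```

A direct exchange, by contrast, can route on an exact key lookup, which is why switching the exchange type helps at high queue counts.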

- You start lots of producers and consumers and each establishes a 
separate connection. Dmitriy already pointed out several reasons why 
that may be affecting performance. Another reason is that connection 
establishment and teardown is quite expensive in RabbitMQ. Queue 
creation, binding and deletion are quite expensive too. I notice you 
have some "sleep" statements in your code, presumably to prevent the set 
up tasks from interfering with the measurements. However, given the high 
number of producers and consumers you are creating I suspect you will 
still get interference. The mnesia errors/warnings you are seeing are an 
indication of that because the sending/receiving of messages does not 
involve any mnesia writes but connection, queue, exchange and binding 
creation all do. If you really want to measure throughput for that many 
connections and queues then I'd suggest you make sure they are all 
created prior to any messages being sent.
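
The phase separation can be sketched as follows; all broker calls are stubbed out, and the function names are hypothetical, but the structure - declare the full topology first, then time only the publish traffic - is the point:

```python
# Sketch of separating setup from measurement. In a real harness
# declare() would be connection/queue/exchange/binding declarations
# (which hit mnesia) and publish() would be basic.publish.

import time

def declare_topology(n_pairs, declare):
    """Phase 1: create every connection, queue and binding up front,
    keeping the mnesia-writing operations out of the timed run."""
    for i in range(n_pairs):
        declare(f"queue-{i}")

def run_measurement(n_msgs, publish):
    """Phase 2: only message traffic is timed."""
    start = time.perf_counter()
    for i in range(n_msgs):
        publish(i)
    return time.perf_counter() - start

events = []
declare_topology(3, events.append)            # setup completes first...
elapsed = run_measurement(5, events.append)   # ...then the timed phase begins
print(events)  # ['queue-0', 'queue-1', 'queue-2', 0, 1, 2, 3, 4]
```

Structured this way, no "sleep" guesswork is needed: the measurement simply cannot start until the topology exists.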

You may also want to run one of the test programs supplied with the 
RabbitMQ Java client packages, such as MulticastMain, e.g.
   sh ./runjava.sh com.rabbitmq.examples.MulticastMain -h <host> -a
which reports on throughput and latency.

I hope the above helps. Please do let us know what results you are getting.



More information about the rabbitmq-discuss mailing list