[rabbitmq-discuss] RabbitMQ experience

Martin Sustrik sustrik at imatix.com
Tue Jan 27 07:45:46 GMT 2009

Matthias, Gordon,

> It is actually surprisingly difficult to determine the *maximum* 
> sustainable throughput that way.

That's because there's no such thing as maximum throughput. Frankly, 
from a methodological point of view the metric is completely broken. 
Still, we have to measure it because people expect us to do so.

> My experiments have shown that in a 
> test where the feedback loop ensures a constant lag (the number of 
> messages "in flight", i.e. the difference between the number of messages 
> sent by the producer and received by the consumer), plotting the lag vs 
> throughput exhibits some peculiar characteristics:

The problem here is that throughput has to be measured at a _single_ 
point of the flow. Trying to measure throughput between point A and 
point B yields a strange metric that is half throughput and half 
latency. Think of a river. You can measure the flow rate of the Danube 
(m^3/s) in Vienna, and you can measure the flow rate in Budapest, but 
there's no such thing as the "flow rate between Vienna and Budapest".

> - there are local maxima, due to buffering and other effects

The maxima are inherent to the methodology. To compute throughput you 
have to take a rolling average over the data. The smaller the 
rolling-average window (and the more fine-grained the view of the data), 
the peakier the result. As the window size approaches 1, the high peaks 
approach infinity while the low peaks approach zero.
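To illustrate the windowing effect, here's a hypothetical sketch (not 
our actual harness): take a bursty message stream and compute throughput 
over windows of different sizes. The fine-grained window swings over 
roughly two orders of magnitude; the coarse one barely moves.

```python
# Hypothetical bursty stream: 200 bursts of 50 messages 10 us apart,
# separated by 10 ms idle gaps (typical of scheduler-driven processing).
timestamps = []
t = 0.0
for _ in range(200):
    for _ in range(50):
        t += 0.00001            # 10 us between messages within a burst
        timestamps.append(t)
    t += 0.01                   # 10 ms idle gap between bursts

def rolling_throughput(ts, window):
    """Throughput (msgs/sec) over consecutive windows of `window` messages."""
    rates = []
    for i in range(0, len(ts) - window, window):
        rates.append(window / (ts[i + window] - ts[i]))
    return rates

small = rolling_throughput(timestamps, 10)     # fine-grained: very peaky
large = rolling_throughput(timestamps, 1000)   # coarse: smooth
print(max(small) / min(small))   # spread of roughly 100x
print(max(large) / min(large))   # spread of roughly 1x
```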

> - the graph is very sensitive to the setup

Yes. Confirmed. There's very little point in presenting throughput 
results to users, since any small change in the customer's setup can 
shift the actual figures by 50% or so.

> - the graph changes over time, due in part to the effects of JITs
> - sampling intervals have to be very long to get anything approaching 
> reproducible results

We are using sampling intervals of 10,000,000-100,000,000 messages to 
keep the test quick, and what we see is that results differ by up to 20% 
- even though a dedicated non-switched link and a real-time OS are used, 
a whole quad-core CPU is shielded for the test, scheduling is set to 
FIFO, etc.

> And all that happens when the feedback loop has been minimised by 
> colocating the producer and consumer in the same O/S process and using 
> internal message passing for the feedback. Routing the feedback through 
> the broker would make the results even more unpredictable.

Yes, the "broken methodology" problem applies even in a single process. 
In fact, passing messages between two threads may prove to be even 
peakier, as the timeslices assigned to a thread by the OS are 
non-contiguous by design, so messages tend to be processed in bursts.

> That's why so far the goal of writing a "press a button and get the 
> maximum throughput figure" test has eluded us. Coming up with a test 
> that delivers results with a +/-20% accuracy isn't too hard. But that is 
> far too insensitive for performance regression tests, where we are 
> interested in spotting variations of as little as 2%.

One more reason why the "maximum throughput" metric is broken: it's 
often taken as obvious that if the maximum throughput is 200,000 
msgs/sec, the system will be able to handle 100,000 msgs/sec. This may 
not be true. For example, at an ingress rate of 200,000 msgs/sec the 
system may be able to do some batching that reduces the number of 
packets on the network. At 100,000 msgs/sec the ingress can be too slow 
to trigger the batching mechanism, so each message is sent as a separate 
packet, overloading the network stack. The consequence is that you'll 
experience no latency problems at 200,000 msgs/sec, yet messages may be 
delayed at 100,000 msgs/sec.
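A toy model makes the effect concrete. The 8 us per-packet send cost and 
the batching rule below are assumptions for illustration, not 
measurements of any real stack: a sender that batches whatever queued up 
while the previous packet was in flight emits far fewer packets at the 
higher ingress rate.

```python
def packets_for(rate, n_msgs=10_000, send_cost_us=8):
    """Count packets emitted by a sender that batches everything queued
    while the previous packet was in flight (opportunistic batching).
    All times are integer microseconds; send_cost_us is an assumed
    fixed per-packet overhead."""
    interval_us = 1_000_000 // rate
    arrivals = [i * interval_us for i in range(n_msgs)]
    packets, free_at, i = 0, 0, 0
    while i < n_msgs:
        start = max(arrivals[i], free_at)       # sender picks up the queue
        j = i
        while j < n_msgs and arrivals[j] <= start:
            j += 1                              # queued messages -> one packet
        packets += 1
        free_at = start + send_cost_us
        i = j
    return packets

print(packets_for(200_000))   # ingress outpaces the sender: batching kicks in
print(packets_for(100_000))   # sender keeps up: one packet per message
```

At 200,000 msgs/sec messages arrive every 5 us, faster than the assumed 
8 us send cost, so they pile up and share packets; at 100,000 msgs/sec 
every message gets its own packet.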

All the methodological problems above, and many more I haven't even 
mentioned, led us to use a different metric internally. We call it 
"message density" (1/lambda, for those familiar with Erlang's work). 
It's the time interval between two subsequent messages at a single point 
of the network. The metric has proved to be very stable and the tests 
are suddenly reproducible :) Also, performance results make much more 
sense when measured using the density metric. Check the following two 
graphs:


The first of them uses the "throughput" metric, the second one the 
"density" metric. Even at first sight it's obvious that "density" 
communicates more stable and predictable information about the 
behaviour of the system.
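As a sketch of what "density" looks like in code (hypothetical capture 
data, not our actual tooling): record timestamps at one point of the 
network and look at the intervals between consecutive messages. A 
scheduling hiccup shows up as a single visible outlier instead of being 
averaged away.

```python
import statistics

def densities_us(arrivals_us):
    """Message density: interval between consecutive messages observed
    at a single point of the network (the 1/lambda metric)."""
    return [b - a for a, b in zip(arrivals_us, arrivals_us[1:])]

# Hypothetical capture (microseconds): steady 10 us spacing with one
# 220 us scheduling hiccup in the middle.
capture = [0, 10, 20, 30, 250, 260, 270, 280]
d = densities_us(capture)
print(statistics.median(d))   # typical density: 10 us, unaffected by the hiccup
print(max(d))                 # the hiccup is visible directly: 220 us
```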

Sorry for such a long email, but we've messed with measurement 
methodology for at least a year so I have strong opinions on it.

