[rabbitmq-discuss] RabbitMQ experience
sustrik at imatix.com
Tue Jan 27 07:45:46 GMT 2009
> It is actually surprisingly difficult to determine the *maximum*
> sustainable throughput that way.
That's because there's no such thing as maximum throughput. Frankly,
from a methodological point of view the metric is completely broken.
Still, we have to measure it because people expect us to.
> My experiments have shown that in a
> test where the feedback loop ensures a constant lag (the number of
> messages "in flight", i.e. the difference between the number of messages
> sent by the producer and received by the consumer), plotting the lag vs
> throughput exhibits some peculiar characteristics:
The problem here is that throughput has to be measured at a _single_
point of the flow. Trying to measure throughput between point A and
point B gives you a strange metric that is half throughput and half
latency. Think of a river: you can measure the flow rate of the Danube
(m^3/s) in Vienna, and you can measure the flow rate in Budapest, but
there's no such thing as the "flow rate between Vienna and Budapest".
> - there are local maxima, due to buffering and other effects
The maxima are inherent to the methodology. To compute throughput you
have to do a rolling average over the data. The smaller the window of
the rolling average (and thus the more fine-grained the view of the
data), the peakier it is. As the window size approaches 1, high peaks
approach infinity while low peaks approach zero.
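A toy sketch of that windowing effect (my illustration, not the benchmark code discussed here): the same bursty message stream looks smooth through a wide rolling window and wildly peaky through a narrow one.

```python
def rolling_throughput(timestamps, window):
    """Rate (msgs/sec) over each consecutive run of `window` messages."""
    rates = []
    for i in range(len(timestamps) - window):
        dt = timestamps[i + window] - timestamps[i]
        rates.append(window / dt)
    return rates

# Synthetic bursty arrivals: 10 messages 1 us apart, then a 10 ms pause.
ts, t = [], 0.0
for burst in range(100):
    for i in range(10):
        ts.append(t + i * 1e-6)
    t += 0.01

wide = rolling_throughput(ts, 100)   # window spans many bursts -> smooth
narrow = rolling_throughput(ts, 2)   # window fits inside a burst -> huge peaks
print(max(wide), max(narrow), min(narrow))
```

With the narrow window the reported "throughput" swings over several orders of magnitude on the very same data; the wide window averages the bursts away.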
> - the graph is very sensitive to the setup
Yes. Confirmed. There's very little point in presenting throughput
results to users, as any little change in a customer's setup can shift
the actual figures by 50% or so.
> - the graph changes over time, due in part to the effects of JITs
> - sampling intervals have to be very long to get anything approaching
> reproducible results
We are using sampling intervals of 10,000,000-100,000,000 messages to
keep the test quick, and what we see is that results differ by up to 20%
- even though a dedicated non-switched link and a real-time OS are used,
a whole quad-core CPU is shielded for the test, scheduling is set to
FIFO, etc.
> And all that happens when the feedback loop has been minimised by
> colocating the producer and consumer in the same O/S process and using
> internal message passing for the feedback. Routing the feedback through
> the broker would make the results even more unpredictable.
Yes, the "broken methodology" problem applies even in a single process.
Actually, passing messages between two threads may prove to be even more
peaky, as the timeslices assigned to a thread by the OS are
non-continuous by design and thus messages tend to be processed in bursts.
> That's why so far the goal of writing a "press a button and get the
> maximum throughput figure" test has eluded us. Coming up with a test
> that delivers results with a +/-20% accuracy isn't too hard. But that is
> far too insensitive for performance regression tests, where we are
> interested in spotting variations of as little as 2%.
One more reason why the "maximum throughput" metric is broken: it's
often considered obvious that if the maximum throughput is 200,000
msgs/sec, the system will be able to handle 100,000 msgs/sec. This may
not be true. For example, at an ingress rate of 200,000 msgs/sec the
system may be able to do some batching that reduces the number of
packets on the network. At 100,000 msgs/sec the ingress can be too slow
to trigger the batching mechanism, so each message is sent as a separate
packet, overloading the network stack. The consequence is that you'll
experience no latency problems at 200,000 msgs/sec, but messages may be
delayed at 100,000 msgs/sec.
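A toy model of that batching effect (my illustration under simplified assumptions, not any real stack): each send drains the whole queue into one packet, so messages that arrive while the sender is busy get coalesced for free, while a slow ingress pays one packet per message.

```python
def packets_per_message(rate, send_cost=1.0, n_messages=100000):
    """Toy sender: draining the queue emits one packet and occupies the
    sender for `send_cost` time units; messages arrive every 1/rate units.
    Returns packets emitted per message delivered."""
    t, next_arrival, queued, sent, packets = 0.0, 0.0, 0, 0, 0
    while sent < n_messages:
        # collect everything that arrived while the sender was busy
        while next_arrival <= t:
            queued += 1
            next_arrival += 1.0 / rate
        if queued:
            packets += 1          # whole queue goes out as one packet
            sent += queued
            queued = 0
            t += send_cost
        else:
            t = next_arrival      # idle until the next message shows up
    return packets / sent

high = packets_per_message(rate=2.0)  # 2 msgs arrive per send slot -> batching
low = packets_per_message(rate=0.5)   # sender idles between msgs -> no batching
print(high, low)
```

In this model halving the ingress rate roughly doubles the number of packets per message, which is the counter-intuitive behaviour described above.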
All the methodological problems above, and many more I haven't even
mentioned, led us to use a different metric internally. We call it
"message density" (1/lambda, for those familiar with Erlang's work).
It's the time interval between two subsequent messages at a single point
of the network. The metric has proved to be very stable and the tests
are suddenly reproducible :) Also, performance results make much more
sense when measured using the density metric. Check the following two
graphs:
The first of them uses the "throughput" metric, the second one the
"density" metric. Even at first sight it's obvious that "density"
communicates more stable and predictable information about the
behaviour of the system.
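A minimal sketch of the density idea (my reading of the description above, not iMatix's actual harness): instead of averaging counts into a rate, record the interval between each pair of subsequent messages at one measurement point and look at that distribution directly.

```python
def densities(timestamps):
    """Inter-message intervals (1/lambda) at a single measurement point."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

# Arrival times (seconds) observed at one point of the network.
ts = [0.000, 0.010, 0.021, 0.030, 0.041]
print(densities(ts))
```

Because each sample is a single local interval, there is no window size to choose and no A-to-B span to conflate throughput with latency; summary statistics (median, percentiles) of the intervals stay stable across runs.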
Sorry for such a long email, but we've wrestled with measurement
methodology for at least a year, so I have strong opinions on it.