[rabbitmq-discuss] STOMP performance problems at > 1200 messages/s
Al Tobey
al at ooyala.com
Tue Nov 8 00:18:19 GMT 2011
I mentioned some performance issues I experienced over the weekend with
STOMP on Twitter and @monadic asked me to send an email, so here it is.
For a recent internal hackathon project, I hooked some of our logs up to
RabbitMQ so a team of engineers could process events in real time. Having
done this in the past, I wrote a simple Perl program to tail the logfile in
question and put each message on the queue (e.g. /queue/logs). The
messages are about 1100 bytes on average, including the usual Apache-style
log fields and a MIME-encoded serialized object. I was capturing logs from
8 machines that are load balanced, so the rate of messages was pretty even.
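Roughly, the tailing producer is a loop like the one below. This is a
minimal sketch rather than the actual script: it assumes Net::Stomp and
File::Tail, a broker on localhost:61613 with the default guest/guest
credentials, and a made-up log path.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::Stomp;
    use File::Tail;

    # connect to the broker's STOMP listener (61613 is the usual port)
    my $stomp = Net::Stomp->new({ hostname => 'localhost', port => 61613 });
    $stomp->connect({ login => 'guest', passcode => 'guest' });

    # follow the log and put each line on /queue/logs as its own message
    my $tail = File::Tail->new(name => '/var/log/httpd/access_log',
                               maxinterval => 1);
    while (defined(my $line = $tail->read)) {
        chomp $line;
        $stomp->send({ destination => '/queue/logs', body => $line });
    }

    $stomp->disconnect;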
Everything seemed to be working fine when I set it up: messages were going
in and coming out just fine at 5,000-6,000 messages/s (STOMP on both sides).

Once the engineer I was helping started running his stuff against RMQ, we
started to notice anywhere from 30-600 seconds of lag between the logs and
his client (a dead-simple Ruby app using net-stomp). To make things extra
fun, our traffic rose and the message rate rose with it, up to ~14k
messages/s. We instrumented my tailing code and his Ruby code and couldn't
find any issue with either. I started watching the RabbitMQ instance, which
is running in EC2 on an m2.xlarge (in retrospect, not the best instance
choice). I could see both vCPUs were pretty busy, hovering at 6-10% idle
with 12-20% system. I pulled up top with thread view enabled and could see
two threads were pegging the CPU. I assumed, but did not verify, that these
were running the STOMP code.

At the same time all of this was happening, we were watching message rates
in the admin web UI. When I compared the numbers in the UI to what my
producers were sending, there was a large mismatch that correlated with the
delay we were seeing at the consumers. The memory usage of Erlang on the
RMQ host was growing well past the 40% high-water mark, so we bumped the
watermark to 75%, which simply allowed more time before it blew up.
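For reference, the watermark bump above is the vm_memory_high_watermark
setting; something along these lines in /etc/rabbitmq/rabbitmq.config (the
exact path depends on the install):

    [
      {rabbit, [
        %% the default is 0.4 (40% of RAM); raising it only bought time
        {vm_memory_high_watermark, 0.75}
      ]}
    ].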
My suspicion is that the STOMP plugin is getting backed up, based on these
observations:
* memory usage regularly maxed out under load (4,000-14,000
messages/second)
* the AMQP queue stats did not match what the producers sent
* the amount of memory consumed was way out of proportion to the AMQP
queue depths, which were usually close to 0! (see the rabbitmqctl check
after this list)
* we were definitely consuming fast enough with 3-8 consumer processes on
dedicated machines (4x m2.xlarge)
* these machines/processes were showing no stress
* after shutting down the producers, messages kept arriving as if they
were still producing for up to 10 minutes after shutdown
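The depth-vs-memory mismatch is easy to eyeball from the broker host with
something like the command below; it shows the same numbers the admin UI
does (/queue/logs should show up as a queue named "logs"):

    # per-queue depth, unacked count, consumer count and memory use
    rabbitmqctl list_queues name messages messages_unacknowledged consumers memory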
While under the gun, we tried a few quick & dirty hacks:
* dropped every other log line in the producer to cut msg/s in half
  * slowed the performance decay but did not fix anything
* restarted the producer regularly to cycle connections
  * made things worse - we could observe many draining producer channels
    in the admin UI that hung around for more than 10 minutes
  * after a while we bounced rabbitmq so we could move on
  * thrashing seemed to make the existing producer channels drain even
    slower
* started more consumers - no change
* shut down producers
  * only when I took it down to 1 producer did memory usage stop
    climbing, so ~1200 messages/s is the observed limit
Things I'd try if this project were still running, but have not:
* upgrade to Erlang R14
* switch to AMQP producers (roughly the sketch after this list)
* more/faster CPU instances
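On the AMQP producer idea, a rough and untested sketch of what that switch
might look like, using Net::RabbitMQ (the queue and exchange names, host
and credentials here are placeholders):

    use strict;
    use warnings;
    use Net::RabbitMQ;

    my $mq = Net::RabbitMQ->new();
    $mq->connect('localhost', { user => 'guest', password => 'guest' });
    $mq->channel_open(1);

    # declare a queue and bind it so published messages have somewhere to go
    $mq->queue_declare(1, 'logs');
    $mq->queue_bind(1, 'logs', 'amq.direct', 'logs');

    # one log line per message, same shape as the STOMP version
    while (defined(my $line = <STDIN>)) {
        chomp $line;
        $mq->publish(1, 'logs', $line, { exchange => 'amq.direct' });
    }

    $mq->disconnect();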
In any case, the experiment driving all of this has concluded for now. I
can still fire up the producers and dummy consumers for quick tests but
don't have a lot of time to dedicate to debugging this. For what it's
worth, the hackathon project was super cool and successful; I just had to
babysit the queue and fire up the producers just before the demo started so
the delay would be acceptable ;)
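(The dummy consumer is nothing fancy - roughly the subscribe-and-count
loop sketched below, again assuming Net::Stomp, the same localhost broker
and the /queue/logs destination.)

    use strict;
    use warnings;
    use Net::Stomp;

    my $stomp = Net::Stomp->new({ hostname => 'localhost', port => 61613 });
    $stomp->connect({ login => 'guest', passcode => 'guest' });
    $stomp->subscribe({ destination => '/queue/logs', ack => 'auto' });

    # drain the queue and print a rough consume rate about once a second
    my ($count, $t0) = (0, time);
    while (my $frame = $stomp->receive_frame) {
        $count++;
        my $elapsed = time - $t0;
        if ($elapsed >= 1) {
            printf "%d messages/s\n", $count / $elapsed;
            ($count, $t0) = (0, time);
        }
    }
    $stomp->disconnect;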
-Al