[rabbitmq-discuss] STOMP performance problems at > 1200 messages/s

Al Tobey al at ooyala.com
Tue Nov 8 00:18:19 GMT 2011


I mentioned on Twitter some performance issues I ran into over the
weekend with STOMP, and @monadic asked me to send an email, so here it
is.

For a recent internal hackathon project, I hooked some of our logs up to
RabbitMQ so a team of engineers could process events in real time.
Having done this in the past, I wrote a simple Perl program to tail the
logfile in question and put each log line on the queue (e.g.
/queue/logs).  The messages are about 1100 bytes on average, including
the usual Apache-style log fields and a MIME-encoded serialized object.
I was capturing logs from 8 load-balanced machines, so the rate of
messages was pretty even across them.
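
The producer was nothing fancy - something along the lines of the
sketch below (simplified, using the Net::Stomp CPAN module; the logfile
path and connection details are placeholders, not the exact code I ran):

    #!/usr/bin/perl
    # Minimal tail-and-publish producer sketch: follow the logfile and
    # push each new line onto /queue/logs over STOMP.
    use strict;
    use warnings;
    use Net::Stomp;

    my $stomp = Net::Stomp->new({ hostname => 'localhost', port => 61613 });
    $stomp->connect({ login => 'guest', passcode => 'guest' });

    # Placeholder path; the real producer tailed our Apache-style logs.
    open(my $tail, '-|', 'tail', '-F', '/var/log/app/access.log')
        or die "cannot tail logfile: $!";
    while (my $line = <$tail>) {
        chomp $line;
        $stomp->send({ destination => '/queue/logs', body => $line });
    }

    $stomp->disconnect;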

Everything seemed to be working fine when I set it up: messages were
going in and coming out at 5000-6000 messages/s (STOMP on both sides).

Once the engineer I was helping started running his code against RMQ, we
noticed anywhere from 30-600 seconds of lag between the logs and his
client (a dead-simple Ruby app using net-stomp).  To make things extra
fun, our traffic rose and the message rate rose with it, up to ~14k
messages/s.  We instrumented my tailing code and his Ruby code and
couldn't find any issue with either.

I started watching the RabbitMQ instance, which is running in EC2 on an
m2.xlarge (in retrospect, not the best instance choice). I could see
both vCPUs were pretty busy, hovering at 6-10% idle with 12-20% system
time. I pulled up top with thread view enabled and could see two threads
pegging the CPU.  I assumed, but did not verify, that these were running
the STOMP code. While all of this was happening, we were watching
message rates in the admin webui. When I compared the numbers in the UI
to what my producers were sending, there was a large mismatch that
correlated with the delay we were seeing at the consumers.  The memory
usage of the Erlang VM on the RMQ host was growing well past the 40%
high-watermark default, so we bumped the watermark to 75%, which simply
bought more time before it blew up.
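
For reference, the watermark bump was just a rabbitmq.config change plus
a broker restart - roughly the snippet below (file location and exact
syntax from memory, so double-check against your version):

    %% rabbitmq.config -- raise the memory high watermark from the
    %% 0.4 default to 0.75 of system RAM (what we did mid-incident).
    [
      {rabbit, [
        {vm_memory_high_watermark, 0.75}
      ]}
    ].

As noted, this only buys time: if the broker is buffering faster than it
drains, a higher watermark just postpones the blowup.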

My suspicion is that the STOMP plugin is getting backed up, based on
these observations (a couple of rabbitmqctl checks I'd run next time are
sketched after the list):
  * memory usage regularly maxed out under load (4,000-14,000
    messages/second)
  * the AMQP queue stats did not match what the producers sent
  * the amount of memory consumed was way out of proportion to the AMQP
    queue depths (which were usually close to 0!)
  * we were definitely consuming fast enough with 3-8 consumer processes
    on dedicated machines (4x m2.xlarge)
     * these machines/processes were showing no stress
  * after shutting down producers, messages kept arriving as if they
    were still producing for up to 10 minutes
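
If I get time to poke at this again, the first thing I'd do is confirm
where the backlog actually lives by comparing per-queue memory against
queue depth and looking at connection buffering - something like the
commands below (assuming the STOMP connections even show up in
list_connections on this version):

    # Queue depth vs. memory held per queue -- depths near 0 alongside
    # big memory numbers would point away from the queues themselves.
    rabbitmqctl list_queues name messages memory

    # Per-connection buffering -- a growing send_pend on the consumer
    # connections, or ballooning recv_oct on the producer side, would
    # point at the connections/plugin rather than the queues.
    rabbitmqctl list_connections peer_address state recv_oct send_pend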

While under the gun, we tried a few quick & dirty hacks:

  * dropped every other log line in the producer to cut msg/s in half
    * slowed the decay but did not fix anything
  * restarted the producers regularly to cycle connections
    * made things worse - we could see many draining producer channels
      in the admin UI that hung around for more than 10 minutes
       * after a while we bounced rabbitmq so we could move on
    * the thrashing seemed to make the existing producer channels drain
      even slower
  * started more consumers - no change
  * shut down producers one at a time
    * only when I was down to 1 producer did memory usage stop climbing,
      so ~1200 messages/s is the observed limit

Things I'd try if this project were still running, but have not:

  * upgrade to Erlang R14
  * switch the producers to AMQP (rough sketch below)
  * instances with more/faster CPUs
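
The AMQP switch would be a small change on my end - basically the same
tail loop publishing over AMQP instead of STOMP. A sketch, assuming the
Net::RabbitMQ module and the default exchange (untested, details from
memory):

    #!/usr/bin/perl
    # Same tail-and-publish loop as the STOMP producer, but speaking
    # AMQP directly via Net::RabbitMQ. Untested sketch, not code from
    # the hackathon; the logfile path is a placeholder.
    use strict;
    use warnings;
    use Net::RabbitMQ;

    my $mq = Net::RabbitMQ->new();
    $mq->connect('localhost', { user => 'guest', password => 'guest' });
    $mq->channel_open(1);
    $mq->queue_declare(1, 'logs');

    open(my $tail, '-|', 'tail', '-F', '/var/log/app/access.log')
        or die "cannot tail logfile: $!";
    while (my $line = <$tail>) {
        chomp $line;
        # Publish to the default exchange; routing key == queue name.
        $mq->publish(1, 'logs', $line, { exchange => '' });
    }

    $mq->disconnect();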

In any case, the experiment driving all of this is concluded for now. I
can still fire up the producers and dummy consumers for quick tests but
don't have a lot of time to dedicate to debugging this.  For what it's
worth, the hackathon project was super cool and successful; I just had
to babysit the queue and fire up the producers right before the demo
started so the delay would be acceptable ;)

-Al