[rabbitmq-discuss] RabbitMQ crashes hard when it runs out of memory

Matthew Sackman matthew at lshift.net
Fri Nov 6 10:06:20 GMT 2009


Hi Stephen,

Thanks for the patch, and for digging around enough to come up with a
solution.

On Thu, Nov 05, 2009 at 01:57:31PM -0800, Stephen Day wrote:
> Indeed, this is a bit heinous, but it gets the job done. Unfortunately, I
> don't have the appropriate bug id so I didn't create an hg branch for you to
> pull from.

That's fine. I have to say that it's unlikely this patch will make it
through: the memory management code has gone through a lot of change
recently as we're getting a much better handle on resource management.
Whilst you've obviously been working from the head of our default branch
(many thanks!), there are a couple of issues with garbage collecting
every process like that. For example, it's possible that garbage
collecting vast numbers of processes will take longer than the
memory_check_interval, making messages queue up for the memory manager
process. This would be a particular problem if the garbage collection
is unable to reclaim any memory at all, e.g. with millions of queues,
all of which are empty.
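
To put rough numbers on the queueing concern above, here is a toy model in
Python. It is not RabbitMQ code: the function name, the assumption of one
check message per interval, and the 2.5 second sweep time are all made up
for illustration.

```python
def pending_checks(sweep_ms, interval_ms, elapsed_ms):
    """Count check messages still waiting after elapsed_ms (toy model).

    Assumes one message arrives every interval_ms and that handling each
    one means a full sweep lasting sweep_ms. Both are assumptions.
    """
    free_at = 0   # time at which the manager finishes its current sweep
    pending = 0   # messages that have arrived but not yet been started
    ticks = elapsed_ms // interval_ms
    for k in range(1, ticks + 1):
        # A sweep cannot start before its message arrives,
        # nor while the previous sweep is still running.
        start = max(free_at, k * interval_ms)
        if start > elapsed_ms:
            pending += 1
        free_at = start + sweep_ms
    return pending


# Sweep slower than the interval: the mailbox grows steadily.
print(pending_checks(2500, 1000, 60_000))  # 36
# Sweep faster than the interval: nothing accumulates.
print(pending_checks(200, 1000, 60_000))   # 0
```

With a hypothetical 2.5 second sweep and a 1 second interval, 36 of the 60
messages sent in the first minute are still waiting, and the backlog keeps
growing for as long as the sweep stays slower than the interval.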

> As far as overall system effects go, I haven't noticed any (aside from the
> lack of crashes). We have been running this in production for a bit and
> haven't seen any large problems, although the application is low throughput.
> Are there any performance unit tests that I can run to check this?

Yeah, when you garbage collect a process it stops the process. Also, I
*believe* that Erlang uses a generational garbage collector. Normally,
it'll most likely only sweep the young generation, which should be
quick, but I suspect that manually calling garbage_collect will do a
full sweep of all generations, thus potentially taking longer. You may
find that this causes performance to dip.

We tend to measure using the Java client. If you get that, run ant
dist, cd into build/dist, then start up rabbit and try:

sh runjava.sh com.rabbitmq.examples.MulticastMain -r 20000 -s 0 -a

On my machine, I can bump that 20000 to about 25000 and the sending
rates and receiving rates are about equal (i.e. the queue length doesn't
grow too much). Your hardware may well be different, but I suspect
that garbage collection will have a performance impact, depending on
how often it's done. With the default memory_check_interval of 1
second, my guess is that it'd be noticeable.

Much better resource management is on its way. However, if your patch
works for you then, by all means, please use it.

Matthew



