[rabbitmq-discuss] rabbit disk_mode branch eating up all RAM, including swap, dying

Sun Oct 4 14:03:28 BST 2009

Hi, we're using 184cb96f7846+ (bug20980) and our host alerted us that rabbit
was eating up all available swap on a 16GB real + 8GB swap machine.

"""
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND
18445 rabbitmq  18   0 24.7g  14g 1696 S 1087.1 91.7   2268:18  10g beam.smp

In an effort to prevent a kernel panic, we restarted the rabbitmq service,
freeing up a considerable amount of swap.

However, the rabbitmq server is not starting again as expected, due to the
following exception:

2009-10-04 06:26:29.797201500 {"init terminating in do_boot",{{nocatch,{error,{cannot_start_application,rabbit,{{timeout_waiting_for_tables,[rabbit_disk_queue]},{rabbit,start,[normal,[]]}}}}},[{init,start_it,1},{init,start_em,1}]}}
"""

They had to delete the mnesia folder (losing all our disk-backed queues) and
restart; now it's fine. I would guess that this breakage coincided with us
storing quite a large number of unacked messages in the queues (job
instructions for a very large batch).

a) Would upgrading to the latest revision of this branch fix this? We have
been avoiding doing so because things were relatively stable.

b) Is there anything else I can look at to debug this? The logs don't
contain anything of importance.