[rabbitmq-discuss] 2.7.1 mirrored queues leak a lot of memory on slave nodes

Max Kalika max.kalika at gmail.com
Fri Feb 24 15:19:18 GMT 2012


This turned out to be worse than we first realized.  When two of the
servers exhibited high memory usage, clients became blocked and
stopped receiving data.  I was able to unwedge the running process
with some Erlang surgery.  Obviously, this was a high-risk operation
on a running production system, but it seemed to have (mostly)
recovered and allowed data to flow again.  I say mostly because
there's a caveat which I'll describe later.

My procedure to fix this is as follows:
1) attach an Erlang remote shell to the running process:

  erl -sname rabbitdbg@$(hostname -s) \
      -setcookie $(sudo cat /var/lib/rabbitmq/.erlang.cookie) \
      -remsh rabbit@$(hostname -s) -hidden

2) list processes and look for anything holding an unusually large
number of memory pages (second column from the right):

  i().
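
If the i() listing is too long to eyeball, something like the following
sorts every process by memory and keeps the top five.  Treat it as a
sketch; it uses only stock BIFs (processes/0, process_info/2) and
nothing RabbitMQ-specific:

  %% {Pid, Bytes} for every live process; the pattern match skips
  %% processes that exit between processes() and process_info()
  Mem = [{P, M} || P <- processes(),
                   {memory, M} <- [process_info(P, memory)]],
  %% sort by memory and keep the five largest
  lists:sublist(lists:reverse(lists:keysort(2, Mem)), 5).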

3) There was one process holding over 2 million pages.  It was
constantly running lists:zipwith/3, and its info showed a huge backlog
of messages in the mailbox that just wasn't shrinking:

  process_info(list_to_pid("<0.XXXXX.XXX>")).
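
Tip: process_info/1 dumps every field; the two-argument form takes a
list of item names if you only want the suspicious ones (a standard
BIF, using the same placeholder pid):

  process_info(list_to_pid("<0.XXXXX.XXX>"),
               [message_queue_len, current_function, memory]).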

4) With fingers and toes crossed, I forced this process to die.  As
soon as this was executed, memory usage dropped and data started
flowing again:

  exit(list_to_pid("<0.XXXXX.XXX>"), kill).
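
If you want to confirm the drop before detaching, the emulator's own
counters are handy (erlang:memory/1 is a standard BIF; rabbitmqctl
status should report similar numbers from the outside):

  erlang:memory(total).     %% total bytes allocated by the VM
  erlang:memory(processes). %% bytes attributed to Erlang processes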


I mentioned earlier that the system has only *mostly* recovered.  The
remaining problem is disk utilization.  I suspect that because our
messages are marked durable (delivery-mode 2), the message store never
reclaimed the space for the purged messages, and I'm not sure how to
reconcile what's on disk with runtime reality without restarting.
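
For what it's worth, the on-disk space in our setup lives under the
node's mnesia directory, so a plain du shows whether the message store
ever shrinks.  The paths below are from our layout and may well differ
on yours:

  # persistent message store and per-queue indices for this node
  du -sh /var/lib/rabbitmq/mnesia/rabbit@$(hostname -s)/msg_store_persistent
  du -sh /var/lib/rabbitmq/mnesia/rabbit@$(hostname -s)/queues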

On Feb 22, 12:12 pm, Reverend Chip <rev.c... at gmail.com> wrote:
> I have a four-node 2.7.1 cluster.  I just started experimenting with
> mirrored queues.  One queue is mirrored across nodes 1&2, a second queue
> is mirrored across nodes 3&4.  I've been feeding a lot of large messages
> through, using delivery-mode 2.  Most of the messages I've purged, since
> the reader process can't keep up yet.
>
> Here's the problem: Memory usage.  Nodes 1 & 3, presumably the master
> nodes for the queues, have maintained a normal memory profile, 3-6GB.
> But nodes 2 & 4, the presumable slaves, have had their memory grow to
> 58GB each.  Worse, when I purged and then even deleted the queues, the
> memory usage did not go down.  It seems I may have to reboot these nodes
> to get the memory back, and obviously I can't use mirrored queues if
> they're going to make my nodes do this, which is disappointing.  I do
> have a workaround involving alternate exchanges, but the workaround can
> leave data stranded if a node is lost forever.
>
> Is there any other info I can provide to help diagnose and/or fix this?
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-disc... at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

