[rabbitmq-discuss] 2.7.1 mirrored queues leak a lot of memory on slave nodes
Max Kalika
max.kalika at gmail.com
Fri Feb 24 15:19:18 GMT 2012
This turned out to be worse than we first realized. When two of the
servers exhibited high memory usage, clients became blocked and
stopped receiving data. I was able to unwedge the running process
with some Erlang surgery. Obviously, this was a high-risk operation
on a running production system, but it seemed to have (mostly)
recovered and allowed data to flow again. I say mostly because
there's a caveat which I'll describe later.
My procedure to fix this is as follows:
1) attach an erlang remote shell to the running process:
erl -sname rabbitdbg@$(hostname -s) \
    -setcookie $(sudo cat /var/lib/rabbitmq/.erlang.cookie) \
    -remsh rabbit@$(hostname -s) -hidden
2) list processes and look for anything consuming an unusually large
number of memory pages (second column from the right):
i().
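If i(). produces too much output to eyeball, something along these
lines should surface the worst offenders (note this sorts by per-process
memory in bytes via process_info/2, so the figures won't line up exactly
with the pages column that i(). prints):
lists:sublist(lists:reverse(lists:keysort(2,
    [{P, M} || P <- processes(), {memory, M} <- [process_info(P, memory)]])), 10).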
3) There was one process with over 2 million pages. It was constantly
running lists:zipwith/3, and its process info showed a mailbox stuffed
with messages that just wasn't shrinking.
process_info(list_to_pid("<0.XXXXX.XXX>")).
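process_info/2 with a list of item names also works if you only care
about the mailbox depth and memory rather than the full dump:
process_info(list_to_pid("<0.XXXXX.XXX>"), [message_queue_len, memory, current_function]).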
4) With fingers and toes crossed, I forced this process to die. As
soon as this was executed, memory usage dropped and data started
flowing again:
exit(list_to_pid("<0.XXXXX.XXX>"), kill).
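For a rough before/after sanity check, erlang:memory(). from the same
remote shell reports the VM-wide allocation in bytes:
erlang:memory().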
I mentioned earlier that the system is *mostly* recovered. The
remaining problem is disk utilization. I suspect that since our
messages are marked durable, disk cleanup didn't occur. I'm not sure
how to sync this up with runtime reality without restarting.
On Feb 22, 12:12 pm, Reverend Chip <rev.c... at gmail.com> wrote:
> I have a four-node 2.7.1 cluster. I just started experimenting with
> mirrored queues. One queue is mirrored across nodes 1&2, a second queue
> is mirrored across nodes 3&4. I've been feeding a lot of large messages
> through, using delivery-mode 2. Most of the messages I've purged, since
> the reader process can't keep up yet.
>
> Here's the problem: Memory usage. Nodes 1 & 3, presumably the master
> nodes for the queues, have maintained a normal memory profile, 3-6GB.
> But nodes 2 & 4, presumably the slaves, have had their memory grow to
> 58GB each. Worse, when I purged and then even deleted the queues, the
> memory usage did not go down. It seems I may have to reboot these nodes
> to get the memory back, and obviously I can't use mirrored queues if
> they're going to make my nodes do this, which is disappointing. I do
> have a workaround involving alternate exchanges, but the workaround can
> leave data stranded if a node is lost forever.
>
> Is there any other info I can provide to help diagnose and/or fix this?
>