[rabbitmq-discuss] RabbitMQ Queues memory leak?

Fri Apr 19 17:39:09 BST 2013

Hi,

On 19/04/13 10:23, Dmitry Saprykin wrote:
>     The difference between 1 MB and 180 MB is relatively large, even after
>     taking expected differences due to garbage collection into account. We
>     can't rule out a memory leak, but need some assistance from you to
>     confirm.
> 
>     Do you see the same asymmetry if the master node for the queues switches
>     from one node to the other? So if you shut down the cdaemon2 node, let
>     cdaemon4 become the master for all the queues, then turn cdaemon2 back on
>     (it will now be a slave node), does the memory on cdaemon2 now grow?
> 
> 
> Yes. After the current master is stopped and started again, it becomes a
> slave and its memory starts to grow. Meanwhile, the memory on the newly
> elected master is not freed: it stops growing but does not fall back to
> normal. I have attached memory graphs of our nodes to this email.

Thanks for confirming.

>     Have you been able to add a third node to the cluster for testing
>     purposes to see if memory grows on more than one slave node?
> 
> 
> We have not tried this yet, but if it would help we can allocate one
> more node. Is it OK to create the test node on the same physical host
> as one of the existing nodes?

This is probably not necessary. Based on the information you have
provided we have identified a problem which could be the cause.

>     How long does it take for the memory use to reach the VM memory high
>     watermark? 
> 
> 
> The critical point for our cluster comes much earlier than the VM
> memory high watermark. As its memory grows, the slave node also uses
> more and more CPU. In our case the broker stops responding when memory
> consumption reaches ~1 GB.
> 
> After a slave restart, memory grows linearly for some time. After that
> the growth pattern changes: at certain moments it increases in constant
> steps (~20 MB). I have marked these steps on the attached graphs.

The linear increase in memory use strongly suggests a memory leak.
Thanks for the detailed information and graphs.

>     Can you describe your messaging pattern in a bit more detail for us to
>     reproduce the problem - how often are new channels created when
>     publishing?

> 2) Create channel

> 3) Create channel

It is likely that the high turnover of channels is a critical
precondition for this leak.

>     In order to investigate further it might be helpful to execute some
>     diagnostic commands on the broker. Are you able to replicate the problem
>     in a staging or QA environment where it is safe to do this?
> 
> 
> I will run the diagnostic commands on the broker. If something goes
> wrong, our messaging falls back to a version that does not involve
> RabbitMQ.

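Please be aware that this command could produce a large amount of
output. It reports the Erlang process using the most memory on the
node, together with its full process_info. It should be run on the
slave node: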

rabbitmqctl eval 'hd(lists:reverse(lists:sort([{process_info(Pid, memory), Pid, process_info(Pid)} || Pid <- processes()]))).'

Please redirect the output to a file, compress it and email it to me
off-list.

Thanks again

-Emile