[rabbitmq-discuss] Possible memory leak in the management plugin

Pavel pmaisenovich at blizzard.com
Wed Apr 9 16:51:25 BST 2014


Hi Simon,

Thank you for the quick and detailed answer! Getting to the root of this issue
is very important to us.

> if you want to re-enable mgmt you might want to add 
> {rabbitmq_management_agent, [{force_fine_statistics, false}]} 
> in your configuration

We definitely want to have the mgmt plugin running, and we did re-enable it
after doing some work to reduce the number of channels/exchanges a bit and
increasing collect_statistics_interval, as that proved helpful in lab tests.
If the issue happens again we will consider turning off force_fine_statistics
instead of turning off the entire plugin.
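
For the record, the combination we have in mind would look roughly like this
in rabbitmq.config (the 30000 ms interval is only an example value, not what
we have settled on):

    %% rabbitmq.config sketch - the interval value here is illustrative only
    [
      {rabbit, [{collect_statistics_interval, 30000}]},
      {rabbitmq_management_agent, [{force_fine_statistics, false}]}
    ].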

> So are they growing without bound? What happens when you close all the
> channels? 

It's hard to say how far they would grow, but these three (aggregated_stats,
aggregated_stats_index, old_stats) definitely only go up during the test.
See below for more info.
Killing the publishing channels does bring these three tables and the mgmt_db
memory back to a normal size (~6MB) after a short while. So the issue
definitely requires continuous activity on channels producing a large volume
of stats events.

> What does
> rabbitmqctl eval 'erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db)).'
> do to your memory use?

I've repeated the test described earlier while running the following every
second:

    /usr/sbin/rabbitmqctl eval \
        '{process_info(global:whereis_name(rabbit_mgmt_db), memory),
          [{T, ets:info(T, size), ets:info(T, memory)} || T <- lists:sort(ets:all()),
           rabbit_mgmt_db <- [ets:info(T, name)]]}.'

Process memory (and the sizes of those aggregated/old stats data structures)
climbed quickly to 2069410400 bytes and then stopped growing. After about 30
seconds I checked rabbitmqctl status and it reported that mgmt_db was using
3795297904 bytes!
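
(For reproducibility: the once-per-second polling was just a shell loop along
these lines; the log file path is made up.)

    # poll rabbit_mgmt_db process memory and its ETS tables once per second
    while sleep 1; do
        /usr/sbin/rabbitmqctl eval \
            '{process_info(global:whereis_name(rabbit_mgmt_db), memory),
              [{T, ets:info(T, size), ets:info(T, memory)} || T <- lists:sort(ets:all()),
               rabbit_mgmt_db <- [ets:info(T, name)]]}.' \
            >> /tmp/mgmt_db_snapshots.log
    done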

[root@lab-rmq02 pmaisenovich]# /usr/sbin/rabbitmqctl status | grep mgmt_db
      {mgmt_db,3795297904},
[root@lab-rmq02 pmaisenovich]# /usr/sbin/rabbitmqctl eval
'{process_info(global:whereis_name(rabbit_mgmt_db),memory),[{T,
ets:info(T,size), ets:info(T,memory)} || T <-lists:sort(ets:all()),
rabbit_mgmt_db <- [ets:info(T, name)]]}.'
{{memory,2069410496},
 [{5734484,1046,205243},
  {5738585,5,906},
  {5742682,2006,136495},
  {5746779,1,175},
  {5750876,1,175},
  {5754973,1,1059},
  {5759070,895752,48244117},
  {5763167,1777748,44462528},
  {5767264,1777799,124621170}]}
...done.

Clearly some memory is unaccounted for: the ETS tables report ~225MB in total,
process_info(rabbit_mgmt_db, memory) reports ~2GB, and rabbitmqctl status
reports ~3.8GB for mgmt_db.
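
(A side note on the ~225MB figure: it is simply the sum of the memory column
above. If that column is in words rather than bytes, as the ets:info/2
documentation seems to say, then on this 64-bit VM the tables actually hold
roughly 8x that, i.e. ~1.7GB, which together with the ~2GB of process memory
comes surprisingly close to the ~3.8GB from rabbitmqctl status. I may well be
misreading the units, though.)

    %% sanity check, using the per-table memory values from the output above
    Words = 205243 + 906 + 136495 + 175 + 175 + 1059
          + 48244117 + 44462528 + 124621170,       % = 217671868
    Bytes = Words * erlang:system_info(wordsize).   % ~1.74 GB with 8-byte words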

Q5: What else is included in the mgmt_db figure reported by rabbitmqctl
status?

Running a garbage collection
(erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db))) did instantly
reduce the mgmt_db size reported by rabbitmqctl status:

[root@lab-rmq02 pmaisenovich]# /usr/sbin/rabbitmqctl status | grep mgmt_db
      {mgmt_db,3853521792},
[root@lab-rmq02 pmaisenovich]# /usr/sbin/rabbitmqctl eval
'erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db)).'
true
...done.
[root@lab-rmq02 pmaisenovich]# /usr/sbin/rabbitmqctl status | grep mgmt_db
      {mgmt_db,1804503848},

And it immediately shrank the rabbit_mgmt_db process memory (to a suspiciously
tiny value, in fact):

{{memory,2069410400},
 [{5734484,1046,205243},
  {5738585,5,906},
  {5742682,2006,136495},
  {5746779,1,175},
  {5750876,1,175},
  {5754973,1,1059},
  {5759070,916685,51561447},
  {5763167,1819614,45509178},
  {5767264,1819847,127559980}]}
...done.
{{memory,5960},
 [{5734484,1046,205243},
  {5738585,5,906},
  {5742682,2006,136495},
  {5746779,1,175},
  {5750876,1,175},
  {5754973,1,1059},
  {5759070,916685,51561447},
  {5763167,1819614,45509178},
  {5767264,1819847,127559980}]}
...done.

Q6: In the last snapshot above, process_info(rabbit_mgmt_db, memory) is much
smaller than the ETS numbers right below it. Are those not included in the
process memory calculation?
Q7: Note that the ETS table sizes didn't go down at all. Isn't GC supposed to
clean those up?

Furthermore, within 4 seconds of the manual GC run,
process_info(rabbit_mgmt_db, memory) went back up from 5960 to 783979640 and
kept growing towards a certain limit (different from the previous one).

Here is a full log of the process memory snapshots taken during this test
(including the drop after the GC run):
https://gist.github.com/maisenovich/10283607

Note that I ran GC a couple more times during the test, with the same effect:
memory would drop briefly and then quickly ramp back up.
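
(I triggered each of those GC runs by hand with the same eval as above; a loop
along these lines, purely illustrative, would do it periodically instead.)

    # force a GC of the rabbit_mgmt_db process once a minute
    while sleep 60; do
        /usr/sbin/rabbitmqctl eval \
            'erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db)).'
    done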

Finally, "rabbitmqctl status" output for Rabbit used in the test above
(while idle, after publishing channels were terminated) just in case:
https://gist.github.com/maisenovich/10284388

Thanks!




