[rabbitmq-discuss] Possible memory leak in the management plugin

Simon MacMullen simon at rabbitmq.com
Thu Apr 10 11:06:49 BST 2014

On 10/04/14 06:03, Pavel wrote:
> Thanks for answers, Simon! Now everything makes much more sense!
> Here is the program I used (requires com.rabbitmq:amqp-client:3.3.0 in
> classpath):
> https://gist.github.com/maisenovich/10339925

Thank you!

> As you suggested, with {force_fine_statistics,false} Rabbit was able
> to survive the same test, stabilizing around the 2GB mark with most RAM taken
> by connection_stats and queue_procs. So it is certainly a working option,
> although at the cost of reduced visibility.

That's good to hear.
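For the record, that flag lives in the management agent's application environment, so (if memory serves on the app name) the rabbitmq.config entry would look something like:

```erlang
[
  {rabbitmq_management_agent, [
    {force_fine_statistics, false}
  ]}
].
```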

> Then I modified the test so that each message is published to a random
> exchange (instead of publishing 1000 messages to each exchange
> sequentially). This produced an interesting result: the rabbit_mgmt_db
> process was GCed properly and remained at stable memory use. However, the
> aggregated_stats table kept growing in size until all RAM (up to the high
> watermark) was taken.
> For example, {process_info(rabbit_mgmt_db, memory),ETS sizes} went from
> {{memory,542484424},
>   [{5730386,1046,205243},
>    {5734488,2005,384904},
>    {5738585,2006,126545},
>    {5742682,1,175},
>    {5746779,1,175},
>    {5750876,1,1059},
>    {5754973,1009026,79415774},
>    {5759070,2001250,49359758},
>    {5763167,1003620,57435178}]}
> at one point to this some time later:
> {{memory,434953160},
>   [{5730386,1046,205243},
>    {5734488,2005,384904},
>    {5738585,2006,132875},
>    {5742682,1,175},
>    {5746779,1,175},
>    {5750876,1,1059},
>    {5754973,1009026,147761213},
>    {5759070,2001250,49359758},
>    {5763167,1003834,57452431}]}
> Note that the number of items for (5754973) didn't change, but its size
> almost doubled.

This is not too surprising. That table contains one row for every 
combination of things that can show message rates, and each row contains 
some history for that thing.

> I was trying to understand "Event-GCing" portion of
> rabbit_mgmt_stats.erl
> <https://github.com/rabbitmq/rabbitmq-management/blob/master/src/rabbit_mgmt_stats.erl>
> , but that's beyond me at the moment. Could you please describe in a few
> words how and when aggregated_stats is supposed to be cleaned up?

The GCing is about deleting old history from each row. This is a 
relatively expensive operation, so the DB loops round GCing 1% of rows 
(or 100 rows, whichever is larger) every 5s. That means that we can keep 
a bit more history around than we're configured to, just because we 
haven't got round to GCing it yet.
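In other words (a toy sketch in Python, not the actual rabbit_mgmt_stats code — the names, numbers, and key shape are illustrative): each row keys something like a (channel, exchange) combination and carries its own sample history, and each GC pass trims stale samples from just the next slice of rows, round-robin:

```python
import time
from collections import deque

RETENTION_SECS = 600   # pretend we keep 10 minutes of history per row
GC_FRACTION    = 0.01  # GC 1% of the rows per pass...
GC_MIN_ROWS    = 100   # ...but at least 100 rows

class AggregatedStats:
    def __init__(self):
        self.rows = {}      # key -> deque of (timestamp, sample)
        self._cursor = 0    # where the last GC pass stopped

    def record(self, key, sample, now):
        self.rows.setdefault(key, deque()).append((now, sample))

    def gc_pass(self, now):
        """Trim old history from the next slice of rows only."""
        keys = list(self.rows)
        if not keys:
            return
        n = max(GC_MIN_ROWS, int(len(keys) * GC_FRACTION))
        for i in range(n):
            hist = self.rows[keys[(self._cursor + i) % len(keys)]]
            while hist and hist[0][0] < now - RETENTION_SECS:
                hist.popleft()
        self._cursor = (self._cursor + n) % len(keys)

stats = AggregatedStats()
now = time.time()
# One row per (channel, exchange) pair: 500 rows, each holding one
# stale sample (past retention) and one fresh one.
for ch in range(50):
    for ex in range(10):
        stats.record((ch, ex), 1, now - 3600)
        stats.record((ch, ex), 1, now)
stats.gc_pass(now)  # trims the stale sample from only 100 of the 500 rows
```

This is why old history can outlive its retention window for a while: a row only gets trimmed when the rotating cursor reaches it.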

(Probably this is obvious but: this "GC" has nothing to do with the 
Erlang VM GC we were previously discussing.)

> Also, from your earlier reply it sounds like closing and reopening channels
> (sometimes) would help to keep mgmt_db from expanding. Is this something
> that is generally recommended (short-lived vs long-lived channels)?

That will short-cut the history GCing above, since it will just drop all 
the rows relating to that channel. You might possibly decide to do that 
to work round this performance issue, but it's certainly not "best 
practice" - you should be able to have long-lived channels!
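Concretely (same toy model as before, not the plugin's actual code): if the aggregated rows are keyed by pairs mentioning the channel, closing the channel lets the DB delete every such row in one shot, rather than waiting for the rolling GC to reach each of them:

```python
# Toy model: aggregated stats keyed by (channel_id, exchange) pairs.
aggregated_stats = {
    (ch, ex): ["...history..."]
    for ch in ("ch-1", "ch-2", "ch-3")
    for ex in ("amq.direct", "amq.topic")
}

def on_channel_closed(stats, channel_id):
    # Drop every row relating to the closed channel at once.
    for key in [k for k in stats if k[0] == channel_id]:
        del stats[key]

on_channel_closed(aggregated_stats, "ch-2")
print(sorted({ch for ch, _ in aggregated_stats}))  # ['ch-1', 'ch-3']
```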

Cheers, Simon

Simon MacMullen
RabbitMQ, Pivotal
