[rabbitmq-discuss] RabbitMQ crashes hard when it runs out of memory
Stephen Day
sjaday at gmail.com
Fri Oct 23 01:47:50 BST 2009
I won't bore you with all the output, but I tracked down the binary usage to
these two processes:
[{Pid1, _Info, _Bin}, {Pid2, _Info2, _Bin2} | Other] =
    [{P, process_info(P), BinInfo}
     || {P, {binary, BinInfo}} <- [{P, process_info(P, binary)} || P <- processes()],
        length(BinInfo) > 100000].
<0.157.0>          gen:init_it/6                      1682835   1873131    0
                   gen_server2:process_next_msg/8                         13
<0.158.0>          rabbit_persister:init/1           19590700  29789397    0
rabbit_persister   gen_server2:process_next_msg/8                         13
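(As an aside, roughly the same information can be pulled out as total referenced
binary bytes per process rather than a count of entries. This is just a sketch,
assuming the usual {_Id, Size, _RefCount} layout of the 'binary' info tuples; the
10 MB cutoff is arbitrary and not from the session above:)

BinBytes = fun(P) ->
               %% Sum the sizes of all binaries referenced by P; dead
               %% processes report 'undefined' and count as 0.
               case process_info(P, binary) of
                   {binary, Bins} -> lists:sum([Sz || {_Id, Sz, _Refs} <- Bins]);
                   undefined      -> 0
               end
           end,
%% Largest referenced-binary footprint first.
lists:reverse(lists:keysort(2, [{P, BinBytes(P)} || P <- processes(),
                                                    BinBytes(P) > 10 * 1024 * 1024])).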
I tried your suggestion to free memory and check, but it looks like most of it
was held up in the persister:
35> M = [{erlang:garbage_collect(P), memory(total)} || P <- erlang:processes()].
51> [{P, Mem} || {Mem, P} <- lists:zip([Me || {true, Me} <- M], processes()),
        Mem < 842757048].
[{<0.148.0>,842753448},
{<0.149.0>,842700248},
{<0.150.0>,842700248},
{<0.151.0>,842700248},
{<0.152.0>,842697224},
{<0.154.0>,842697792},
{<0.155.0>,842724104},
{<0.156.0>,842712824},
{<0.157.0>,825951032},
{<0.158.0>,602886872},
{<0.159.0>,345002144},
{<0.177.0>,345002144},
{<0.178.0>,345002144},
{<0.179.0>,345002144},
{<0.180.0>,345002144},
{<0.181.0>,345002144},
{<0.182.0>,345002144},
{<0.183.0>,345002144},
{<0.184.0>,345002144},
{<0.245.0>,345000624},
{<0.247.0>,345001520},
{<0.248.0>,344996984},
{<0.249.0>,344995464},
{<0.250.0>,344995512},
{<0.252.0>,344996416},
{<0.253.0>,344991880},
{<0.254.0>,344991928},
{<0.261.0>,...},
{...}|...]
So it looks like the large chunks are held up between gen_server2 and
rabbit_persister.
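(The same measurement can also be done in a single pass by recording the drop in
memory(total) per process directly; a minimal sketch of the idea, not what was
actually run above:)

%% GC each process, record how much memory(total) drops as a result,
%% then show the ten biggest drops.
Deltas = [begin
              Before = erlang:memory(total),
              erlang:garbage_collect(P),
              {P, Before - erlang:memory(total)}
          end || P <- erlang:processes()],
lists:sublist(lists:reverse(lists:keysort(2, Deltas)), 10).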
_steve
On Thu, Oct 22, 2009 at 4:24 PM, Matthias Radestock <matthias at lshift.net> wrote:
> Stephen,
>
> Stephen Day wrote:
>
>> (rabbit at vs-dfw-ctl11)5> [erlang:garbage_collect(P) || P <-
>> erlang:processes()].
>> [true,true,true,true,true,true,true,true,true,true,true,
>> true,true,true,true,true,true,true,true,true,true,true,true,
>> true,true,true,true,true,true|...]
>>
>> (rabbit at vs-dfw-ctl11)6> memory().
>> [{total,145833144},
>> {processes,50900752},
>> {processes_used,50896864},
>> {system,94932392},
>> {atom,514765},
>> {atom_used,488348},
>> {binary,24622512},
>> {code,3880064},
>> {ets,64745716}]
>>
>> This really cut down on usage, so it's likely that the binary GC is falling
>> behind rabbit's requirements.
>>
>
> Agreed.
>
> How do I track down the uncollected binary heap usage to a process?
>>
>
> Binaries are shared between processes and ref-counted, so no single process
> owns them. There is a process_info item called 'binary' that provides
> information on the binaries referenced by a process, but I've never looked
> at that myself, so I don't know how useful the contained info is.
>
> One thing you could try is to run the above garbage_collect code
> interleaved with the memory reporting code to identify which process results
> in the biggest drop in memory usage when gc'ed.
>
>
> Regards,
>
> Matthias.
>