[rabbitmq-discuss] memory usage reporting

Wed Apr 24 23:24:34 BST 2013

Kyle,

On 24/04/13 22:43, Kyle O'Donnell wrote:
> I don't know if there is anything confidential in the report output.

Depends on the app. The report doesn't contain any message
headers/payloads, but it does have queue and exchange names, bindings,
connection IPs, etc.

> Is there something specific I can send you (the output is ~10k
> lines)?

Well, I'm trying to get a sense of what the app is doing, so no.

One other thing you could do is watch the management UI overview page in 
both the R14 and R16B cases and note any obvious differences in the 
various counts, rates and stats reported.
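
If eyeballing the two pages gets tedious, something along these lines 
could pick out the large differences. It is only a rough sketch in 
Python: it assumes you have copied the numbers from each overview page 
into a nested dict by hand, and the key names and values below are made 
up for illustration, not actual management plugin field names.

```python
def flatten(stats, prefix=""):
    """Turn nested dicts of numbers into {"a.b.c": value}."""
    flat = {}
    for key, value in stats.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


def compare(run_a, run_b, threshold=2.0):
    """Return the counters that differ by at least `threshold`x between runs."""
    flat_a, flat_b = flatten(run_a), flatten(run_b)
    differences = {}
    for name in sorted(flat_a.keys() & flat_b.keys()):
        low, high = sorted((flat_a[name], flat_b[name]))
        # Flag zero-vs-nonzero as well as large ratios; skip zero-vs-zero.
        if high > 0 and (low == 0 or high / low >= threshold):
            differences[name] = (flat_a[name], flat_b[name])
    return differences


# Hypothetical numbers, typed in by hand from the two overview pages.
r14 = {"totals": {"connections": 10, "queues": 5}, "rates": {"publish": 900}}
r16b = {"totals": {"connections": 40, "queues": 5}, "rates": {"publish": 1000}}

for name, (old, new) in compare(r14, r16b).items():
    print(f"{name}: R14={old} R16B={new}")
```

The 2x threshold is arbitrary; lower it if the interesting differences 
turn out to be subtle.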

> 1) erlang might be looking in the wrong place to get its memory usage
> information

I consider that highly unlikely. If the 'hole' was in system memory then
yes, but it's in the binary segment.

> 2) something is causing erlang to reserve large chunks of virtual
> memory.

That something would be rabbit buffering large (or large quantities of) 
messages ;)

> notice that the amount of memory allocated to 'binary' is almost
> nil, whereas with R15/R16 it was huge..

Yes, that segment of memory is nearly exclusively occupied by message 
payloads in rabbit.

There are plenty of other differences in the report though. Notice in 
particular the difference in socket count, which I pointed out before, 
and memory associated with connections.

OTOH, we shouldn't read too much into these reports since we are in both 
cases looking at a system in two quite different *final* states: on R14A 
the app has finished and rabbit is idle, on R16B rabbit has hit the 
memory high watermark and blocked producers. The interesting question is 
how two systems with notionally identical initial states ended up in 
such radically different final states.

As I tried to explain, very small differences in behaviour could be 
massively amplified. Consider what happens if you have a simple 
producer/consumer app, and have tuned the consumer s.t. it *just* keeps 
up with the producer. A small disturbance, such as a minor change in 
scheduling can result in the consumer falling behind. At which point 
messages will start to build up in rabbit, memory becomes scarce, gc 
costs increase, messages start getting paged to disk (which is 
expensive), and eventually the memory watermark is hit - briefly to 
start with, more frequently later, and then permanently - and producers 
are blocked.
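
The effect can be seen in a toy model. The following is a minimal 
sketch in Python, not a model of rabbit itself: the rates, the 
watermark (expressed as a message count) and the function name are all 
invented for illustration, and it ignores the gc and paging costs 
mentioned above, which in practice would slow the consumer further and 
make things worse.

```python
def simulate(produce_rate, consume_rate, ticks, watermark):
    """Return (queue depth after each tick, ticks producers spent blocked)."""
    depth = 0
    blocked_ticks = 0
    history = []
    for _ in range(ticks):
        if depth >= watermark:
            # Watermark hit: producers are blocked for this tick.
            blocked_ticks += 1
        else:
            depth += produce_rate
        # The consumer drains what it can; depth never goes negative.
        depth = max(0, depth - consume_rate)
        history.append(depth)
    return history, blocked_ticks


# Consumer just keeps up (101 msgs/tick against 100 published).
healthy, healthy_blocked = simulate(100, 101, ticks=10_000, watermark=5_000)

# Consumer is 2% slower (99 msgs/tick), e.g. after a scheduling change.
lagging, lagging_blocked = simulate(100, 99, ticks=10_000, watermark=5_000)

print("healthy:", max(healthy), "peak depth,", healthy_blocked, "blocked ticks")
print("lagging:", max(lagging), "peak depth,", lagging_blocked, "blocked ticks")
```

A 2% change in consumer throughput is the difference between a queue 
that stays empty and one that fills to the watermark and then blocks 
producers repeatedly.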

You did not answer my question as to whether the app is running on the 
same machine as the broker. If it is, that could make a massive 
difference since recent Erlangs have very different (and generally 
improved) core utilisation. So if your app is sharing cores with rabbit 
then it may well get fewer cycles on R16B than R14, and if that affects 
consumers more than producers then the aforementioned amplification 
could happen.

Regards,

Matthias.