[rabbitmq-discuss] Huge queues - memory alarm going off

Sat Mar 9 21:33:56 GMT 2013

Hi, Tim:

> I've got the following going on (version 2.8.4):

>
> - My consumers all stop (e.g. imagine a failure scenario / upgrade), but
> producers keep on producing
> - Queues start backing up
> - Memory increases with queue size
> - The high water mark gets hit and the node memory alarm goes off
>

So far, that's all as it's supposed to be.  The idea is that if the
broker has a lot of messages stacking up in memory then, regardless of
whether you asked for them to be durable or not, it will move them to
disk to free up RAM and avoid the Erlang VM garbage-collection or
allocation-failure disasters that RAM exhaustion can cause.
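
For intuition on *when* that kicks in, here's a rough arithmetic sketch.
It assumes the config keys vm_memory_high_watermark (default 0.4 of
installed RAM) and vm_memory_high_watermark_paging_ratio (default 0.5 of
the watermark) as described in the RabbitMQ memory docs; their defaults and
availability in 2.8.x may differ, so treat both as assumptions.

```python
# Sketch: where paging to disk starts vs. where the memory alarm fires.
# Defaults below are assumptions taken from RabbitMQ's documented config
# keys (vm_memory_high_watermark, vm_memory_high_watermark_paging_ratio).

def memory_thresholds(total_ram_bytes: int,
                      high_watermark: float = 0.4,
                      paging_ratio: float = 0.5) -> tuple[int, int]:
    """Return (paging_threshold, alarm_threshold) in bytes."""
    alarm = int(total_ram_bytes * high_watermark)   # publishers get blocked here
    paging = int(alarm * paging_ratio)              # queues start paging here
    return paging, alarm


if __name__ == "__main__":
    gib = 1024 ** 3
    paging, alarm = memory_thresholds(8 * gib)
    print(f"paging starts at ~{paging / gib:.1f} GiB, alarm at ~{alarm / gib:.1f} GiB")
```

The point is that paging is meant to start well before the alarm, so the
alarm going off means paging couldn't keep up with the publish rate.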

> - with this being a durable queue, I anticipated RMQ would flush to disk
> and free memory.
>

One thing to note:  if the broker is under severe memory pressure,
messages will be pushed to disk regardless of the queue's durability
status or of whether you published them with the persistent delivery mode
set.  (Also recall that the description *durable queue* just means that
the queue's definition will survive a broker restart; it doesn't by
itself guarantee anything about the queue's *contained messages*.)
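
A tiny truth table may help separate the two ideas.  This is a sketch of
standard AMQP 0-9-1 semantics (delivery_mode 2 = persistent, 1 =
transient), not RabbitMQ's code: a message survives a restart only if its
queue is durable *and* it was published persistent.  Being paged to disk
under memory pressure doesn't change that.

```python
# Sketch: which messages come back after a broker restart.
# Paging under memory pressure writes transient messages to disk as well,
# but that copy is not kept across a restart.

from itertools import product

PERSISTENT = 2   # AMQP 0-9-1 delivery_mode for persistent messages
TRANSIENT = 1    # AMQP 0-9-1 delivery_mode for transient messages


def survives_restart(queue_durable: bool, delivery_mode: int) -> bool:
    # The queue itself must survive, and the message must be persistent.
    return queue_durable and delivery_mode == PERSISTENT


if __name__ == "__main__":
    for durable, mode in product((True, False), (PERSISTENT, TRANSIENT)):
        print(f"durable queue={durable!s:5} delivery_mode={mode} "
              f"-> survives restart={survives_restart(durable, mode)}")
```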

>  Could someone please explain the memory overhead for messages sitting on
> a queue?
>

The body of the message itself, plus some bookkeeping overhead the broker
uses to keep track of it.
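
A back-of-envelope way to use that during a consumer outage: multiply the
publish rate by (body + bookkeeping) and see how fast you burn through the
headroom below the watermark.  The overhead figure here is a hypothetical
placeholder, not a measured RabbitMQ number; measure your own broker.  It
also ignores paging, so it's the pessimistic (earliest) estimate.

```python
# Back-of-envelope: time until a consumer outage trips the memory alarm.
# overhead_bytes is a hypothetical guess at per-message bookkeeping.
# Paging is ignored, so this is the pessimistic case.

def seconds_until_alarm(publish_rate_per_s: float,
                        body_bytes: int,
                        overhead_bytes: int,
                        headroom_bytes: int) -> float:
    """Seconds before backlog growth consumes headroom_bytes of RAM."""
    bytes_per_second = publish_rate_per_s * (body_bytes + overhead_bytes)
    if bytes_per_second <= 0:
        return float("inf")  # nothing is being published, so no growth
    return headroom_bytes / bytes_per_second


if __name__ == "__main__":
    # Hypothetical: 1000 msg/s of 1 KiB bodies, 512 B overhead each,
    # and 1.5 GB of headroom left below the watermark.
    t = seconds_until_alarm(1000, 1024, 512, 1_500_000_000)
    print(f"~{t / 60:.0f} minutes until the alarm")
```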

>  I guess there is a something in memory for each message on a queue - is
> there a way to work around that? (we anticipate deliberately getting into
> this state from time to time, when we e.g. upgrade HBase)
>

Yes, the message itself will be in memory unless it's swapped out.  Indeed,
even if the message is swapped out due to memory pressure, a tiny bit of
overhead corresponding to it lurks in the Erlang Term Storage (ETS) tables
that Rabbit uses...  in rare cases this latter overhead can cause grief on
its own if things are allowed to stay out of balance too long.
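
To see why that residual matters with really big backlogs, here's a rough
model: messages still in RAM cost body + bookkeeping, while paged-out ones
cost only a small leftover entry.  All sizes are hypothetical placeholders,
and residual_bytes just stands in for the per-message ETS overhead
described above.

```python
# Rough model of RAM held by a backlog after partial paging.
# All byte sizes are hypothetical; residual_bytes stands in for the small
# per-message entry that stays in memory after a message is paged out.

def backlog_ram_bytes(total_msgs: int,
                      in_ram_msgs: int,
                      body_bytes: int,
                      overhead_bytes: int,
                      residual_bytes: int) -> int:
    if not 0 <= in_ram_msgs <= total_msgs:
        raise ValueError("in_ram_msgs must be between 0 and total_msgs")
    paged_msgs = total_msgs - in_ram_msgs
    return (in_ram_msgs * (body_bytes + overhead_bytes)
            + paged_msgs * residual_bytes)


if __name__ == "__main__":
    # Even with everything paged out, 50M messages at a hypothetical
    # 100 B residual each still pin about 5 GB of RAM.
    print(backlog_ram_bytes(50_000_000, 0, 1024, 512, 100))  # prints 5000000000
```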

As for something being in memory for each message on a queue, modulo the
ETS bit that you can't do anything about, the ways to work around this are
to:

   - Let the broker swap messages out to disk in order to get below the
   configured memory watermark, at which point the TCP back pressure that is
   stiff-arming publishers will relent and they'll start publishing again
   - Catch up on consuming your messages, perhaps by starting more
   consumers in response to the demand
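
Here's a toy tick-by-tick model of that cycle (alarm, publishers blocked,
paging, alarm clears).  It's a simplified illustration under made-up
numbers, not how RabbitMQ actually implements the alarm or the pager.

```python
# Toy model of the memory-alarm cycle; not RabbitMQ's implementation.
from dataclasses import dataclass


@dataclass
class Broker:
    alarm_bytes: int
    paging_bytes: int
    ram_bytes: int = 0
    blocked: bool = False

    def tick(self, publish_bytes: int, consume_bytes: int,
             page_out_bytes: int) -> None:
        # Publishers only get through while the alarm is clear.
        if not self.blocked:
            self.ram_bytes += publish_bytes
        # Consumers drain messages regardless of the alarm.
        self.ram_bytes = max(0, self.ram_bytes - consume_bytes)
        # Above the paging threshold, the broker pushes messages to disk.
        if self.ram_bytes > self.paging_bytes:
            self.ram_bytes = max(self.paging_bytes,
                                 self.ram_bytes - page_out_bytes)
        # Alarm is set at or above the watermark, cleared below it.
        self.blocked = self.ram_bytes >= self.alarm_bytes


def simulate(broker: Broker, ticks: int, publish_bytes: int,
             consume_bytes: int, page_out_bytes: int) -> list[bool]:
    history = []
    for _ in range(ticks):
        broker.tick(publish_bytes, consume_bytes, page_out_bytes)
        history.append(broker.blocked)
    return history


if __name__ == "__main__":
    # No consumers, paging slower than publishing: the alarm flaps.
    print(simulate(Broker(alarm_bytes=100, paging_bytes=50), 5, 40, 0, 10))
    # prints [False, False, True, False, True]
```

With consume_bytes raised (i.e. consumers back online), the backlog drains
and the alarm stays clear, which is the second bullet above.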

> - I'm kind of in a deadlock I think now as when the consumers start, they
> won't ack a message until they have successfully sent a message on (it's a
> multihop process) but that is blocked.  Should the per connection flow
> control not have kicked in and blocked the producers before the whole lot
> just blocked?  (have I missed some setting to enable that, as the docs say
> it is on by default).
>

Are these consumers doing something else with the message before
republishing it, say taking some real-world action or writing to a
database elsewhere?  If your app design has a case where publishes could be
blocked (say due to overeager producers, or failure or slowdown of
consumers), you might consider making your routing fabric a bit richer, so
that virtual copies of the message can move around inside the broker
without an explicit consume/do-stuff/ack/republish cycle.  As you point
out, that cycle can get jammed up if the republishes are being held.  That
said, modulo the rare ETS catastrophe, the broker's default swapping
mechanism should catch up and relieve some of the memory pressure that's
causing the trouble.
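
A toy sketch of the two shapes, in plain Python with deques standing in for
queues.  ToyFanoutExchange and relay_one are hypothetical names, not a real
client API.  It only shows why the hop jams (ack waits on a blocked
republish) and how broker-side routing drops that hop; it doesn't make the
memory alarm itself go away.

```python
# Toy comparison: consume/republish/ack hop vs. broker-side fan-out.
from collections import deque


class ToyFanoutExchange:
    """Hypothetical stand-in for a fanout exchange: one publish places a
    copy in every bound queue, with no consumer round-trip."""

    def __init__(self) -> None:
        self.bound_queues: list[deque] = []

    def bind(self, queue: deque) -> None:
        self.bound_queues.append(queue)

    def publish(self, message: str) -> None:
        for queue in self.bound_queues:
            queue.append(message)


def relay_one(source: deque, downstream: deque,
              publishing_blocked: bool) -> bool:
    """One consume/republish/ack step; the ack only follows the republish."""
    if not source:
        return False
    message = source[0]          # delivered to the consumer, not yet acked
    if publishing_blocked:
        return False             # republish is held, so no ack happens
    downstream.append(message)   # republish to the next hop
    source.popleft()             # ack: the source queue can drop it now
    return True


if __name__ == "__main__":
    hop_a, hop_b = deque(["m1"]), deque()
    print(relay_one(hop_a, hop_b, publishing_blocked=True),
          list(hop_a), list(hop_b))

    stage_a, stage_b = deque(), deque()
    fanout = ToyFanoutExchange()
    fanout.bind(stage_a)
    fanout.bind(stage_b)
    fanout.publish("m1")
    print(list(stage_a), list(stage_b))
```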

Does that make sense?

BTW, there's a nice description of what various entities in Rabbit cost in
the Manning book "RabbitMQ in Action," in Chapter 11, IIRC.  Giving that a
read will be very helpful for building intuition on what happens where,
what it costs, etc...

Best regards,
Jerry