[rabbitmq-discuss] Huge queues - memory alarm going off

Sun Mar 10 12:14:04 GMT 2013

Thanks for taking the time for the detailed response Jerry,

Regarding the deadlock it makes good sense to rethink that scenario along
the lines that you suggest.
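
To check my own understanding of that deadlock, here is a toy model of the
consume / republish / ack hop. It is only a sketch: the function name, the
"watermark" expressed as a message count, and the step loop are my own
assumptions, and it deliberately ignores the broker paging messages to disk,
which should normally relieve the pressure.

```python
def run_pipeline(queued, watermark, steps=10):
    """Toy model of one hop: consume, republish, then ack.

    queued:    messages waiting on the inbound queue (hypothetical unit)
    watermark: queue depth at which the broker blocks every publisher
    Returns (forwarded, still_queued) after `steps` attempts.
    """
    forwarded = 0
    for _ in range(steps):
        alarm = queued >= watermark  # memory alarm: all publishers blocked
        if alarm or queued == 0:
            # The republish cannot complete, so the ack is never sent and
            # the inbound queue does not shrink.
            continue
        queued -= 1      # republish succeeded, message acked and removed
        forwarded += 1
    return forwarded, queued


if __name__ == "__main__":
    print(run_pipeline(queued=5, watermark=5))  # at the watermark: stuck
    print(run_pipeline(queued=4, watermark=5))  # below it: drains normally
```

In this model nothing moves once the backlog reaches the watermark, because
the only thing that could shrink the queue (the ack) waits on a publish that
the alarm is blocking.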

I've read Chapter 11, having downloaded the ebook version, and if I
understand correctly, it is really not good practice to deliberately
use RMQ as a large buffering technology for queues, due to the way it
manages memory.  Namely, if one queue is hugely backed up, we'll get into
an oscillation of:
  i) memory limit hit
  ii) block everything while flushing partially to disk
  iii) repeat immediately (while the disabled consumer remains)
While it will work, we'll likely cripple other parts of the system if they
are going through the same broker.
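
The cycle can be sketched as a small simulation. This is a toy model
under my own assumptions (the watermark as a message count, a fixed publish
rate, half of the in-memory backlog flushed per alarm); it does not claim to
reproduce the broker's actual paging policy.

```python
def simulate(steps, watermark=100, publish_per_step=10, flush_fraction=0.5):
    """Toy model of one backed-up queue with no consumer attached.

    Returns a list of (step, in_memory, on_disk, blocked) tuples.
    All parameters are hypothetical units, not RabbitMQ settings.
    """
    in_memory = 0
    on_disk = 0
    history = []
    for step in range(steps):
        blocked = in_memory >= watermark
        if blocked:
            # i) + ii): limit hit, publishers blocked while part of the
            # backlog is flushed to disk.
            moved = int(in_memory * flush_fraction)
            in_memory -= moved
            on_disk += moved
        else:
            # iii): publishers resume and the backlog climbs again.
            in_memory += publish_per_step
        history.append((step, in_memory, on_disk, blocked))
    return history


if __name__ == "__main__":
    for step, mem, disk, blocked in simulate(30):
        flag = "BLOCKED" if blocked else ""
        print(f"{step:2d} mem={mem:3d} disk={disk:3d} {flag}")
```

With these numbers the alarm fires every few steps for as long as the
consumer stays down, and every publisher on the broker is blocked each time.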

I think this was probably a lack of understanding on our part: we
anticipated using it as a queue (to do large buffering), whereas I presume
it is really intended to be a messaging system in which near-zero queue
sizes are the expected behavior (consumer throughput matched to producer).

Are there alternative configurations that you are aware of that would allow
it to back up large queues without hitting memory limits?  (The Tokyo
Cabinet plugin, perhaps?)

We can work around this in our design of course by either swapping out to
another messaging/queuing system completely, or having consumers that pull
from Rabbit and then back up a buffer in another queue (perhaps Kafka or
similar).
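
A minimal sketch of that second option, using only the Python standard
library. Everything here is a stand-in: queue.Queue plays the part of the
Rabbit queue, task_done() plays the part of an ack, and the DiskBuffer class
(a name I made up) plays the part of Kafka or whatever the second buffer
turns out to be. The point is only the ordering: store the message durably
first, then ack it upstream.

```python
import json
import os
import queue
import tempfile


class DiskBuffer:
    """Append-only, line-delimited JSON file standing in for the second buffer."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable on disk before we ack upstream

    def read_all(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


def drain(broker_queue, buffer):
    """Move everything currently in broker_queue into buffer; return the count."""
    moved = 0
    while True:
        try:
            message = broker_queue.get_nowait()
        except queue.Empty:
            return moved
        buffer.append(message)     # store first ...
        broker_queue.task_done()   # ... then "ack" (stand-in for a real ack)
        moved += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buf = DiskBuffer(os.path.join(tmp, "buffer.jsonl"))
        q = queue.Queue()
        for i in range(3):
            q.put({"id": i})
        print(drain(q, buf), "messages moved")
        print(buf.read_all())
```

Because the relay consumer never waits on a downstream publish to the same
broker, it keeps the Rabbit queue short even while the real consumers are
stopped.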

Thanks,
Tim

On Sat, Mar 9, 2013 at 10:33 PM, Jerry Kuch <jerryk at rbcon.com> wrote:

> Hi, Tim:
>
>> I've got the following going on (version 2.8.4):
>>
>> - My consumers all stop (e.g. imagine a failure scenario / upgrade), but
>> producers keep on producing
>> - Queues start backing up
>> - Memory increases with queue size
>> - The high water mark gets hit and the node memory alarm goes off
>>
>
> So far, that's all what's supposed to happen.  The idea is that if the
> broker has a lot of messages stacking up in memory then, regardless of
> whether you asked for them to be durable or not, it will move them to the
> disk to free up RAM and avoid Erlang VM GC or allocation failure disasters
> that might occur due to RAM exhaustion.
>
>
>> - with this being a durable queue, I anticipated RMQ would flush to disk
>> and free memory.
>>
>
> One thing to note:  If the broker is under severe memory pressure, the
> pushing of messages to disk will happen regardless of the queue's
> durability status or whether you published the messages with the
> persistent delivery mode set.  (Also, recall that the description *durable
> queue* just means that the queue's definition will survive a broker
> restart; it doesn't by itself guarantee anything about the queue's
> *contained messages*.)
>
>
>>  Could someone please explain the memory overhead for messages sitting on
>> a queue?
>>
>
> The body of the message itself, plus some bookkeeping overhead the broker
> uses to keep track of it.
>
>
>>  I guess there is a something in memory for each message on a queue - is
>> there a way to work around that? (we anticipate deliberately getting into
>> this state from time to time, when we e.g. upgrade HBase)
>>
>
> Yes, the message itself will be in memory unless it's swapped out.
> Indeed, even if the message is swapped out due to memory pressure there's
> a tiny bit of overhead corresponding to it that lurks in the Erlang Term
> Storage (ETS) tables that Rabbit uses...  in rare cases this latter
> overhead can cause grief on its own, if things are allowed to stay out of
> balance too long.
>
> As for something being in memory for each message on a queue, modulo the
> ETS bit that you can't do anything about, the ways to work around this are
> to:
>
>    - Let the broker swap messages out to disk in order to get below the
>    configured memory watermark, at which point the TCP back pressure that will
>    be stiff arming publishers will relent and they'll start publishing again
>    - Catch up on consuming your messages, perhaps by starting more
>    consumers in response to the demand
>
>> - I'm kind of in a deadlock I think now as when the consumers start, they
>> won't ack a message until they have successfully sent a message on (it's a
>> multihop process) but that is blocked.  Should the per connection flow
>> control not have kicked in and blocked the producers before the whole lot
>> just blocked?  (have I missed some setting to enable that, as the docs say
>> it is on by default).
>>
>
> Are these consumers doing something else with the message before
> republishing it, say taking some real world action or doing something to a
> database elsewhere?  If your app design has a case where publishes could be
> blocked (say due to over eager producers, or failure or slowdown of
> consumers) you might consider doing something like making your routing
> fabric a bit richer so that virtual copies of the message might move around
> in the broker without an explicit consume/do-stuff/ack/re-publish cycle
> which, as you point out, can get jammed up if the re-publishes are being
> held.  That said, modulo the rare ETS catastrophe, the broker's default
> swapping mechanism should catch up and obviate some of the memory pressure
> that's causing the trouble.
>
> Does that make sense?
>
> BTW, there's a nice description of what various entities in Rabbit cost in
> the Manning book "RabbitMQ in Action," in Chapter 11, IIRC.  Giving that a
> read will be very helpful for building intuition on what happens where,
> what it costs, etc...
>
> Best regards,
> Jerry