Hi, Tim:<div><br></div><div>I've got the following going on (version 2.8.4):<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div><br></div><div>- My consumers all stop (e.g. imagine a failure scenario / upgrade), but producers keep on producing</div>
<div>- Queues start backing up </div><div>- Memory increases with queue size</div><div>- The high water mark gets hit and the node memory alarm goes off</div></div></blockquote><div><br></div><div>So far, that's all exactly what's supposed to happen. The idea is that if the broker has a lot of messages stacking up in memory then, regardless of whether you published them as persistent or not, it will move them to disk to free up RAM and avoid the Erlang VM garbage-collection or allocation-failure disasters that RAM exhaustion could otherwise cause.</div>
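<div><br></div><div>To make the thresholds a bit more concrete, here's a minimal Python sketch of the arithmetic as I understand it. It assumes the defaults of <code>vm_memory_high_watermark</code> = 0.4 (a fraction of installed RAM) and <code>vm_memory_high_watermark_paging_ratio</code> = 0.5 (the fraction of the watermark at which queues start paging messages out); check your own rabbitmq.config and the docs for your version, since either may differ. The helper names <code>thresholds</code> and <code>broker_state</code> are hypothetical, just a way of naming the three regimes, not anything in Rabbit itself.</div>
<pre>
```python
# Minimal sketch of RabbitMQ's memory thresholds, assuming the usual defaults.
# These helpers are illustrative only; they are not RabbitMQ APIs.

DEFAULT_HIGH_WATERMARK = 0.4  # assumed default vm_memory_high_watermark
DEFAULT_PAGING_RATIO = 0.5    # assumed default vm_memory_high_watermark_paging_ratio


def thresholds(total_ram_bytes, watermark=DEFAULT_HIGH_WATERMARK,
               paging_ratio=DEFAULT_PAGING_RATIO):
    """Return (paging_starts_at, alarm_at) in bytes."""
    alarm_at = total_ram_bytes * watermark
    paging_starts_at = alarm_at * paging_ratio
    return paging_starts_at, alarm_at


def broker_state(used_bytes, total_ram_bytes, **kwargs):
    """Classify memory use into the three regimes discussed in this thread."""
    paging_starts_at, alarm_at = thresholds(total_ram_bytes, **kwargs)
    if used_bytes >= alarm_at:
        # Memory alarm set: publishing connections get TCP back pressure
        return "alarm"
    if used_bytes >= paging_starts_at:
        # Queues page messages to disk, persistent or not
        return "paging"
    return "normal"


if __name__ == "__main__":
    gib = 1024 ** 3
    paging, alarm = thresholds(16 * gib)
    print(f"paging from {paging / gib:.1f} GiB, alarm at {alarm / gib:.1f} GiB")
```
</pre>
<div>With 16 GiB of RAM and those defaults, this prints "paging from 3.2 GiB, alarm at 6.4 GiB". The "alarm" regime is the one that blocks your publishers.</div>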
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>- with this being a durable queue, I anticipated RMQ would flush to disk and free memory.</div>
</div></blockquote><div><br></div><div>One thing to note: if the broker is under severe memory pressure, it will push messages to disk regardless of the queue's durability setting or whether you published the messages with the persistent delivery mode set. Also, recall that calling a queue *durable* just means that the queue's definition will survive a broker restart; by itself, it doesn't guarantee anything about the *messages the queue contains*.</div>
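<div><br></div><div>Since durability and persistence are easy to blur, here's a toy Python model of the restart rules: a message comes back after a broker restart only if it sits in a durable queue *and* was published with delivery mode 2 (persistent). The <code>Message</code> and <code>Queue</code> classes and <code>survives_restart</code> are hypothetical stand-ins, not client-library types, and the model deliberately says nothing about paging, which under memory pressure ignores both flags.</div>
<pre>
```python
from dataclasses import dataclass, field

PERSISTENT = 2  # AMQP 0-9-1 delivery_mode value for persistent messages
TRANSIENT = 1   # delivery_mode value for non-persistent messages


@dataclass
class Message:
    body: bytes
    delivery_mode: int = TRANSIENT


@dataclass
class Queue:
    name: str
    durable: bool
    messages: list = field(default_factory=list)


def survives_restart(queue):
    """Toy model of a broker restart: return the queue as it comes back, or None.

    A durable queue's *definition* survives; of its contents, only
    persistent messages come back. A non-durable queue disappears entirely.
    """
    if not queue.durable:
        return None
    kept = [m for m in queue.messages if m.delivery_mode == PERSISTENT]
    return Queue(queue.name, queue.durable, kept)


if __name__ == "__main__":
    q = Queue("orders", durable=True,
              messages=[Message(b"a"), Message(b"b", PERSISTENT)])
    after = survives_restart(q)
    print([m.body for m in after.messages])  # [b'b']
```
</pre>
<div>The transient message is gone after the "restart" even though its queue was durable.</div>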
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div> Could someone please explain the memory overhead for messages sitting on a queue? </div>
</div></blockquote><div><br></div><div>The message itself (its body and properties), plus some per-message bookkeeping overhead the broker uses to keep track of it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div> I guess there is something in memory for each message on a queue - is there a way to work around that? (we anticipate deliberately getting into this state from time to time, when we e.g. upgrade HBase)</div>
</div></blockquote><div><br></div><div>Yes, the message itself will be in memory unless it's swapped out. Indeed, even if the message is swapped out due to memory pressure, a tiny bit of overhead corresponding to it lurks in the Erlang Term Storage (ETS) tables that Rabbit uses... In rare cases this residual overhead can cause grief on its own, if things are allowed to stay out of balance for too long. </div>
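<div><br></div><div>Here's a back-of-the-envelope Python model of why that residual overhead matters on a long backlog. Every per-message byte figure is a parameter you'd have to measure on your own broker (I'm not quoting real numbers), and <code>estimate_queue_memory</code> is a hypothetical helper. What it shows is that paging removes the message contents from RAM but leaves a residual per-message cost that still grows linearly with queue length.</div>
<pre>
```python
def estimate_queue_memory(in_ram, paged_out, avg_msg_bytes,
                          bookkeeping_bytes, index_entry_bytes):
    """Rough memory estimate for one queue's backlog (illustrative only).

    in_ram: messages whose body and properties are still held in memory
    paged_out: messages whose contents have been written to disk
    avg_msg_bytes: average body-plus-properties size
    bookkeeping_bytes: per-message overhead for an in-memory message
    index_entry_bytes: residual per-message overhead (e.g. an ETS index
        entry) that remains even after the message is paged out

    All byte figures are caller-supplied assumptions, not RabbitMQ constants.
    """
    resident = in_ram * (avg_msg_bytes + bookkeeping_bytes)
    residual = paged_out * index_entry_bytes
    return resident + residual


if __name__ == "__main__":
    # Made-up figures: ten million messages, all paged out,
    # assuming 100 bytes of residual overhead each.
    print(estimate_queue_memory(0, 10_000_000, 1024, 500, 100))  # 1000000000
```
</pre>
<div>With those made-up figures, ten million paged-out messages still pin about 1 GB that paging can't reclaim; only consuming the backlog brings it down.</div>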
<div><br></div><div>As for something being in memory for each message on a queue, modulo the ETS bit that you can't do anything about, the ways to work around this are to:</div><div><ul><li>Let the broker swap messages out to disk until memory use drops below the configured memory watermark, at which point the TCP back pressure that is stiff-arming your publishers will relent and they'll start publishing again</li>
<li>Catch up on consuming your messages, perhaps by starting more consumers in response to the demand</li></ul></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div>- I'm kind of in a deadlock I think now as when the consumers start, they won't ack a message until they have successfully sent a message on (it's a multihop process) but that is blocked. Should the per connection flow control not have kicked in and blocked the producers before the whole lot just blocked? (have I missed some setting to enable that, as the docs say it is on by default).</div>
</div></blockquote><div><br></div><div>Are these consumers doing something else with the message before republishing it, say taking some real-world action or writing to a database elsewhere? If your app design has a case where publishes could be blocked (say due to over-eager producers, or failure or slowdown of consumers), you might consider making your routing fabric a bit richer, so that virtual copies of the message can move around inside the broker without an explicit consume/do-stuff/ack/re-publish cycle. As you point out, that cycle can get jammed up if the re-publishes are being held. That said, modulo the rare ETS catastrophe, the broker's default swapping mechanism should catch up and relieve some of the memory pressure that's causing the trouble.</div>
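<div><br></div><div>To sketch the "richer routing fabric" idea, here's a toy in-process Python model (not the AMQP API) of a fanout-style exchange: one publish drops a copy into every bound queue, so each downstream stage consumes from its own queue rather than waiting on an upstream consumer to ack and re-publish. <code>FanoutExchange</code> and the queue names are hypothetical; in Rabbit proper you'd get the same effect by binding several queues to one fanout exchange.</div>
<pre>
```python
from collections import deque


class FanoutExchange:
    """Toy fanout exchange: every publish is copied into each bound queue."""

    def __init__(self):
        self.bound = {}

    def bind(self, queue_name):
        # Each binding gets its own FIFO, like a separate broker queue;
        # binding the same name twice keeps the existing backlog.
        self.bound.setdefault(queue_name, deque())

    def publish(self, message):
        # The "broker" does the copying; no consumer has to ack and re-publish.
        for q in self.bound.values():
            q.append(message)

    def consume(self, queue_name):
        """Pop the oldest message from one queue, or None if it is empty."""
        q = self.bound[queue_name]
        return q.popleft() if q else None


if __name__ == "__main__":
    ex = FanoutExchange()
    for name in ("hbase-writer", "audit", "search-indexer"):
        ex.bind(name)
    ex.publish(b"event-1")
    # Even with "hbase-writer" stalled, the other stages still get their copy.
    print(ex.consume("audit"))            # b'event-1'
    print(len(ex.bound["hbase-writer"]))  # 1
```
</pre>
<div>This only helps for stages that need the original message rather than the previous hop's output; a stage that genuinely transforms the message still has to consume and re-publish.</div>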
<div><br></div><div>Does that make sense?</div><div><br></div><div>BTW, there's a nice description of what various entities in Rabbit cost in the Manning book "RabbitMQ in Action," in Chapter 11, IIRC. Giving that a read will be very helpful for building intuition about what happens where, what it costs, etc.</div>
<div><br></div><div>Best regards,</div><div>Jerry</div><div><br></div></div></div>