[rabbitmq-discuss] High publish rates, mirrored queues, high watermark...

Simon MacMullen simon at rabbitmq.com
Thu Jan 12 17:23:26 GMT 2012


On 12/01/12 16:41, Jim Myhrberg wrote:
> Hi, I was hoping you guys would be able to answer some quick questions
> about an issue we're having.

Hi Jim.

> We've recently started using mirrored queues across two RabbitMQ
> instances on two servers, and it's working really well except for when
> publish rates are really high for a longer period of time. At that point
> RabbitMQ starts eating a lot of memory on the master node until it hits
> our high watermark limit of 18.9GB, at which point things basically stop
> working.
>
> My theory of what is happening is that when RabbitMQ can't process
> incoming publish requests fast enough, it caches the raw TCP packets in
> memory until it can deal with them and direct it to the queue(s) in
> question.

This is approximately true. In fact RabbitMQ can be viewed as a 
pipeline; when publishing over AMQP it looks like:

(OS TCP buffer) -> [reader] -> [channel] -> [queue] -> etc...

Each of the square-bracketed stages is an Erlang process. The reader 
(aka the connection) disassembles packets into AMQP methods (most 
frequently basic.publish). The channel applies those methods (in the 
case of basic.publish, by making routing decisions, checking security, 
etc). And then there's the queue.

The Erlang processes communicate by message passing - so each process 
has a mailbox which contains messages which have been sent on by the 
previous stage but not yet handled.

The OS manages the size of the TCP buffer and in general it won't be 
able to contain *too* huge a number of messages, but the process 
mailboxes can grow arbitrarily large when you publish "too fast" for the 
queue to keep up. And that's where most of your memory is going I'll bet.
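
To make that concrete, here's a rough illustrative sketch in Java (not 
our actual Erlang code, and nothing RabbitMQ-specific) of two stages 
connected by an unbounded queue standing in for a process mailbox. When 
the producing side outruns the consuming side, the backlog - and hence 
the memory use - just keeps climbing:

    import java.util.concurrent.LinkedBlockingQueue;

    public class MailboxGrowthDemo {
        public static void main(String[] args) throws Exception {
            // Unbounded queue, playing the role of an Erlang process mailbox.
            final LinkedBlockingQueue<String> mailbox =
                new LinkedBlockingQueue<String>();

            // Fast "channel": enqueues messages as quickly as it can.
            Thread publisher = new Thread() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) {
                        mailbox.offer("message " + i);
                    }
                }
            };

            // Slow "queue": drains roughly one message per millisecond.
            Thread queueProcess = new Thread() {
                public void run() {
                    try {
                        while (true) {
                            mailbox.take();
                            Thread.sleep(1);
                        }
                    } catch (InterruptedException e) {
                        // stop draining
                    }
                }
            };

            publisher.start();
            queueProcess.start();

            // Watch the backlog (i.e. memory) grow while the producer runs.
            for (int i = 0; i < 10; i++) {
                Thread.sleep(500);
                System.out.println("backlog: " + mailbox.size() + " messages");
            }
            queueProcess.interrupt();
        }
    }

The broker-side fix for this class of problem is described below; the 
client-side fix is to bound how far ahead the publisher can get.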

The theory was that more and more memory would get used up by the 
process mailboxes until eventually the memory alarm went off, which 
would cause all the readers to block while Rabbit sorts itself out. 
However, by this stage there can be a huge amount of work to do 
(ironically, the more memory you have the worse off you are), and 
everything grinds to a halt for potentially a very long time.
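
For reference (this is standard configuration, nothing new): the alarm 
threshold is set as a fraction of installed RAM in rabbitmq.config, 0.4 
by default, which is presumably how your 18.9GB limit was arrived at:

    [
      {rabbit, [
        {vm_memory_high_watermark, 0.4}  %% fraction of installed RAM
      ]}
    ].

Lowering it makes the alarm fire sooner, which limits how much backlog 
can build up before publishers get blocked.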

The good news is that we're working (right now, by lucky coincidence) on 
a system of internal flow control that will prevent the process 
mailboxes from becoming too big. This means that a publisher which is 
publishing "too fast" for a queue to handle will get pushed back on much 
more rapidly (within a second or so rather than after memory fills up).

This feature is likely to be in the next release, but if you can't wait 
until then you can improve matters considerably by using confirms and 
only allowing each publisher to have (e.g.) 1000 unconfirmed messages. 
You won't get a confirm back until the message has hit the queue, so you 
bound the number of in-flight messages.
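
As a rough sketch of what that looks like with the Java client (the 
queue name and batch size here are just placeholders; confirmSelect and 
waitForConfirms are the Java client's confirm API, and other clients 
expose the same idea in their own way):

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class BoundedPublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");       // assumed broker address
            Connection conn = factory.newConnection();
            Channel ch = conn.createChannel();
            try {
                ch.confirmSelect();             // turn on publisher confirms
                int unconfirmed = 0;
                for (int i = 0; i < 100000; i++) {
                    ch.basicPublish("", "my-queue", null,
                                    ("message " + i).getBytes("UTF-8"));
                    // Once the window is full, block until the broker has
                    // confirmed everything published so far.
                    if (++unconfirmed >= 1000) {
                        ch.waitForConfirms();
                        unconfirmed = 0;
                    }
                }
                ch.waitForConfirms();           // flush the final partial batch
            } finally {
                ch.close();
                conn.close();
            }
        }
    }

With a window like that the publisher can never get more than about 
1000 messages ahead of the slowest queue, so the mailboxes inside the 
broker stay small even during bursts. A real publisher would also check 
the return value of waitForConfirms (false means something was nacked) 
and republish as needed.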

Arguably you should be doing that anyway - if your messages are 
important enough that they need to go to mirrored queues they are 
probably important enough that the publisher needs confirmation that 
they've been accepted by the broker.

> And I'm assuming the fact that we are using mirrored queues
> adds overhead in dealing with a publish request as it needs to be synced
> to the other node(s). Am I right?

Yes, very much so. Mirrored queues are noticeably slower than 
non-mirrored ones due to the extra work involved. And also they're newer 
and not as heavily optimised.

> My theory is based on the fact that we could deal with much higher
> publish rates without any problems before we switched to mirrored
> queues. And the fact that at a few points when memory usage has been
> excessive we've seen queues that were completely empty according to
> the management plugin, while our workers were still processing 1000+
> messages/sec for a good hour, all the while the management plugin said
> no incoming messages, and 1000+ msg/s get/acks.

Yes, that's exactly consistent with messages being backed up in process 
mailboxes.

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, VMware

