[rabbitmq-discuss] excessive memory usage and blocked connection, even when no messages left

Thu May 19 11:00:01 BST 2011

OK, thanks for the clarification. It makes sense, and it's more or
less what we figured. I was hoping there was another explanation, one
that would be fixable.

What kind of performance should we expect when running on EC2? It
feels like 20K messages per second shouldn't be a problem for a
four-node cluster where each node has 8 CPUs and 7 GB RAM. To get any kind
of performance and stability we have had to tune how we work with the
cluster, making sure that queues are distributed over the nodes, that
each node has more than one queue, that the connections from the
producers and consumers are evenly distributed over the nodes, etc.
Even slight asymmetries show up quite quickly in the memory usage of a
node, and as soon as that happens it's only minutes before the whole
cluster goes bad. Topic exchanges seem to be completely out of the
question: it only takes minutes before every node gets overloaded.
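
One way to keep producer and consumer connections evenly spread over
the nodes, as described above, is to assign them round-robin. The
sketch below only illustrates the idea: the node names, the client
names and the helper assign_round_robin are hypothetical, and no
RabbitMQ client library is involved.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node names; substitute the real cluster hosts.
NODES = ["rabbit1", "rabbit2", "rabbit3", "rabbit4"]


def assign_round_robin(clients, nodes):
    """Map each client to a node, cycling through the nodes in order."""
    node_cycle = cycle(nodes)
    return {client: next(node_cycle) for client in clients}


# Hypothetical client names: 8 producers and 8 consumers.
clients = [f"producer-{i}" for i in range(8)] + [f"consumer-{i}" for i in range(8)]
assignment = assign_round_robin(clients, NODES)

# Count connections per node to check that the spread is even.
load = Counter(assignment.values())
for node in NODES:
    print(node, load[node])
```

With this scheme the per-node connection counts never differ by more
than one, which is the kind of symmetry the tuning above aims for.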

With a direct exchange and extreme attention to symmetry we have
achieved a sustained publish/deliver/ack load of 40K/s for as long as
we ran the test, and we can probably live with that. The idea was to
use a topic exchange, but at this point that seems completely out of
the question. We can probably move that functionality into the
application, though (just like we had to move load balancing into the
application because queues are bound to CPUs).
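
Moving the topic functionality into the application could look
roughly like the sketch below. It matches routing keys against
AMQP-style topic patterns ("*" matches exactly one dot-separated word,
"#" matches zero or more words) and returns the queues to publish to
through a direct exchange. The binding table, the queue names and the
helpers topic_matches and queues_for are hypothetical; this is not the
code from our tests.

```python
def topic_matches(pattern, routing_key):
    """Return True if routing_key matches the AMQP-style topic pattern."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            # '*' matches exactly one word; otherwise words must be equal.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Hypothetical binding table: topic pattern -> queue name.
BINDINGS = {
    "logs.*.error": "error_queue",
    "logs.#": "all_logs_queue",
    "metrics.cpu": "cpu_queue",
}


def queues_for(routing_key, bindings=BINDINGS):
    """List the queues whose pattern matches the routing key."""
    return sorted(q for pat, q in bindings.items() if topic_matches(pat, routing_key))


print(queues_for("logs.web.error"))
```

The publisher would then publish once per returned queue name, using
the queue name as the routing key on a direct exchange.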

Theo

On May 19, 11:28 am, Matthias Radestock <matth... at rabbitmq.com> wrote:
> Theo,
>
> On 19/05/11 06:35, Theo wrote:
>
> > I rewrote my test code in Java, just to make sure there's nothing with
> > the Ruby driver that is causing the problem, here is the code:
> > https://gist.github.com/980238
>
> > I ran that with more or less the same result. After a few minutes at
> > 20K publishes/deliveries/acks per second the cluster rotted and all
> > connections got blocked. What's more, this time I saw that if I turned
> > off my producers the web UI still reported messages being published
> > for several minutes -- no producers were running (I'm absolutely sure
> > the processes were terminated) but there were still messages being
> > published.
>
> The figure you see reported in the management UI is the rate at which
> rabbit processes these messages. The publisher code publishes messages
> as fast as it can, which is a far higher rate than rabbit can process
> them. Hence rabbit is buffering the messages in order to try to keep up.
> That way rabbit can handle brief periods of high publishing rate.
>
> However, if publishing continues at a high rate then eventually the
> buffers fill up all memory. At that point rabbit blocks further
> publishes until the buffers have been drained sufficiently to allow
> publishing to resume. This can take a while since rabbit has to page the
> messages to disk in order to free up space.
>
> Moreover, as soon as the publishers are unblocked, if they resume
> publishing at the high rate, as is the case in your test, the memory
> limit will be hit again quickly. Hence you will see rabbit "bouncing off
> the limiter", where publishers get unblocked briefly and then blocked again.
>
> This is all quite normal.
>
> Regards,
>
> Matthias.
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-disc... at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
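
The "bouncing off the limiter" behaviour described in the quoted reply
can be illustrated with a toy model. This is not how rabbit is
implemented: the rates, the block and resume thresholds and the
function simulate are all made-up assumptions, chosen only to show
what happens when publishers always send faster than the broker can
drain its buffer.

```python
def simulate(publish_rate, process_rate, block_at, resume_at, ticks):
    """Toy model: one tick is one second, the buffer is counted in messages."""
    buffered = 0
    blocked = False
    history = []  # (tick, buffered, blocked) after each tick
    for t in range(ticks):
        if not blocked:
            # Publishers send at full speed whenever they are allowed to.
            buffered += publish_rate
        # The broker drains what it can in this tick.
        buffered -= min(buffered, process_rate)
        if not blocked and buffered >= block_at:
            blocked = True   # limit hit: publishers get blocked
        elif blocked and buffered <= resume_at:
            blocked = False  # drained enough: publishers resume
        history.append((t, buffered, blocked))
    return history


# All numbers below are illustrative assumptions, not measurements.
history = simulate(publish_rate=40_000, process_rate=20_000,
                   block_at=100_000, resume_at=50_000, ticks=30)

# Count how many times publishers go from unblocked to blocked.
states = [False] + [blocked for _, _, blocked in history]
block_events = sum(1 for prev, cur in zip(states, states[1:]) if cur and not prev)

for t, buffered, blocked in history:
    print(t, buffered, "blocked" if blocked else "open")
print("times blocked:", block_events)
```

As long as the publish rate stays above the processing rate, the model
keeps alternating between blocked and open, so the only lasting fix in
this picture is to publish no faster than the broker can process.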