[rabbitmq-discuss] Cluster Memory Usage

Mon Nov 21 23:09:59 GMT 2011

To help out, maybe I can give you the exact topology of our rabbit instances.

We have 6 standalone RabbitMQ servers in remote locations. Then we have
2 more standalone RabbitMQ servers that are local to a cluster of 2
RabbitMQ servers. Each of these servers is on its own box.

The 8 standalone RabbitMQ servers use the shovel plugin to send
data across the network to the cluster of 2 RabbitMQ servers. The
shovel is set to prefetch 1000 messages and, as you said, just
forwards the acks from the cluster as confirms. Each of these 8
RabbitMQs has client threads writing messages to it, sometimes 100 at
a time, sometimes 1 at a time. The publishing channels are in confirm
mode, so the threads wait for confirms from the 8 RabbitMQs.

The shovel only seems able to send about 100 to 200 messages a
second to the cluster, so we see it back up considerably whenever
messages are produced at a higher rate than this. Not all of the 8
servers see more than 100 messages/s, but the two that are local
in the same datacenter often do.
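
To put rough numbers on the backlog, here is a back-of-the-envelope
sketch in Python. It assumes constant rates, and the 300/s and 150/s
figures are made up for illustration, not measurements from our
servers; backlog_after is just a name I picked.

```python
def backlog_after(seconds, produce_rate, shovel_rate, start=0):
    """Messages queued on a standalone broker after `seconds`.

    Rates are in messages per second and assumed constant.
    The backlog never drops below zero.
    """
    growth = (produce_rate - shovel_rate) * seconds
    return max(0, start + growth)


# Hypothetical: 300 msg/s published, shovel drains 150 msg/s, for 1 minute.
print(backlog_after(60, produce_rate=300, shovel_rate=150))  # 9000
```

So a broker taking in twice what the shovel can forward would queue
thousands of messages within a minute.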

Like I said earlier, the queues on the cluster have x-ha-policy set to
"all" so that the messages are mirrored from the master to the slave.
Then there are about 40 consumer threads consuming from the cluster.
Each of these threads opens 1 connection and then creates 1 channel.
They are set to prefetch 1000 messages and then send acks in batches
of up to 100 (depending on whether 100 messages could be read within a
500ms timeout).
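
In case it helps with replicating, the batching logic in each consumer
is roughly the following. This is a simplified sketch, not our real
code: the deliveries queue and the basic_ack callback are stand-ins for
whatever the client library provides, and acking the whole batch with a
single multiple-ack on the highest delivery tag is an assumption made
for the sketch.

```python
import queue
import time

BATCH_SIZE = 100      # ack at most this many messages at once
BATCH_TIMEOUT = 0.5   # seconds to wait for a batch to fill


def collect_batch(deliveries, batch_size=BATCH_SIZE, timeout=BATCH_TIMEOUT):
    """Pull up to batch_size delivery tags from `deliveries` (a queue.Queue).

    Stops early once `timeout` seconds have passed since the call started.
    """
    deadline = time.monotonic() + timeout
    tags = []
    while len(tags) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            tags.append(deliveries.get(timeout=remaining))
        except queue.Empty:
            break
    return tags


def ack_batch(tags, basic_ack):
    """Ack a batch with one call; `basic_ack` is a hypothetical callback."""
    if tags:
        # multiple=True acknowledges every tag up to and including this one.
        basic_ack(delivery_tag=max(tags), multiple=True)
```

Each consumer thread would loop over collect_batch and ack_batch on its
own channel, so with a prefetch of 1000 there can be up to 1000 unacked
messages per consumer at any time.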

Maybe this will help you replicate it?

On Mon, Nov 21, 2011 at 4:50 PM, Matthias Radestock
<matthias at rabbitmq.com> wrote:
> Travis,
>
> On 21/11/11 22:16, Travis wrote:
>>
>> Yes.  We've seen this problem when there are only a few dozen messages
>> and when there are hundreds of thousands.
>>
>> Basically, it seems like the slave is always taking up 2-5x the amount
>> of memory that the master uses.
>
> In my tests the slave uses a bit less memory than the master, which is what
> I would expect in a relatively lightly loaded system when there are no
> connections to the slave node.
>
>> Looking at the two servers, the slave shows that its beam.smp has
>> spent 12 more hours on cpu than the beam.smp on the master.  This
>> doesn't make sense to me if the slave is doing less work because it's
>> only handling traffic coming from the master.
>
> That's odd indeed. In my test I see the slave getting by with less than half
> the CPU utilisation of the master.
>
> Do you see this memory and CPU pattern - with the slave using more than the
> master - all the time, including shortly after a restart?
>
> I still haven't got a good trail to follow here. One thing to look at is the
> output of 'rabbitmqctl report' when the system is in the "excessive memory
> use" state. Please send us that.
>
> Regards,
>
> Matthias.
>