[rabbitmq-discuss] question about cluster failover

Tue Nov 4 08:54:44 GMT 2008

On Mon, Nov 3, 2008 at 11:42 PM, Ben Hood <0x6e6562 at gmail.com> wrote:
..
>
>> Assume I have many publishers messaging with many consumers at some
>> constant rate for each publisher. Will adding hosts to the cluster
>> increase the capacity of the system linearly in terms of max number of
>> consumers and producers? Can you explain?
>
> I am not aware of any empirical evidence about Rabbit that could
> support such a claim.
>
> However, by clustering, you can spread the load of ingress routing. On
> egress, queues are not load balanced, meaning that queue entities
> reside on one particular node, so in a pathological scenario, where
> the entire message load is routed to one particular queue, clustering
> will not balance this out. Wisely chosen routing keys or a
> custom exchange may be able to solve this.

Just to add some colour to what Ben said...

Re: linear scaling of capacity.

Based on our experiences to date, RabbitMQ scales *capacity* close to
linearly by adding more nodes to the cluster.  What happens is that
the nodes in the cluster share the routing table ("Exchange" in AMQP)
so that you can add as many producers as you like.  The nodes will
route messages on to the relevant queue.

As Ben points out this means that - for example - if you had only one
(shared) queue, then that would be a bottleneck because no matter
which node the messages arrived at, they would still get routed on to
the one node with that queue.  (Note that each queue is physically
bound to one node and is not transactionally replicated to a back-up
node.)  So, on the consumer side, the way to scale is to partition
your message flows into multiple queues.
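
To make the partitioning idea concrete, here is a minimal sketch in
Python of one way a publisher could pick a queue for each message.  It
is only an illustration: the function names, the "work" prefix and the
use of CRC32 are my assumptions, not anything RabbitMQ does for you.
The point is just that a stable hash of some message key spreads the
flow over several queues, which can then live on different nodes.

```python
import zlib
from collections import Counter


def queue_for(key: str, num_queues: int, prefix: str = "work") -> str:
    """Map a message key to one of num_queues queue names.

    The mapping is deterministic, so every publisher sends a given key
    to the same queue. The "prefix.N" naming scheme is hypothetical.
    """
    if num_queues < 1:
        raise ValueError("num_queues must be at least 1")
    # crc32 is stable across processes, unlike Python's built-in hash().
    shard = zlib.crc32(key.encode("utf-8")) % num_queues
    return f"{prefix}.{shard}"


def spread(keys, num_queues: int) -> Counter:
    """Count how many of the given keys land on each queue."""
    return Counter(queue_for(k, num_queues) for k in keys)
```

The name returned by queue_for would be used as the routing key (or
queue name) when publishing, with one consumer group per queue.  Note
that changing num_queues remaps most keys, so this simple scheme suits
flows where per-key ordering across a resize does not matter.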

In the last few months several people have described use cases which
suffer from being pathological in the sense Ben described.  These
folks have at least one shared queue that is so big that it may as
well be the only queue.  It is a place to which everything and
anything gets dumped, usually for logging or spam detection or
similar.  The way to deal with this case is to overflow the queue to a
second location - either another node or to disk.  Most people prefer
to see a page-to-disk solution in this case, because in effect the
queue is an archive.
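
For anyone wondering what "page to disk" means in practice, the
following Python sketch shows the general idea: keep a bounded number
of messages in memory and append the overflow to a file, reading it
back in order once the in-memory part has drained.  This is a toy
under my own assumptions (the SpillQueue class, the JSON-lines file
format and the max_in_memory parameter are all invented for the
example) and it is not how RabbitMQ implements or will implement this.

```python
import collections
import json


class SpillQueue:
    """FIFO queue that holds at most max_in_memory items in RAM
    and appends the rest to a file at `path` (hypothetical design)."""

    def __init__(self, max_in_memory: int, path: str):
        self.max_in_memory = max_in_memory
        self.path = path
        self.memory = collections.deque()
        self.on_disk = 0        # messages written but not yet read back
        self.read_offset = 0    # byte offset of the next unread line

    def put(self, msg) -> None:
        # Once anything has spilled, later messages must also go to
        # disk, otherwise they would overtake the older spilled ones.
        if self.on_disk == 0 and len(self.memory) < self.max_in_memory:
            self.memory.append(msg)
            return
        with open(self.path, "ab") as f:
            f.write(json.dumps(msg).encode("utf-8") + b"\n")
        self.on_disk += 1

    def get(self):
        # Everything in memory is older than everything on disk.
        if self.memory:
            return self.memory.popleft()
        if self.on_disk:
            with open(self.path, "rb") as f:
                f.seek(self.read_offset)
                line = f.readline()
                self.read_offset = f.tell()
            self.on_disk -= 1
            return json.loads(line.decode("utf-8"))
        raise IndexError("queue is empty")
```

The file is never truncated here, which matches the "queue as
archive" use case but would need compaction in anything long-running.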

alexis