[rabbitmq-discuss] questions about RabbitMQ linear scalability

Jason McIntosh mcintoshj at gmail.com
Thu Aug 29 16:43:19 BST 2013


Something we did for high-latency systems was to publish to a fanout
exchange that's bound to an x-consistent-hash exchange.  Then we multiplex
the queues, so instead of "myqueue" we create 10 queues, "myqueue1" ...
"myqueue10".  As long as ordering isn't an issue, this gets you greatly
increased throughput, but you then have to consume from multiple queues
rather than a single queue.  An example of where we use this technique is
when dealing with physical location differences where latency is fairly
high (e.g. Chicago to Dallas).  We use shovels to transmit from each of the
10 queues back to a single exchange/queue on the remote side (btw,
remember to set a prefetch OTHER than 0 - we hit major performance issues
related to network latency otherwise).
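
For anyone wanting to try this, here's a rough sketch of the topology
using pika (the exchange/queue names are just illustrative, and the
x-consistent-hash type needs the rabbitmq-consistent-hash-exchange plugin
enabled):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Publishers only ever see the fanout exchange.
    ch.exchange_declare(exchange="myexchange", exchange_type="fanout",
                        durable=True)
    # The hash exchange does the spreading across queues.
    ch.exchange_declare(exchange="myexchange.hash",
                        exchange_type="x-consistent-hash", durable=True)
    # Exchange-to-exchange binding: the fanout feeds the hash exchange.
    ch.exchange_bind(destination="myexchange.hash", source="myexchange")

    # Multiplex into 10 queues; the binding key is a relative weight.
    for i in range(1, 11):
        q = "myqueue%d" % i
        ch.queue_declare(queue=q, durable=True)
        ch.queue_bind(queue=q, exchange="myexchange.hash", routing_key="1")

Note the hash exchange buckets messages by routing key, so publish with a
well-distributed key (a message id works) or everything lands on one queue.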

Jason


On Wed, Aug 28, 2013 at 4:57 AM, Tim Watson <tim at rabbitmq.com> wrote:

> Junius,
>
> On 23 Aug 2013, at 03:14, Junius Wang wrote:
> > Before every test (after a new RabbitMQ node is added to/removed from
> > the cluster): 1) we reset the policy to "exactly" on 2 nodes; 2) we
> > declared 10 queues using rabbitmqctl and forced them to be distributed
> > evenly across all nodes, e.g. in a 2-node cluster, 5 queues reside on
> > rabbit1 and another 5 reside on rabbit2 (3-3-4 for a 3-node cluster,
> > 3-3-2-2 for a 4-node cluster); 3) we declared a "topic" exchange;
> > 4) we bound the 10 queues to the exchange with 10 different routing
> > keys. Publishers publish messages with the 10 routing keys in turn, so
> > all queues receive the same number of messages.
> > Is the number of queues large enough? We can declare more queues in
> > the test, but I don't think there will be too many queues in
> > production.
> >
>
> Let's re-visit the point Michael made about adding more queues:
>
> "the way you extend capacity with RabbitMQ is by adding nodes and using
> more queues"
>
> In any messaging system where queues exhibit FIFO behaviour, each queue is
> a concurrency bottleneck, because it can handle only 1 message at a time.
> If the queue handled more than 1 inbound message at a time, it would not be
> possible to enforce the FIFO characteristics, since without some kind of
> concurrency control the ordering would be non-deterministic. The question
> of whether or not the number of queues is "large enough" is application
> specific. Where you need FIFO ordering, you'll want a single queue. Where
> you are processing messages separately and/or do not require ordering
> (e.g., between messages of different "types") then you can use separate
> queues: this is a design choice that only you can make. The point Michael
> made is that the more queues you shove messages into, the better the
> concurrency will be, which in turn may yield performance benefits in terms
> of throughput.
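>
> For instance, if "orders" and "invoices" never need ordering relative to
> each other, a sketch along these lines (exchange, queue and key names
> invented for illustration) gives the broker two queues to work on
> concurrently, using pika:
>
>     import pika
>
>     conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
>     ch = conn.channel()
>     ch.exchange_declare(exchange="events", exchange_type="topic",
>                         durable=True)
>     # One queue per message "type": FIFO ordering is preserved within
>     # each type, while the two queues are processed concurrently.
>     for mtype in ("orders", "invoices"):
>         ch.queue_declare(queue=mtype, durable=True)
>         ch.queue_bind(queue=mtype, exchange="events",
>                       routing_key=mtype + ".#")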
>
> As to the point about "adding nodes" - a single RabbitMQ node has a finite
> processing capacity, dependent on available operating system resources and
> client usage patterns. If you work out the maximum capacity "n" of a single
> node in your application, where the combination of available system
> resources and application load on the broker is at its peak, then by adding
> a second node, you might increase the system's overall capacity to
> something close to 2n. There are important caveats here though, as Michael
> has already pointed out: clients can connect to any node in a cluster, but
> if the queue with which they're communicating resides on another node,
> messages (both in and out of the broker) must be transmitted between nodes.
>
> > Publishers connect to the cluster via a load balancer (AWS ELB), as do
> > consumers, so they don't know which node they are connected to. So we
> > may not be able to decrease the intra-cluster traffic, but we can try
> > some high-bandwidth instances. Does this help?
>
> Increased bandwidth might be helpful, but the points Michael made will
> still stand. There is always going to be "some" additional overhead due to
> inter-node communications, because network latency is the dominating factor
> here. Think about a classic 2-tier application: the biggest "cost" is
> usually communicating with the backend (e.g., a database or some such), and
> *that* cost is due to the network round trips more often than not.
>
> > Another question: we will have a queue handling lots of messages,
> > perhaps 5000/sec (each about 2K in size), which is about 90% of the
> > total messages handled by the cluster. It's hard for us to split it
> > into multiple queues; it's really a big design change, which we are
> > trying to avoid. From your comments, even if messages are synchronized
> > on only 2 nodes, we can hardly get linear scaling, right?
>
> That's correct. The queue must handle messages in FIFO order, and it has
> an upper bound on how fast it can receive messages from, and deliver
> messages to, its clients.
>
> > If so, is the network traffic the major factor impacting the
> > performance? If it is, we can try some high-bandwidth instances or use
> > an AWS instance group. Hopefully that will get better performance.
>
> Yes and no. If the clients interacting with the queue are connected to
> another node, then the cost of inter-node traffic will have an impact. An
> equally important potential cost to be aware of, though, is that of
> mirroring the queue across even two cluster nodes. When a queue is
> mirrored, the master queue process cannot take full responsibility for a
> message until that message has been accepted by all replicas. Even if the
> mirroring policy is set to "exactly 2 nodes", this still means that a
> message delivered to the queue on node-1 must also be transmitted to
> node-2. If the message is 'persistent', then both nodes must flush to disk.
> If confirms are in use, no confirm can be issued until both nodes have
> received the message, flushed it to disk and the master *knows* that this
> has taken place. That's a lot of work per message.
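>
> To make that concrete, the "exactly 2" policy is set with something like
> this (policy name and queue pattern invented for illustration):
>
>     rabbitmqctl set_policy ha-two "^myqueue" \
>       '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
>
> and a confirming publisher, sketched with pika, looks roughly like:
>
>     import pika
>
>     conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
>     ch = conn.channel()
>     # Confirms are issued only once every mirror has the message (and,
>     # for persistent messages, has flushed it to disk).
>     ch.confirm_delivery()
>     ch.basic_publish(
>         exchange="", routing_key="myqueue1", body=b"payload",
>         properties=pika.BasicProperties(delivery_mode=2))  # persistent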
>
> So as Michael said, HA/Mirroring is not a feature that helps performance -
> it is a feature that increases resiliency in case of node failures. Adding
> bandwidth is only part of the story. Even with more bandwidth, there is
> still going to be higher latency, not only because of the amount of
> inter-node traffic, but also because each node has to complete some work in
> order to keep the replicas in sync.
>
> I would suggest that you look very hard at that "queue with lots of
> messages handled" and see if you can split it up. Imagine if you were
> writing to a database and you had this one giant table that almost every
> transaction had to write to. This would become a bottleneck quickly and
> you'd look to split it up if possible, or if not then at least to partition
> it based on suitable indexes. We don't have a feature for "partitioning a
> queue over multiple nodes" yet, so you'll have to think about partitioning
> the messages yourself. If you can do that, and therefore increase the
> number of queues that are able to concurrently handle messages, you should
> get a decent performance increase.
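>
> A minimal sketch of such client-side partitioning, assuming your messages
> carry some key you can hash (a customer or order id, say):
>
>     import hashlib
>
>     N_PARTITIONS = 10  # illustrative; grow this with the node count
>
>     def partition_queue(key, n=N_PARTITIONS):
>         # Messages sharing a key always map to the same queue, so FIFO
>         # order is preserved per key even though there are n queues.
>         digest = hashlib.md5(key.encode("utf-8")).hexdigest()
>         return "myqueue%d" % (int(digest, 16) % n + 1)
>
>     # Publishers then send via the default exchange straight to the
>     # chosen queue, e.g.:
>     #   ch.basic_publish(exchange="",
>     #                    routing_key=partition_queue(customer_id),
>     #                    body=payload)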
>
> HTH.
>
> Tim
>
> >
> > -----Original Message-----
> > From: rabbitmq-discuss-bounces at lists.rabbitmq.com
> > [mailto:rabbitmq-discuss-bounces at lists.rabbitmq.com] On Behalf Of
> > Michael Klishin
> > Sent: Thursday, August 22, 2013 4:56 PM
> > To: Discussions about RabbitMQ
> > Subject: Re: [rabbitmq-discuss] questions about RabbitMQ linear
> > scalability
> >
> > Junius Wang:
> >
> >> 1. The throughput of a two-node cluster is 50%-60% worse than a
> >> single-node broker.
> >
> > With mirroring, messages have to be copied to multiple (N or, depending
> > on configuration, even all) nodes in the cluster. That obviously takes
> > more time than not copying anything.
> >
> >> 2. Adding more nodes did improve throughput, but we only got a 25%
> >> improvement (the throughput of a 3-node cluster is 25% better than a
> >> 2-node cluster, and a 4-node cluster is 25% better than a 3-node
> >> cluster too). What we expected is a 45-degree line, meaning that with
> >> 2 nodes the throughput doubles, and with 3 nodes it triples.
> >
> > You are not providing any details about your workload, but the way you
> > extend capacity with RabbitMQ is by adding nodes and using more queues.
> > Queue mirroring is an HA feature, which means copying more data across
> > the cluster.
> >
> > Maximum throughput and highest availability are largely at odds with
> > each other, so you need to:
> >
> > 1. Use multiple queues (and the number should grow with the number of
> >    nodes).
> > 2. Choose which queues to mirror. Likely not all queues are equally
> >    important to your system, so you can make some of them non-HA.
> > 3. Configure mirroring to, say, 2 nodes instead of "all".
> > 4. If you know which node is the master for a particular queue (e.g. it
> >    was declared on that node), make your clients connect there, as
> >    sketched below. It will decrease intra-cluster traffic.
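> >
> > For point 4, a minimal pika sketch (host and queue names hypothetical;
> > "rabbit1" is assumed to be the node where "myqueue1" was declared and
> > so hosts its master):
> >
> >     import pika
> >
> >     def handle(ch, method, properties, body):
> >         # ... process body ...
> >         ch.basic_ack(delivery_tag=method.delivery_tag)
> >
> >     # Connecting to the master's own node avoids an extra
> >     # intra-cluster hop for every message this consumer receives.
> >     conn = pika.BlockingConnection(
> >         pika.ConnectionParameters(host="rabbit1.internal"))
> >     ch = conn.channel()
> >     ch.basic_consume(queue="myqueue1", on_message_callback=handle)
> >     ch.start_consuming()
> >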
> > --
> > MK
> >
> >



-- 
Jason McIntosh
http://mcintosh.poetshome.com/blog/
573-424-7612

