[rabbitmq-discuss] Architecture Question

Tue May 28 18:11:15 BST 2013

Thanks for the response Tim!

> Hi,
>
> On 28 May 2013, at 05:49, Crash Course wrote:
>
> > I'm running the broker (3.1.1) on a Dual Xeon E5 (12 cores total) with 256 MB and Windows Server 2012, on Erlang R16B (64bit).
>
> I'm guessing you don't really mean 256MB right?

Correct - I meant 256 GB. Apparently I'm thinking about a different decade!

> > One of the primary workflows that we want to enable with a cluster is to have several workers behind the cluster taking messages from multiple clients.
> >
> > With a single producer -> direct exchange -> single consumer on a single broker, I can get about 38000 messages / second. (I'm not sure where the performance limit is here, since I'm not queuing any messages, the consumer is the equivalent of /dev/null, and the CPU on the broker doesn't go above 12%)
>
> There is no such thing as 'not queueing any messages' since consumers *have* to consume from a queue. Presumably what you're saying is that the consumer isn't doing any work before sending an ACK. Are you using ACKs, have you considered setting a prefetch count, etc? Have you read http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2 to get an idea of the baselines you might expect to see?

By "not queuing any messages", I mean that the broker has no queued
messages. The consumer is set up in noAck mode, and from the
documentation I've read, setting the prefetch count has no impact in
noAck. I have read that page to get an idea of baselines, which would
indicate that 44000 is a reasonable baseline for the single
producer/queue/consumer/noack scenario.

> >
> > To test it, I ran two brokers on the same server, and tried setting up a couple of different exchange/queue layouts:
>
> Have you considered benchmarking with a single broker first, then adding in the clustering later on? Clustering is intended to provide reliability/robustness/fallback rather than improve performance - were you aware of that? Clustering comes with some overheads, e.g., if you publish to node-1 and consume from node-2, the messages have to be routed between nodes as well. A queue will *only* run on one node in the cluster, so producers and/or consumers connecting to some other node, will have their actions (i.e., publishing or consuming) forwarded to the node on which the queue is running.

I was aware that clustering was intended to provide
reliability/robustness/fallback, but was hoping that it would provide
some ease of scaling as well, instead of our application having to
know which node/broker to connect to in order to optimize
performance.

> > Producers publishing to a single consistent hash exchange, with each consumer connecting to a private queue bound to the exchange
> > Producers publishing to a single exchange / default exchange, with each consumer connecting to a single queue off the exchange
> This limits throughput to the speed at which a single queue can progress, in terms of both accepting incoming messages *and* passing them on to individual consumers. Why? Because for each matching binding, the message will be routed to that (bound) queue. Are you saying that you're trying to use the producer supplied routing keys in combination with the bindings to distribute the load across multiple consumers?
>
> Having multiple publishers and multiple consumers generally will increase throughput, though there will still be limits and of course for a single queue, messages will be delivered to consumers in a round-robin fashion.

In the second scenario, I'm doing producer -> direct exchange ->
single queue -> multiple consumers, where producers and consumers
connect to a random node.
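
As a rough illustration of why adding consumers to one queue does not
raise its rate, here is a toy model of the round-robin delivery Tim
describes. It is only a sketch: the consumer names are made up, and it
models the dispatch order, not the broker's actual implementation.

```python
from collections import Counter
from itertools import cycle


def round_robin_deliver(message_count, consumers):
    """Hand each message to the next consumer in turn, one queue only."""
    delivered = Counter()
    turn = cycle(consumers)
    for _ in range(message_count):
        delivered[next(turn)] += 1
    return delivered


# Hypothetical consumer names; 38000 is the per-second rate measured above.
consumers = [f"consumer-{i}" for i in range(24)]
counts = round_robin_deliver(38000, consumers)

for name in consumers[:3]:
    print(name, counts[name])
print("total delivered:", sum(counts.values()))
```

The total stays at whatever the queue can dispatch; the consumers only
split it between them.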

>
> > In the first scenario above, if I use 24 consumers (and hence 24 queues) distributed equally across both brokers, I can barely muster 52000 messages / second (and going up to 30% CPU on each Erlang broker)
> >
>
> What happens if you use producer => direct exch => queue => consumer * 24 instead? What about on a single broker? How fast can you make a single queue go (with your anticipated load) and what does the optimum balance between # producers and # consumers for a single queue appear to be on your hardware? What happens if you now take *that* configuration and balance it across multiple queues?

On a single broker, with producer -> direct exchange -> queue ->
consumer * 24, I still have a rate of ~38000 messages per second,
which leads me to believe that a single queue will be limited to that
rate on this hardware/OS configuration.

> > In the second scenario, I stop at under 10000 messages / second.
> >
>
> How are your producers and consumers set up? Are your channels in confirm mode or using transactions? Are you using basic.get or basic.consume to consume, what about ACKs - are these auto or manual - if the latter, how long do your clients take to ACK? Have you considered changing the prefetch size, esp. if you've got multiple consumers per queue?

The producers are sending a 500 byte message via basic.publish.
As above, my consumers are in noack mode, and they are using
basic.consume followed by performing a dequeue off the consumer in a
while loop (and doing no processing of the message).

Message size appears not to make a difference, as I can set the
message size to be 10 bytes, and the message rate stays the same.

> > This leads me to believe that the former is the better design, but what kind of architecture would give a 4 fold or 10 fold increase in throughput? Would we have to hash prior to RabbitMQ? An x-consistent-hash in front of multiple x-consistent-hash exchanges?
> >
>
> The idea of the x-consistent-hash-exchange is to evenly distribute messages based on routing key. If I were you, I'd start by benchmarking against a single broker with just one and then multiple producers *and* consumers per queue, if your application logic can cope with that. Then I'd consider introducing multiple exchanges and/or queues, both with 1 producer/consumer and then multiples, still on a single broker. Then, finally, I'd introduce clustering. By all means use the consistent hash exchange if it helps, but IMO first you should try to get a feel for how an "out of the box" topology can perform for various combinations (and multiples) of producers, consumers, queues, etc.

Thanks for the pointer!

What I'm seeing as I go from one instance of
4 producers -> direct exchange -> single queue -> 6 consumers
to two and then three instances of that layout (in parallel) against a
single broker is:
38000 -> 54000 -> 63000 messages per second
and
14% -> 26% -> 34% CPU
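
To put those figures in perspective, the short calculation below works
out how far the scaling is from linear. It uses only the numbers
reported above; the variable names are just for illustration.

```python
# Figures reported above for 1, 2 and 3 parallel copies of the layout.
rates = [38000, 54000, 63000]   # messages per second
cpu_percent = [14, 26, 34]      # broker CPU utilisation

baseline = rates[0]

# Throughput gained by each additional copy of the layout.
marginal_gain = [b - a for a, b in zip(rates, rates[1:])]

# Fraction of ideal linear scaling achieved with n copies.
efficiency = [rate / (n * baseline) for n, rate in enumerate(rates, start=1)]

# Messages per second delivered per percentage point of CPU.
msgs_per_cpu_point = [rate / cpu for rate, cpu in zip(rates, cpu_percent)]

rows = zip(rates, efficiency, msgs_per_cpu_point)
for n, (rate, eff, per_cpu) in enumerate(rows, start=1):
    print(f"{n} layout(s): {rate} msg/s, {eff:.0%} of linear, "
          f"{per_cpu:.0f} msg/s per CPU point")
print("marginal gain per added layout:", marginal_gain)
```

Each added copy contributes less than the one before it (16000, then
9000 messages per second), while CPU stays well below saturation.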

Would your recommendation then be to achieve scaling of RabbitMQ by
having the application layer be able to send to multiple exchanges and
brokers, and consumers likewise be able to listen to multiple queues
and multiple brokers?
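
One way the application layer could spread load itself ("hashing prior
to RabbitMQ") is sketched below. This is a minimal example under
stated assumptions: the broker and queue names are hypothetical, and it
uses plain modulo hashing rather than the consistent hashing that the
x-consistent-hash exchange provides. No RabbitMQ client calls are
shown; the function only decides where a message would go.

```python
import hashlib
from collections import Counter

# Hypothetical (broker, queue) targets; none of these names come from
# the setup described in this thread.
TARGETS = [
    ("broker-1", "work.0"),
    ("broker-1", "work.1"),
    ("broker-2", "work.2"),
    ("broker-2", "work.3"),
]


def pick_target(routing_key, targets=TARGETS):
    """Map a routing key to one target using a stable hash (modulo)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


# The same key always lands on the same target, and many distinct keys
# spread roughly evenly across all of them.
spread = Counter(pick_target(f"client-{i}") for i in range(10000))
for target, count in sorted(spread.items()):
    print(target, count)
```

With modulo hashing, changing the number of targets remaps most keys,
which is the problem consistent hashing is designed to reduce.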

> Cheers,
> Tim