[rabbitmq-discuss] Proposal: Change consumers Round-Robin behaviour

Thu May 2 12:27:37 BST 2013

Hi,

Assume we have two hosts, red and green, each running 10 consumers on the
same queue. Inside `rabbit_amqqueue_process` these consumers are kept in a
queue built with the standard Erlang `queue` module, where they wait for
jobs to arrive. If all 20 consumers are busy all the time, there is no
problem here at all. But if there are more consumers than messages in the
queue, the surplus consumers will sit idle and wait.
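To make the setup concrete, here is a toy Python model of the waiting consumers. It is only an illustration, not RabbitMQ's code: `ConsumerQueue`, `trickle` and the `R1`/`G1` naming are made up for this sketch, and it assumes each consumer handles one message at a time (roughly a prefetch count of 1) and re-joins at the back of the queue once it has finished.

```python
from collections import deque


class ConsumerQueue:
    """Toy model of the idle consumers waiting on one queue (hypothetical)."""

    def __init__(self, consumers):
        # Consumer names start with their host letter, e.g. "R1" or "G3".
        self.idle = deque(consumers)

    def deliver(self):
        """Give the next message to the consumer at the front of the FIFO."""
        return self.idle.popleft()

    def done(self, consumer):
        """A consumer that has finished its message re-joins at the back."""
        self.idle.append(consumer)


def trickle(cq, n):
    """Slow arrivals: each message is finished before the next one arrives.
    Returns the host letter that handled each message, in order."""
    hosts = []
    for _ in range(n):
        consumer = cq.deliver()
        hosts.append(consumer[0])
        cq.done(consumer)
    return "".join(hosts)


print(trickle(ConsumerQueue(["R1", "G1", "R2", "G2"]), 8))  # RGRGRGRG
print(trickle(ConsumerQueue(["R1", "R2", "G1", "G2"]), 8))  # RRGGRRGG
```

With slow arrivals the order of the idle queue is simply replayed, so whatever ordering the consumers happen to be in decides which host gets the work.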

There are several reasons to have idle workers, the most important being
the ability to handle sudden message spikes. Ideally, we would like the
consumers of the two hosts to be interleaved in the queue:

RGRGRGRG,...

In practice, however, a queue just keeps the consumers in whatever order
they ended up in, so this may not be the case. We could end up with
something along the lines of

RRRRRRRGGGGGGG,...

which means that if requests arrive slowly, they will be processed only by
the red host for a while, and then only by the green host for a while. If
the hosts differ in nature, it is very likely that clusters like this will
form in the queue over time.
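One way such clusters can form, sketched with a toy model (the function `after_burst` and the service times are made-up assumptions, not measurements): a burst hands one message to every idle consumer at once, and the consumers re-join the idle queue in the order they finish. If red consumers finish faster than green ones, a perfectly interleaved queue comes out of a single burst fully clustered.

```python
from collections import deque


def after_burst(idle, service_time):
    """Hypothetical model of one burst: every idle consumer takes a message
    at t=0, then re-joins the idle queue when it finishes. Consumers are
    named by host letter plus number, e.g. "R1"; service_time maps a host
    letter to how long its consumers take per message."""
    # (finish time, original position, consumer); the position breaks ties
    # so that consumers finishing together keep their dispatch order.
    finishing = [(service_time[c[0]], pos, c) for pos, c in enumerate(idle)]
    finishing.sort()
    return deque(c for _, _, c in finishing)


interleaved = deque(["R1", "G1", "R2", "G2", "R3", "G3"])
clustered = after_burst(interleaved, {"R": 1, "G": 5})
print("".join(c[0] for c in clustered))  # RRRGGG
```

From then on, slow arrivals are served by all the red consumers first and then all the green ones, which is exactly the RRR...GGG pattern described above.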

One way to alleviate this is to check for the following conditions
whenever we have "run" the queue:

1. There are no more messages (queue is empty)
2. There are active consumers waiting (active_consumers is not empty)

When both hold, we pick a random consumer in the queue and move it to the
front. Over time, this "shuffles" the queue into a random order. It also
costs nothing on the critical path, since we only do it when the queue is
empty and there are excess workers. We do very little work unless the
queue tends to empty often, and in that case this scheme gives a fully
random distribution over the consumers.
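A minimal Python sketch of the proposed step, again on a toy model rather than the real `rabbit_amqqueue_process` state. `maybe_shuffle` and its arguments are hypothetical names; `active_consumers` just stands for the queue of waiting consumers. It does nothing unless both conditions hold, and then moves one randomly chosen waiting consumer to the front.

```python
import random
from collections import deque


def maybe_shuffle(messages, active_consumers, rng=random):
    """Run after the queue has been 'run'. If there are no messages left
    (condition 1) but consumers are still waiting (condition 2), move one
    randomly chosen waiting consumer to the front of active_consumers.
    Returns True if a consumer was moved."""
    if messages or not active_consumers:
        return False  # busy path: no extra work at all
    i = rng.randrange(len(active_consumers))
    consumer = active_consumers[i]
    del active_consumers[i]  # O(n) on a deque, but only runs when idle
    active_consumers.appendleft(consumer)
    return True


rng = random.Random(42)
waiting = deque(["R1", "R2", "R3", "G1", "G2", "G3"])
for _ in range(20):  # e.g. the queue drained 20 times while consumers idled
    maybe_shuffle(deque(), waiting, rng)
print("".join(c[0] for c in waiting))  # order varies with the seed
```

Each call is a single random pick plus one move, so the cost stays small, and repeated calls gradually break up any RRR...GGG clustering without ever touching the path where messages are waiting.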

The background for the proposal is that round-robin distribution of
messages often tends toward bad behaviour over time. By adding a bit of
randomness to the process, we automatically alleviate a number of problems
caused by determinism and get a better distribution of messages over
consumers. One could also imagine other distribution schemes, but those
would be more expensive in practice than this proposal, which should only
incur a cost when the queue is not under heavy load.

* Did I miss anything?
* Is this a good or bad idea? And why?
* Do we break any rules w.r.t. AMQP by implementing this?
* Is priority on the queue going to be harder to implement? (I don't think
so, but...)

-- 
J.