[rabbitmq-discuss] [java-client] Parallelizing message consumption from a single queue

Steve Powell steve at rabbitmq.com
Mon Mar 26 15:11:13 BST 2012


Hi Josh,

In the absence of the experts -- you get me :-)

Questions that start 'what is the best way to...' often have no
definitive answer, I'm afraid; but here goes.

First, I'm going to have to explain what I *think* you are asking about,
before trying to answer it.

  I assume you are asking about multiple Consumers, in a Java client, all
  consuming from the same queue, allowing 'parallel' consumption -- which
  I take to mean multi-threaded -- within the same Java client. You want
  the 'best' application structure which achieves 'parallel' processing of 
  messages from the queue.

Well, there is nothing to stop you 'registering' the same Consumer
instance more than once, with different consumer tags, so it is easy to
drive multiple 'identical' Consumers. However, if you register
(Channel.basicConsume()) on the same channel, the consumers will be
called serially (this is done to preserve the ordering of messages
processed on a channel).

No doubt you've read the blurb on the Java Client API doc pages
(http://www.rabbitmq.com/api-guide.html#consuming) which says:

> Callbacks to Consumers are dispatched on a thread separate from the
> thread managed by the Connection. This means that Consumers can safely
> call blocking methods on the Connection or Channel, such as queueDeclare,
> txCommit, basicCancel or basicPublish.
> 
> Each Channel has its own dispatch thread. For the most common use case
> of one Consumer per Channel, this means Consumers do not hold up other
> Consumers. If you have multiple Consumers per Channel be aware that a
> long-running Consumer may hold up dispatch of callbacks to other
> Consumers on that Channel.

and the section on advanced connection options mentions a thread-pool
(by default containing 5 threads) associated with the connection.

What this all means is that *on each channel* Consumer callbacks are
called serially: there is no overlapping there, so no chance to consume
'in parallel'. Distinct *channels* (on the same connection), however,
are allowed to run their consumers in parallel (up to five may run
concurrently, in the default case).

So, to 'consume' messages in parallel from a single queue you need to
have the consumers on separate channels. Now, if you define a single
Consumer instance, and get it to be invoked on multiple threads
concurrently (on separate channels) you have to be careful -- the code
in your Consumer must be thread-safe, and probably more than just that,
too. I'll assume you know what you are doing. If you create one Consumer
instance for each channel, RabbitMQ will guarantee that each channel's
Consumer is running on one thread, so the rules are simpler.

There is another option. Provided you are prepared to separate
consumption from processing (and possibly acknowledgement) you can
register a single consumer which does very little except pass the
message to another (worker) thread to do the actual processing. The
Consumer doesn't have to be sophisticated, but your dispatching
mechanism needs to be: you mustn't lose messages, and you must ensure
that you acknowledge them at some point (which may not be straight
away). Still, if you are adept at Java concurrent programming these are
all achievable. By managing your own worker threads you can achieve your
own dispatching and resource management rules, and by setting the
pre-fetch count (Qos) and managing acknowledgements you can gate the
amount of work done in parallel, and even decide which workers get which
messages.
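
To make the hand-off idea a little more concrete, here is a minimal
sketch using only the standard Java library. HandOffDispatcher, its
'processor' and its 'acker' are names I have made up for illustration;
they are not part of the RabbitMQ client. The acker stands in for
whatever acknowledges a delivery tag in your application, and you would
need to check for yourself how your channel may be used from worker
threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

/** Hypothetical helper: hands deliveries to worker threads, acks after processing. */
final class HandOffDispatcher {
    private final ExecutorService workers;
    private final Consumer<byte[]> processor; // the application's real work (assumed)
    private final LongConsumer acker;         // stand-in for acknowledging a delivery tag

    HandOffDispatcher(int workerCount, Consumer<byte[]> processor, LongConsumer acker) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.processor = processor;
        this.acker = acker;
    }

    /** Call this from the Consumer callback; it returns without doing the work itself. */
    void onDelivery(long deliveryTag, byte[] body) {
        workers.execute(() -> {
            try {
                processor.accept(body);
            } catch (RuntimeException e) {
                // Not acknowledged. A real system would reject or requeue here.
                return;
            }
            // Acknowledge only once processing has completed.
            acker.accept(deliveryTag);
        });
    }

    /** Stops accepting work and waits for queued deliveries to finish. */
    boolean shutdownAndWait(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The executor's internal queue is unbounded in this sketch; the idea is
that the pre-fetch count limits how many unacknowledged messages can be
outstanding, and therefore how much work can pile up behind the workers.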

Which of these is 'best' depends a lot on your requirements. If you need
just a little more control over the processing threads but don't want to
'roll-your-own' dispatcher mechanism there is an option for you to
supply your own ExecutorService for the RabbitMQ Connection to use, and
Java supplies some standard ExecutorService implementations which allow
you to do some of the management without considerable effort.
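
As a rough illustration of the 'standard ExecutorService' route, the
following sketch builds a fixed-size pool with recognisable thread
names, using only java.util.concurrent. ConsumerExecutors and
fixedNamedPool are hypothetical names of my own; how the resulting
executor is handed to the connection is covered by the advanced
connection options in the API guide and is not shown here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory for an executor you might supply to the connection. */
final class ConsumerExecutors {
    private ConsumerExecutors() {}

    /** Fixed-size pool whose daemon threads are named "<prefix>-1", "<prefix>-2", ... */
    static ExecutorService fixedNamedPool(int size, String prefix) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive just for idle pool threads
            return t;
        };
        // At most 'size' tasks run at once; the rest wait in the pool's queue.
        return Executors.newFixedThreadPool(size, factory);
    }
}
```

Choosing the pool size yourself is the main gain: it caps how many
consumer callbacks can run at the same time, and the thread names make
them easy to spot in a thread dump.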

My general advice is that this might be a lot of work, and you should
consider investing in a sophisticated system only if you have determined
that you really need the advantages it might bring. Be aware that these
may not include faster throughput, and might upset any ordering
guarantees you may be relying upon at present.

I hope this helps.

Steve Powell  (a happy kitten)
----------some more definitions from the SPD----------
chinchilla (n.) Cooling device for the lower jaw.
socialcast (n.) Someone to whom everyone is speaking but nobody likes.
literacy (n.) A textually transmitted disease usually contracted in childhood.

On 22 Mar 2012, at 18:20, Josh Stone wrote:

> I wanted to ask the experts since it's not clear to me - what is the best way to parallelize message consumption from a single queue, using the Java client? 
> 
> Thanks,
> Josh
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss


