[rabbitmq-discuss] [java-client] Parallelizing message consumption from a single queue

Tue Mar 27 00:49:07 BST 2012

Thanks for the response. My response is inline below:

On Monday, March 26, 2012 7:11:13 AM UTC-7, Steve Powell wrote:
>
> Hi Josh,
>
> In the absence of the experts -- you get me :-)
>
> Questions that start 'what is the best way to...' often have no
> definitive answer, I'm afraid; but here goes.
>
> First, I'm going to have to explain what I *think* you are asking about,
> before trying to answer it.
>
>   I assume you are asking about multiple Consumers, in a Java client, all
>   consuming from the same queue, allowing 'parallel' consumption -- which
>   I take to mean multi-threaded -- within the same Java client. You want
>   the 'best' application structure which achieves 'parallel' processing of 
>   messages from the queue.
>
Yes - I have one client that is consuming from a single queue, and 
basically want to achieve the best throughput possible, whatever that means 
in terms of one or more connections, channels, consumers, and threads 
making use of these. This is the part I'm not sure about.

> Well, there is nothing to stop you 'registering' the same Consumer
> instance more than once, with different consumer tags, so it is easy to
> drive multiple 'identical' Consumers. However, if you register
> (Channel.basicConsume()) on the same channel, the consumers will be
> called serially (this is done to preserve the ordering of messages
> processed on a channel).
>
> No doubt you've read the blurb on the Java Client API doc pages
> (http://www.rabbitmq.com/api-guide.html#consuming) which says:
>
> > Callbacks to Consumers are dispatched on a thread separate from the
> > thread managed by the Connection. This means that Consumers can safely
> > call blocking methods on the Connection or Channel, such as queueDeclare,
> > txCommit, basicCancel or basicPublish.
> > 
> > Each Channel has its own dispatch thread. For the most common use case
> > of one Consumer per Channel, this means Consumers do not hold up other
> > Consumers. If you have multiple Consumers per Channel be aware that a
> > long-running Consumer may hold up dispatch of callbacks to other
> > Consumers on that Channel.
>
> and the section on advanced connection options mentions a thread-pool
> (by default containing 5 threads) associated with the connection.
>
> What this all means is that *on each channel* Consumer callbacks are
> called serially. No overlapping there, so no chance to consume 'in
> parallel', so distinct *channels* (on the same connection) are allowed
> to run their consumers in parallel (up to five may run concurrently, in
> the default case).
>
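
To make the "serial per channel, parallel across channels" point concrete, here is a toy model in plain Java. It does not use the RabbitMQ client at all: ChannelModel is a hypothetical name, and the single-threaded executor inside it is only an assumption standing in for a channel's dispatch behaviour as described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: each "channel" is a single-threaded executor, so callbacks
// on one channel run serially while different channels can overlap.
// No RabbitMQ classes are used; ChannelModel is a hypothetical name.
class ChannelModel {
    private final ExecutorService dispatch = Executors.newSingleThreadExecutor();
    private final List<Integer> seen = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for the client delivering one message to this channel's consumer.
    void deliver(int message) {
        dispatch.execute(() -> seen.add(message));
    }

    // Waits for queued callbacks to finish and returns the processing order.
    List<Integer> drain() {
        dispatch.shutdown();
        try {
            if (!dispatch.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callbacks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(seen);
    }

    // Spreads messages 0..count-1 across the given number of channels in turn
    // and returns what each channel processed, in processing order.
    static List<List<Integer>> run(int channels, int count) {
        List<ChannelModel> models = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            models.add(new ChannelModel());
        }
        for (int m = 0; m < count; m++) {
            models.get(m % channels).deliver(m);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (ChannelModel c : models) {
            result.add(c.drain());
        }
        return result;
    }
}
```

Each model channel keeps its own messages in delivery order, but nothing orders one channel's work against another's, which is the trade-off to keep in mind when spreading one queue over several channels.
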
> So, to 'consume' messages in parallel from a single queue you need to
> have the consumers on separate channels. Now, if you define a single
> Consumer instance, and get it to be invoked on multiple threads
> concurrently (on separate channels) you have to be careful -- the code
> in your Consumer must be thread-safe, and probably more than just that,
> too. I'll assume you know what you are doing. If you create one Consumer
> instance for each channel, RabbitMQ will guarantee that each channel's
> Consumer is running on one thread, so the rules are simpler.
>
> There is another option. Provided you are prepared to separate
> consumption from processing (and possibly acknowledgement) you can
> register a single consumer which does very little except pass the
> message to another (worker) thread to do the actual processing. The
> Consumer doesn't have to be sophisticated, but your dispatching
> mechanism needs to be: you mustn't lose messages, and you must ensure
> that you acknowledge them at some point (which may not be straight
> away). Still, if you are adept at Java concurrent programming these are
> all achievable. 
>
This is the approach I've taken so far. I have a single Connection, Channel 
and Consumer and a single thread that loops on consumer.nextDelivery(). As 
soon as consumer.nextDelivery() returns, the message is acknowledged and 
handed off to a separate thread pool for processing, allowing the thread to 
move on to the next delivery. 
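
For comparison, a minimal sketch of the same hand-off with the acknowledgement moved to after processing, so that a message whose handler fails is not acknowledged. HandOff and its parameters are hypothetical names; the LongConsumer is an assumption standing in for something like channel.basicAck(tag, false), and a real acker would also need to take care over using one Channel from several worker threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

// Sketch only: HandOff, handler and acker are hypothetical names.
// The acker stands in for a call such as channel.basicAck(tag, false).
class HandOff {
    private final ExecutorService workers;
    private final Consumer<byte[]> handler;
    private final LongConsumer acker;

    HandOff(int threads, Consumer<byte[]> handler, LongConsumer acker) {
        this.workers = Executors.newFixedThreadPool(threads);
        this.handler = handler;
        this.acker = acker;
    }

    // Called from the single consuming thread; returns immediately.
    void dispatch(long deliveryTag, byte[] body) {
        workers.submit(() -> {
            handler.accept(body);       // if this throws, the ack below is skipped
            acker.accept(deliveryTag);  // acknowledge only after processing succeeded
        });
    }

    // Stops accepting work and waits for queued tasks; true if all finished.
    boolean shutdownAndWait(long seconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The consuming loop would call dispatch(tag, body) for each delivery instead of acknowledging first. What to do with a message that is never acknowledged (redeliver, reject, log) is left open here.
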

> By managing your own worker threads you can achieve your
> own dispatching and resource management rules, and by setting the
> pre-fetch count (Qos) and managing acknowledgements you can gate the
> amount of work done in parallel, and even decide which workers get which
> messages.
>
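
The gating idea can be illustrated without a broker. In the sketch below a Semaphore plays the part of the pre-fetch limit and releasing a permit plays the part of the acknowledgement. InFlightGate and peakInFlight are hypothetical names, and the 5 ms sleep is only simulated work; with the real client the limit would come from the channel's Qos setting and manual acks instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Model only: the Semaphore stands in for a pre-fetch limit, and releasing a
// permit stands in for acknowledging a message. Names are hypothetical.
class InFlightGate {
    // Runs `messages` simulated deliveries with at most `limit` unacknowledged
    // at any time, and returns the highest number observed in flight.
    static int peakInFlight(int messages, int limit, int threads) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < messages; i++) {
                permits.acquire(); // nothing more is "delivered" until a slot is free
                workers.execute(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // simulated processing
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // the "ack" frees the slot
                    }
                });
            }
            workers.shutdown();
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            workers.shutdownNow(); // no-op after a clean shutdown
        }
        return peak.get();
    }
}
```

With a pool of 8 threads and a limit of 3, the peak never exceeds 3: the limit, not the pool size, decides how much work runs at once.
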
> Which of these is 'best' depends a lot on your requirements. If you need
> just a little more control over the processing threads but don't want to
> 'roll-your-own' dispatcher mechanism there is an option for you to
> supply your own ExecutorService for the RabbitMQ Connection to use, and
> Java supplies some standard ExecutorService implementations which allow
> you to do some of the management without considerable effort.
>

My requirements are simply to maximize throughput of my message 
consumption. Rolling my own dispatcher is probably a bit much. Any 
alternative configuration recommendations (to the single Connection, 
Channel, Consumer setup), for maximizing throughput would be appreciated.

Thanks,
Josh

> My general advice is that this might be a lot of work, and you should
> consider investing in a sophisticated system only if you have determined
> that you really need the advantages it might bring. Be aware that these
> may not include faster throughput, and might upset any ordering
> guarantees you may be relying upon at present.
>
> I hope this helps.
>
> Steve Powell  (a happy kitten)
> ----------some more definitions from the SPD----------
> chinchilla (n.) Cooling device for the lower jaw.
> socialcast (n.) Someone to whom everyone is speaking but nobody likes.
> literacy (n.) A textually transmitted disease usually contracted in 
> childhood.
>
> On 22 Mar 2012, at 18:20, Josh Stone wrote:
>
> > I wanted to ask the experts since it's not clear to me - what is the
> > best way to parallelize message consumption from a single queue, using the
> > Java client?
> > 
> > Thanks,
> > Josh
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-discuss at lists.rabbitmq.com
> > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>