[rabbitmq-discuss] Long pauses when closing many channels simultaneously
josh
martin.rogan.inc at gmail.com
Thu Sep 26 11:50:31 BST 2013
I have 1 RabbitMQ Connection, 10K publishing Channels, 10K consuming
Channels and 100 Threads. All channels are in rapid publish/consume "loops"
using exclusive queues and external synchronization; no possibility of
publish, consume or close happening simultaneously on a channel. Closing
channels simultaneously on 100 threads incurs a delay of 2.5 seconds for
each of the first 100 channels. All subsequent closures are processed in
less than 0.5 seconds. If I bump it up to 20K+20K channels the initial 100
each take 5 seconds to close. With 30K+30K each takes 10 seconds, where
we're pretty much at the channel-max for a connection. Similarly if I
reduce the number of threads to 50 then it's the first 50 channels that
close slowly.
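For reference, the shape of the timing test is roughly this (a stripped-down
sketch, not my actual harness; class and variable names are made up, but the
amqp-client calls are the stock ones):

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class CloseTiming {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection conn = factory.newConnection();

            // Open N channels on the single connection.
            int n = 10_000;
            List<Channel> channels = new ArrayList<Channel>(n);
            for (int i = 0; i < n; i++) channels.add(conn.createChannel());

            // Close them all from 100 threads, timing how long each
            // close() blocks waiting for the server's channel.close-ok.
            ExecutorService closers = Executors.newFixedThreadPool(100);
            for (final Channel ch : channels) {
                closers.submit(new Runnable() {
                    public void run() {
                        long start = System.nanoTime();
                        try {
                            ch.close(); // blocks until close-ok arrives
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                        long ms = (System.nanoTime() - start) / 1_000_000;
                        System.out.println("close took " + ms + " ms");
                    }
                });
            }
            closers.shutdown();
            closers.awaitTermination(10, TimeUnit.MINUTES);
            conn.close();
        }
    }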
Let me restate that... *With 30K+30K channels the first 100 each take 10
seconds to close using 100 simultaneous threads. The remaining 59,900 each
take less than 0.5 seconds.* My feeling is that there's some funky
connection-wide synchronization/continuation going on here. Hit the
connection up with 100 channel-close requests on 100 threads simultaneously
and it baulks. Whatever causes that initial spasm doesn't seem to affect
subsequent close operations and everything swims along nicely.
I've tried ramping up the number of connections to relieve the pressure.
This certainly works, with predictable results. With 30K+30K channels
spread evenly over 2 connections, the initial 100 channel-close delays are
halved from 10 seconds to 5 seconds. Use 10 connections and the delay is
imperceptible when compared to the subsequent 59,900 channel closures. Jump
to 50K+50K channels (we can do this with 10 connections but not 1
connection due to channel-max) and the delays start to creep back in again.
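The spreading itself is trivial; my workaround amounts to a round-robin pool
along these lines (a sketch, hypothetical class name):

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicLong;

    public class ChannelPool {
        private final Connection[] connections;
        private final AtomicLong counter = new AtomicLong();

        public ChannelPool(ConnectionFactory factory, int size) throws IOException {
            connections = new Connection[size];
            for (int i = 0; i < size; i++) {
                connections[i] = factory.newConnection();
            }
        }

        // The i-th channel request goes to connection (i mod size), so
        // close traffic is spread evenly across the connections.
        public Channel createChannel() throws IOException {
            int idx = (int) (counter.getAndIncrement() % connections.length);
            return connections[idx].createChannel();
        }

        public void close() throws IOException {
            for (Connection c : connections) c.close();
        }
    }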
My concerns with this approach are that 1) multiple connections are
discouraged in the documentation due to I/O resource overhead, and 2)
it's not clear for my application how to sensibly predict the optimum
number of channels per connection. If there is a soft limit on the number
of channels per connection, why is it not documented or made available in
the API? If I have to manage a number of connections and allocate channels
across those connections in a robust manner, I feel like I'm doing the work
of a client library!
I've tried my hand at modifying the client library by not waiting for
channel-close acknowledgements from the RabbitMQ server. This worked like a
charm. Channels were closed instantly with no delay in the client and
confirmed as closed on the server. Eight hours later, though, I was out
of heap space as the channel resources internal to the client library were
not being released. I haven't managed to isolate the source of the delay
either... is it in the client library or the server itself? To progress
further I'd need to trace the wire protocol. I think I'm off track with
this approach!
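A less invasive variant I may try instead of patching the library: leave
close() alone but call it from a dedicated executor, so the application
thread never blocks on the close-ok round trip, while the library still
releases its internal channel state when the acknowledgement arrives. A
sketch (hypothetical class name):

    import com.rabbitmq.client.Channel;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncCloser {
        private final ExecutorService closer = Executors.newSingleThreadExecutor();

        public void closeLater(final Channel ch) {
            closer.submit(new Runnable() {
                public void run() {
                    try {
                        // Still waits for close-ok, just off the caller's thread.
                        ch.close();
                    } catch (Exception e) {
                        // channel may already be gone; nothing more to do
                    }
                }
            });
        }

        public void shutdown() {
            closer.shutdown();
        }
    }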
*Questions:*
Before making application changes I'd like to know: is this a known issue
with the Java client? Are there better workarounds than multiple
connections and application-level channel management? In practice my actual
application uses around 20K channels per process, which I don't feel is
excessive, and message throughput is actually pretty light as I'm
leveraging RabbitMQ more for its routing capabilities. If you think the
number of channels is a problem in itself then please say so! I could
refactor to use fewer channels, but then I'd be sharing channels and would
either have to synchronize their usage or ignore documentation guidelines.
The error handling paradigm makes this cumbersome though; any channel error
results in its termination, so it's difficult to isolate errors, prevent
them from permeating across unrelated publishers/consumers, and recover in a
robust manner.
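To illustrate: sharing would mean serializing every publish through
something like the sketch below, and the error-handling problem is visible
right in it; any channel error inside publish() terminates the channel for
every publisher behind that lock:

    import com.rabbitmq.client.Channel;
    import java.io.IOException;

    public class SharedChannelPublisher {
        private final Channel channel;

        public SharedChannelPublisher(Channel channel) {
            this.channel = channel;
        }

        // One lock per shared channel. An AMQP error here (bad exchange,
        // etc.) closes the channel for everyone sharing it.
        public synchronized void publish(String routingKey, byte[] body)
                throws IOException {
            channel.basicPublish("", routingKey, null, body);
        }
    }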
*Detail:*
RabbitMQ server 3.1.5, Java amqp-client 3.1.5. I've also tried 3.1.4 and
3.1.3 in the same test harness, and have seen what I assume to be exactly
the same behaviour in 3.0 and earlier in the production application.
My test harness is standalone but a bit unwieldy to post here. I create 10K
test clients, each of which creates an exclusive queue and runs a continuous
publish-consume-publish-consume loop. The publisher and consumer each have
their own channels. I'm using channel.basicConsume with DefaultConsumers
and the default consumer executor service. I've also used my own Java
executor service, which appears to match the default implementation, with
various numbers of threads, to no discernible effect.
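For completeness, wiring in the custom consumer executor is just the stock
newConnection(ExecutorService) overload; a minimal sketch:

    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.util.concurrent.Executors;

    public class CustomDispatchPool {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            // Consumer callbacks (handleDelivery etc.) are dispatched
            // on this pool instead of the client's default one.
            Connection conn =
                factory.newConnection(Executors.newFixedThreadPool(16));
            conn.close();
        }
    }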
Messages consumed in handleDelivery spawn a new publish task via a
java.util.concurrent fixed thread pool, so the consumer thread is not tied
up or used in any channel operations whatsoever. The channel-close is
synchronized so that it cannot happen at the same time as a publish.
However, there's nothing stopping a consume happening while a close is being
processed - the consumer is out of my control.
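Condensed, each test client looks essentially like this (a sketch; the
names are hypothetical but the structure matches the description above):

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;

    public class LoopingClient {
        private final Channel publishChannel;
        private final ExecutorService publishPool;
        private final Object lock = new Object(); // serializes publish vs. close
        private final String queue;

        public LoopingClient(Channel publishChannel, Channel consumeChannel,
                             String queue, final ExecutorService publishPool)
                throws IOException {
            this.publishChannel = publishChannel;
            this.publishPool = publishPool;
            this.queue = queue;
            // The consumer thread only enqueues work; it never performs
            // channel operations itself.
            consumeChannel.basicConsume(queue, true,
                    new DefaultConsumer(consumeChannel) {
                @Override
                public void handleDelivery(String tag, Envelope env,
                                           AMQP.BasicProperties props,
                                           final byte[] body) {
                    publishPool.submit(new Runnable() {
                        public void run() { republish(body); }
                    });
                }
            });
        }

        private void republish(byte[] body) {
            synchronized (lock) { // cannot overlap with close()
                try {
                    publishChannel.basicPublish("", queue, null, body);
                } catch (IOException e) {
                    // channel closed mid-loop; drop the message
                }
            }
        }

        public void close() throws IOException {
            synchronized (lock) {
                publishChannel.close();
            }
        }
    }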
Thanks for any pointers!