[rabbitmq-discuss] routing threads on a rabbitmq node

Tue Feb 2 21:45:48 GMT 2010

Hi Matthew,

Thanks for the speedy response.  Answers below.  And yes, this is all memory
growth on the rabbitmq side.  The clients are able to burn through their
catchup logs pretty quickly, then coast along at regular rates; meanwhile
the server is chewing on its work to catch up, sometimes for *hours*.

1) How many msgs/second are being published for this issue to occur?
From a single producer, about 900 messages/sec during these burst catchup
periods.  Normal volumes then drop down to 300-500 mps throughout the day,
which we can keep up with for the most part.  Note that there are 8-9 such
producers, distributed across 2 nodes.

2) How big are those messages?
They vary in size, but in the neighborhood of 500 bytes each.  Pretty small.
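
A rough sketch of why a short burst could take so long to clear, using the
rates and message size above.  The routing capacity (route_rate) and burst
length (burst_seconds) are hypothetical numbers picked for illustration, not
measurements, and the model assumes all producers burst at the same time and
share one routing rate.

```python
def burst_backlog(publish_rate, route_rate, burst_seconds):
    """Messages left unrouted when a burst ends (0 if routing keeps up)."""
    return max(0, publish_rate - route_rate) * burst_seconds


def drain_seconds(backlog, normal_rate, route_rate):
    """Time to clear the backlog once publishing falls back to normal_rate."""
    spare = route_rate - normal_rate
    if spare <= 0:
        return float("inf")  # no spare capacity: the backlog never clears
    return backlog / spare


producers = 9                    # "8-9 such producers"
burst_rate = producers * 900     # 8100 msgs/sec, if every producer bursts at once
normal_rate = producers * 500    # 4500 msgs/sec, upper end of the normal range
route_rate = 5000                # HYPOTHETICAL routing capacity, msgs/sec
burst_seconds = 600              # HYPOTHETICAL 10-minute burst

backlog = burst_backlog(burst_rate, route_rate, burst_seconds)  # 1,860,000 msgs
payload_bytes = backlog * 500                                   # ~930 MB of payload
wait = drain_seconds(backlog, normal_rate, route_rate)          # 3720.0 s, about an hour
```

Under these assumed numbers a 10-minute burst leaves roughly an hour of
catch-up, because the spare capacity at normal rates is small.  The closer the
normal rate sits to the routing capacity, the longer the tail.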

3) Can you give an example of the routing key used?
We were originally looking to do <domain>.<eventname>, but really everything
subscribes to "#.<eventname>".  It's a little wasteful, and I think I would
like to switch back to a direct exchange at some point and partition our
traffic by domain another way, since we don't subscribe to things across all
domains like I expected we might.  There are on the order of 100
eventnames at this point, of varying frequencies.
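
To make the topic-versus-direct trade-off concrete, here is a minimal sketch
of AMQP topic matching ("*" matches exactly one word, "#" matches zero or more
words) next to a direct lookup.  It is not RabbitMQ's implementation; the
function names, event names and queue names are all hypothetical.

```python
def topic_matches(pattern, key):
    """Return True if a dot-separated routing key matches a topic pattern."""
    p = pattern.split(".")
    k = key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the key
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route_topic(bindings, key):
    """Linear scan: every binding is tested against every message."""
    return [queue for pattern, queue in bindings if topic_matches(pattern, key)]


def route_direct(table, key):
    """Direct-exchange style: one dictionary lookup per message."""
    return table.get(key, [])


# Hypothetical event and queue names, for illustration only.
topic_bindings = [("#.user_login", "q_login"), ("#.order_placed", "q_orders")]
direct_table = {"user_login": ["q_login"], "order_placed": ["q_orders"]}

topic_result = route_topic(topic_bindings, "shop.user_login")   # ["q_login"]
direct_result = route_direct(direct_table, "user_login")        # ["q_login"]
```

Both calls select the same queue, but the topic version touches every binding
for each message, while the direct version does a single lookup keyed on the
event name.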

4) How many queues do messages end up in, on average?
About the same as the number of bindings - 75.  We don't do many multiple
bindings per queue (if any).
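
For a sense of scale, the arithmetic below estimates how many binding
comparisons per second the figures above imply if every message is tested
against every binding (the O(N) behaviour Matthew describes in the quoted
message).  It assumes one channel per producer and that all producers burst
together; neither is confirmed in this thread.

```python
def pattern_checks_per_second(producers, msgs_per_sec_each, bindings):
    """Binding comparisons per second with a full scan per message."""
    return producers * msgs_per_sec_each * bindings


per_channel = pattern_checks_per_second(1, 900, 75)   # 67,500 for one bursting producer
burst = pattern_checks_per_second(9, 900, 75)         # 607,500 across 9 producers
normal = pattern_checks_per_second(9, 500, 75)        # 337,500 at the normal upper rate
```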

5) Are the consumers setting qos, and are they using subscriptions or just
basic.get? What about acknowledgements?
Consumers are using the Java client, no qos settings, via subscriptions (via
QueueingConsumer.nextDelivery()).

An acknowledgement is sent after each message is retrieved, via
QueueingConsumer.getChannel().basicAck(envelope.getDeliveryTag(), false);

Like I said, there is no queue backup, so I don't think it's on the
consuming side.  In fact, I can pull up a new client that just does a simple
subscription and it will instantly start showing the current place in the
routing, which could be messages from the past hour.

Does that help?  Thanks again for digging into this with me.  This has been a
growing problem for us that I need to understand better to help rearchitect
our configuration.

Thanks,
Brian

On Mon, Feb 1, 2010 at 6:42 AM, Matthew Sackman <matthew at lshift.net> wrote:

> Hi Brian,
>
> On Sun, Jan 31, 2010 at 10:41:01PM -0800, Brian Sullivan wrote:
> > I am curious if anyone on the rabbitmq team can confirm/clarify what we are
> > seeing with respect to some throughput issues on our RMQ cluster.
> >
> > The config:
> > - 2-node RMQ cluster, running a topic-based exchange
> > - 8 publishers, running on different hosts
> > - dozens of consumers, ~75 wildcard topic bindings, mostly running on
> > different hosts (there are a couple running on the RMQ hosts for stats, etc)
> >
> > The issue:
> > When we publish at a higher rate than normal, there appears to be a
> > significant delay in the pipeline between when we publish the messages and
> > when we receive them on the consuming side.
>
> Although what you later say about memory growth suggests it's not this,
> it could be some sort of buffering or Nagle's algorithm which is causing
> batching of more messages, up until some buffer is full, before passing
> them onto the network. On the other hand, if you're seeing memory growth
> heavily in RabbitMQ-server itself, then that suggests it's nowt to do
> with buffering in the clients.
>
> > Since publishing is
> > asynchronous, the publisher applications send as fast as they can, meanwhile
> > we see an increasing delay in when we see those same messages come out on
> > the other side.  My guess (gathered from
> > http://www.rabbitmq.com/faq.html#node-per-CPU-core) is that there is either
> > a single routing thread per publisher (channel), or even worse a single
> > routing bottleneck per node.  Either way, this thread cannot route fast
> > enough in a topic exchange (we have about 75 bindings, using wildcards) and
> > there is a backup of messages to be routed.
>
> Each channel can only route one message at a time. Topic exchanges
> with wildcards are inefficient, and are O(N) where N is the number of
> bindings. This is suboptimal - there are ways in which we are planning
> on fixing this, we've just not got around to implementing them yet.
> However, if you really just have approx 75 bindings with wildcards in
> total, I'm somewhat astonished this can be causing issues. What kind of
> rates are you publishing at?
>
> > The question:
> > Can you please elaborate on where the routing backup could be occurring, and
> > what steps might be best to prevent this from happening?  It appears from
> > the fact that I am waiting on the routing to happen that using flags like
> > "mandatory" on messages is not going to help me here (though I have not
> > tested this).
>
> I suspect it's in the channel processes. I'm not really sure what you
> could do to help, but could you provide some more information please?:
>
> 1) How many msgs/second are being published for this issue to occur?
> 2) How big are those messages?
> 3) Can you give an example of the routing key used?
> 4) How many queues do messages end up in, on average?
> 5) Are the consumers setting qos, and are they using subscriptions or
> just basic.get? What about acknowledgements?
>
> > One idea:
> > If it is truly the case that a single thread per node might be causing this
> > problem, then perhaps we can run a small rabbitmq node on each publisher
> > (joined to the cluster), with the sole purpose of doing the routing load?
> > If we publish locally, all it would need to do is keep up with its own
> > routing load, not the combined routing load of 3 other publishers.  It
> > doesn't really prevent the problem from happening though, if I can produce
> > messages faster in a single thread than even a dedicated node can route.
> > Would this even help?
>
> Yeah, it may help, but without some more details, I'm not quite sure
> just yet what to suggest.
>
> Matthew