[rabbitmq-discuss] routing threads on a rabbitmq node

Brian Sullivan bsullivan at lindenlab.com
Fri Feb 5 06:23:26 GMT 2010


Hi Matthew,

So in the shovel model, what happens if one of the downstream
topic-based nodes crashes? Seems like all consumers on that node would
lose messages until we shut down the producing side, correct? The
volume is likely too high to buffer in memory even if the shovel
queues were able to survive until reconnect.

I've never used shovel before so maybe I am missing something fundamental here.

Thanks,
Brian


On Thursday, February 4, 2010, Matthew Sackman <matthew at lshift.net> wrote:
> Hi Brian,
>
> On Wed, Feb 03, 2010 at 12:05:23PM -0800, Brian Sullivan wrote:
>> We have ~75 bindings, same as the number of queues.  We don't do many
>> multiple bindings per queue (if any).  This has increased faster than our
>> message volumes (more consuming applications to make use of the data), so I
>> believe this is the primary reason things are harder now than they used to
>> be.
>
> If you pretty much have every message going to every queue, it may be much
> simpler for you to use a fanout and then drop messages at the consumer.
> However, we're all in agreement that the use of topic exchanges here
> isn't likely to be the problem.
>
>> What I would like to figure out is how to reorient my cluster to make things
>> more stable.  Knowing that the routing time is increasing due to the number
>> of bindings, I am not convinced that my plan of adding a rabbitmq node to
>> each producer is going to make things all that much better - the routing
>> table will still be the same, and it will need to do that cross-routing
>> you're talking about avoiding.
>
> What I would recommend is to use the recently announced shovel. Have one
> node, which the publishers send to. They send to a fanout exchange. You
> then have some leaf nodes, which run the shovel. The shovel connects to
> the central node, creates a queue and binds to the fanout exchange, and
> republishes messages to a topic exchange on the leaf nodes.
>
> You then split your various other queues over the exchanges on the leaf
> nodes, thus dividing the outbound rate over the various leaf nodes.
>
> The only thing that changes is that you need to somehow load balance
> your consumers so that they know which leaf nodes to connect to. All the
> leaf nodes would receive the same messages so there's no issue about
> only being able to connect to certain nodes, but you do want to spread
> the load evenly.
>
> This would avoid using a cluster, and has the further advantage that as
> your load grows, you can add further leaf nodes to share the load
> seamlessly, without taking anything down.
>
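
To make the topology described above concrete, here is a rough in-memory sketch in Python. It does not talk to RabbitMQ or use the shovel plugin at all; it only models the message flow: a central fanout copies every message to every leaf, and each leaf routes locally with topic-style bindings. The class names, queue names and routing keys (CentralNode, LeafNode, "sim-stats", "login-audit" and so on) are made up for illustration, and the matcher assumes the usual AMQP topic rules ('*' is exactly one word, '#' is zero or more words).

```python
from collections import defaultdict


def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may swallow zero or more of the remaining words.
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class LeafNode:
    """Stand-in for a leaf broker with one topic exchange (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.bindings = []              # (pattern, queue name) pairs
        self.queues = defaultdict(list)

    def bind(self, queue, pattern):
        self.bindings.append((pattern, queue))

    def publish(self, routing_key, body):
        # A queue receives a message at most once, even if several of its
        # bindings match the routing key.
        matched = {q for pat, q in self.bindings if topic_matches(pat, routing_key)}
        for queue in sorted(matched):
            self.queues[queue].append(body)


class CentralNode:
    """Stand-in for the central broker: a fanout that feeds every leaf."""

    def __init__(self):
        self.leaves = []

    def attach(self, leaf):
        self.leaves.append(leaf)

    def publish(self, routing_key, body):
        # Fanout ignores the routing key; each leaf's shovel republishes the
        # message, with its original key, to the leaf's topic exchange.
        for leaf in self.leaves:
            leaf.publish(routing_key, body)


central = CentralNode()
leaf_a = LeafNode("leaf-a")
leaf_b = LeafNode("leaf-b")
central.attach(leaf_a)
central.attach(leaf_b)

leaf_a.bind("sim-stats", "sim.#")
leaf_b.bind("login-audit", "login.*")

central.publish("sim.region.42", "m1")
central.publish("login.ok", "m2")
central.publish("login.fail.retry", "m3")   # matches no binding on either leaf
```

Every leaf sees all three messages, but each leaf only has to deliver to the queues that live on it, which is where the outbound load gets divided.
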
>> Even when we have a single producer catching
>> up in our current system, the node can only route at a certain rate, and
>> this is definitely not CPU bound.  I am curious why Erlang cannot spend more
>> time in that thread, but I don't know much about it - does that seem right
>> to you?
>
> That is interesting. Did you mean "consumer" rather than "producer" at
> the top there? Assuming you did, there could be a few reasons:
>
> 1. The client itself could be the bottleneck. In the absence of a QoS
> setting, Rabbit will send messages to the consumer as fast as possible.
> These messages arriving at the consumer obviously take up some CPU
> resources to take them off the wire. Thus setting a QoS can limit the
> loading on the consumer. However, setting it too low (eg 1) can mean
> that the consumer is waiting for a little while after sending back an
> ack before the next message arrives. Some basic tuning may be useful
> here, depending on the structure of your clients (eg are they internally
> multithreaded etc).
>
> 2. TCP Buffers on client and RabbitMQ. There have been a couple of threads
> recently on this list about buffer sizes. You may wish to try increasing
> the TCP buffers of RabbitMQ so that it can load more data into the
> buffers and pass it off to the network. You might wish to measure the
> amount of network throughput you're seeing.
>
> 3. If QoS is off, and a queue has grown to a good length, then it's
> possible for acks to be "stalled" whilst the queue tries to push
> messages as fast as possible to the consumer. A build up of acks can
> hurt throughput. This has been fixed in 1.7.1. Now given that you're
> saying the RabbitMQ node doesn't seem to be CPU bound here, I don't
> think this is it, but I'd still suggest trying 1.7.1 when you can.
>
>> I am not sure what I can do to minimize cross-routing, other than to try to
>> keep our producers consolidated and keep the heaviest consumers (meaning the
>> ones with a binding to the most active topics - remember that all queues
>> bind to only one topic expression) separated on their own nodes, to remove
>> their queue management processing on the core routing function.  Ironically,
>> I was originally trying to keep the heaviest consumers on the routing nodes,
>> to minimize forwarding of messages - but if the cost magnifies with the
>> number of consumer queues, then it's likely that keeping the larger fanout
>> (but smaller throughput) of consumers on the routing nodes might be best.
>
> With the design I propose above, without the cluster, but with several
> leaf nodes, I would suggest that you try to ensure the most active
> queues are evenly distributed across the array of leaf nodes.
>
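
One simple way to approximate "evenly distributed" is a greedy placement: sort the queues by outbound rate, busiest first, and put each one on whichever leaf currently carries the least load. The sketch below is plain Python and makes no broker calls. The queue names, leaf names and message rates are invented for the example, and the function name spread_queues is hypothetical; real rates would have to come from your own measurements.

```python
import heapq


def spread_queues(queue_rates, leaf_names):
    """Greedy placement: busiest queue first, onto the least-loaded leaf.

    queue_rates maps queue name -> outbound messages/sec (assumed figures).
    Returns (assignment, loads), both keyed by leaf name.
    """
    heap = [(0, name) for name in leaf_names]   # (current load, leaf name)
    heapq.heapify(heap)
    assignment = {name: [] for name in leaf_names}

    # Ties on rate are broken by queue name so the result is deterministic.
    for queue, rate in sorted(queue_rates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, name = heapq.heappop(heap)
        assignment[name].append(queue)
        heapq.heappush(heap, (load + rate, name))

    loads = {name: sum(queue_rates[q] for q in qs) for name, qs in assignment.items()}
    return assignment, loads


# Hypothetical rates, purely for illustration.
rates = {"q1": 900, "q2": 700, "q3": 400, "q4": 300, "q5": 100}
assignment, loads = spread_queues(rates, ["leaf-a", "leaf-b"])
print(assignment)   # {'leaf-a': ['q1', 'q4'], 'leaf-b': ['q2', 'q3', 'q5']}
print(loads)        # {'leaf-a': 1200, 'leaf-b': 1200}
```

This is only a heuristic and will not always find the best split, but it is cheap to rerun whenever a leaf node is added or a queue's rate changes.
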
>> The thing that concerns me is that my scalability here seems to be limited -
>> the only other thing I can think of doing is increasing my number of
>> producers to distribute the load even further and possibly do the local node
>> thing - then if our routing table keeps growing, I can manage scaling at the
>> producer level - not efficient maybe, but at least it can grow past the
>> threshold I appear to be running into.
>
> Using the shovel and spreading out load to a number of leaf nodes (and
> this hierarchy can be several layers deep if necessary) reduces the
> amount of fanout on each node, and shares out the amount of data each
> node needs to send out. This is more manual and involved, but more
> efficient than a cluster.
>
> Please let us know how you get on.
>
> Matthew