[rabbitmq-discuss] questions about distributed queue

Mon Aug 17 15:55:16 BST 2009

Paul

On Mon, Aug 17, 2009 at 3:34 PM, Paul Dix<paul at pauldix.net> wrote:
> Sorry for the confusion. I mean scalability on a single queue. Say I
> want to push 20k messages per second through a single queue. If a
> single node can't handle that it seems I'm out of luck. That is, if
> I'm understanding how things work.

You can in principle just add more nodes to the cluster.  More details below.

> So I guess I'm not worried about total queue size, but queue
> throughput (although size may become an issue, I'm not sure). It seems
> the solution is to split out across multiple queues, but I was hoping
> to avoid that since it will add a layer of complexity to my producers
> and consumers.

1. To maximise throughput, don't use persistence.  To scale further,
give up strict ordering.  For example, you can easily have two
queues, one per node, bound to the same direct exchange with the
same key, and you ought to double throughput (all other things
being equal and fair).

2. If you want to be both fast and 'reliable' (no loss of acked
messages), then add more queues and make them durable, and set
messages to be persistent.

3. If you want to preserve ordering, label each message with an ID and
dedup at the endpoints.  This does, as you say, add some small noise to
your producers and consumers, but options 1 and 2 above do not.
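
A minimal sketch of the consumer side of option 3, assuming the
producer stamps each message with a consecutive integer sequence
number starting at 0.  The class and method names are hypothetical
and not part of RabbitMQ or any client library; this only illustrates
"label each message with an ID and dedup at the endpoints".

```python
import heapq


class OrderedDedupConsumer:
    """Releases messages in sequence order and drops duplicate IDs."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to release
        self.pending = []          # min-heap of (seq, body) that arrived early
        self.buffered = set()      # sequence numbers currently in the heap

    def receive(self, seq, body):
        """Accept one message; return the bodies now deliverable in order."""
        if seq < self.next_seq or seq in self.buffered:
            return []              # duplicate: already delivered or buffered
        heapq.heappush(self.pending, (seq, body))
        self.buffered.add(seq)
        ready = []
        # Drain the heap for as long as the next expected message is present.
        while self.pending and self.pending[0][0] == self.next_seq:
            s, b = heapq.heappop(self.pending)
            self.buffered.discard(s)
            ready.append(b)
            self.next_seq += 1
        return ready
```

The consumer calls receive() for every delivery, whichever queue it
came from, and processes only what is returned.  A message that never
arrives would stall this sketch, so a real consumer would also need a
timeout or gap policy.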

> I don't think I understand how using Linux-HA with clustering would
> lead to splitting a single queue across multiple nodes. I'm not
> familiar with HA, but it looked like it was a solution to provide a
> replicated failover.

You are right that HA techniques, indeed any kind of queue replication
or replicated failover, will not help you here.

What you want is 'flow over', i.e. "when load is high, make a new node
with the same routing info".  This is certainly doable.
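
As a rough illustration of the 'flow over' idea only, here is a toy
in-memory model.  Plain Python lists stand in for per-node queues, and
the depth threshold and the FlowOverPool name are assumptions made for
the sketch; a real setup would instead start a broker node and bind a
new queue there with the same routing info.

```python
class FlowOverPool:
    """Toy model: add a queue (standing in for a node) when all are full."""

    def __init__(self, max_depth):
        self.max_depth = max_depth  # assumed "load is high" threshold
        self.queues = [[]]          # start with a single node's queue

    def publish(self, message):
        """Route to the least-loaded queue, adding a new one if all are full."""
        target = min(self.queues, key=len)
        if len(target) >= self.max_depth:
            target = []             # "make a new node with the same routing info"
            self.queues.append(target)
        target.append(message)
        return len(self.queues)     # number of queues now in the pool
```

Nothing here drains the queues; consumers would pop from them, and the
pool would only grow while publishers outpace consumers.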

alexis

> Thanks again,
> Paul
>
> On Mon, Aug 17, 2009 at 10:24 AM, Tony Garnock-Jones<tonyg at lshift.net> wrote:
>> Paul Dix wrote:
>>> Do you have a roadmap for when a scalable queue
>>> will be available?
>>
>> If by "scalable" you mean "replicated", then that's available now, by
>> configuration along the lines I hinted at in my previous message. Adding
>> clustering into the mix can help increase capacity, on top of that (at a
>> certain cost in configuration complexity).
>>
>> If instead you mean "exceeding RAM+swap size", we're hoping to have that
>> for the 1.7 release -- which ought to be out within a month or so.
>>
>>> Just to give you a little more information on what I'm doing, I'm
>>> building a live search/aggregation system. I'm hoping to push updates
>>> of a constant internet crawl through the messaging system so workers
>>> can analyze the content and build indexes as everything comes in.
>>
>> Sounds pretty cool!
>>
>> Tony
>> --
>>  [][][] Tony Garnock-Jones     | Mob: +44 (0)7905 974 211
>>   [][] LShift Ltd             | Tel: +44 (0)20 7729 7060
>>  []  [] http://www.lshift.net/ | Email: tonyg at lshift.net
>>