[rabbitmq-discuss] questions about distributed queue
Paul Dix
paul at pauldix.net
Mon Aug 17 20:36:03 BST 2009
Yeah, that's what I'm talking about. There will probably be upwards of
a few hundred producers and a few hundred consumers. The total ingress
is definitely what I'm most worried about right now. Later, memory may
be a concern, but hopefully the consumers are pulling so quickly that
the queue never gets extremely large.
Can you give me more specific details (or a pointer to them) on how the
flow1/flow2 approach works, on both the producer and consumer side?
Thanks,
Paul
On Mon, Aug 17, 2009 at 2:32 PM, Alexis
Richardson <alexis.richardson at gmail.com> wrote:
> On Mon, Aug 17, 2009 at 5:22 PM, Paul Dix <paul at pauldix.net> wrote:
>> So what exactly does option 1 look like?
>>
>> It sounds like it's possible to have a queue with the same id on two
>> different nodes bound to the same exchange.
>
> Not quite. Same routing - two queues, two ids. Actually now that I
> think about it that won't give you exactly what you need. More below.
>
>
>> Will the exchange
>> then round-robin the messages between the two queues? If so,
>> that's exactly what I'm looking for. I don't really care about order
>> on this queue.
>
> No it won't and that's why my suggestion was wrong.
>
> Round robin does occur when you have two consumers (clients) connected
> to one queue. This WILL help you by draining the queue faster, if
> memory is a limitation.
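A toy Python model of the round-robin behaviour described above: one queue, two consumers, with messages handed out in turn. It is a simplified sketch that ignores acknowledgements and prefetch settings, both of which affect real RabbitMQ dispatch. The consumer names are hypothetical.

```python
from collections import deque
from itertools import cycle


def round_robin_dispatch(messages, consumers):
    """Hand each queued message to the next consumer in turn (no acks, no prefetch)."""
    deliveries = {name: [] for name in consumers}
    turn = cycle(consumers)
    queue = deque(messages)
    while queue:
        deliveries[next(turn)].append(queue.popleft())
    return deliveries


result = round_robin_dispatch(range(6), ["worker-a", "worker-b"])
print(result)  # {'worker-a': [0, 2, 4], 'worker-b': [1, 3, 5]}
```

Each consumer drains half of the queue, which is why adding consumers helps when the backlog (memory) is the concern rather than total ingress.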
>
> If total ingress is the limitation you can increase that by splitting
> the flow. Suppose you start with one queue bound once to one exchange
> with key "flow1". Then all messages with routing key flow1 will go to
> that queue. When load is heavy, add a queue with key "flow2", on a
> second node. Then, alternate (if you prefer, randomly) between
> routing keys flow1 and flow2. This will spread the load as you
> require. And so on, for more queues.
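The flow-splitting scheme can be sketched in plain Python with an in-memory stand-in for a direct exchange, with no broker or client library involved. `DirectExchange`, `FlowProducer`, and the queue names `q1`/`q2` are hypothetical illustrations; in a real deployment each queue would be declared on a different node and bound with its own key.

```python
import itertools
import random
from collections import defaultdict


class DirectExchange:
    """In-memory stand-in for a direct exchange: routes by exact routing-key match."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queue names
        self.queues = {}                   # queue name -> messages waiting in it

    def bind(self, queue, key):
        self.queues.setdefault(queue, [])
        self.bindings[key].append(queue)

    def publish(self, key, body):
        for queue in self.bindings.get(key, []):
            self.queues[queue].append(body)


class FlowProducer:
    """Hypothetical producer helper that spreads messages over keys flow1..flowN."""

    def __init__(self, exchange, n_flows, randomise=False):
        self.exchange = exchange
        self.keys = [f"flow{i}" for i in range(1, n_flows + 1)]
        self.randomise = randomise
        self._cycle = itertools.cycle(self.keys)

    def publish(self, body):
        # Alternate between keys, or pick one at random if randomise=True.
        key = random.choice(self.keys) if self.randomise else next(self._cycle)
        self.exchange.publish(key, body)


exchange = DirectExchange()
exchange.bind("q1", "flow1")  # e.g. a queue declared on the first node
exchange.bind("q2", "flow2")  # e.g. a queue added on a second node under load

producer = FlowProducer(exchange, n_flows=2)
for i in range(10):
    producer.publish(f"msg-{i}")

# Consumer side: consumers attach to every flow queue and process whatever arrives.
consumed = [body for name in ("q1", "q2") for body in exchange.queues[name]]
print(exchange.queues["q1"])  # ['msg-0', 'msg-2', 'msg-4', 'msg-6', 'msg-8']
print(len(consumed))          # 10
```

Because the producer alternates keys, each queue receives half of the traffic. Passing `randomise=True` picks keys at random instead, the alternative Alexis mentions.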
>
> You can make this part of a load-balancing layer on the server side,
> so that clients need little extra code.
>
> Is this along the lines of what you need? Let me know, and I can elaborate.
>
> alexis
>
>
>
>
>> Thanks,
>> Paul
>>
>> On Mon, Aug 17, 2009 at 10:55 AM, Alexis
>> Richardson <alexis.richardson at gmail.com> wrote:
>>> Paul
>>>
>>> On Mon, Aug 17, 2009 at 3:34 PM, Paul Dix <paul at pauldix.net> wrote:
>>>> Sorry for the confusion. I mean scalability on a single queue. Say I
>>>> want to push 20k messages per second through a single queue. If a
>>>> single node can't handle that it seems I'm out of luck. That is, if
>>>> I'm understanding how things work.
>>>
>>> You can in principle just add more nodes to the cluster. More details below.
>>>
>>>
>>>
>>>
>>>> So I guess I'm not worried about total queue size, but queue
>>>> throughput (although size may become an issue, I'm not sure). It seems
>>>> the solution is to split out across multiple queues, but I was hoping
>>>> to avoid that since it will add a layer of complexity to my producers
>>>> and consumers.
>>>
>>> 1. To maximise throughput, don't use persistence. To scale it further,
>>> forget about ordering. So, for example, you can easily have two
>>> queues, one per node, bound to the same direct exchange with the
>>> same key, and you ought to double throughput (assuming all other
>>> things are equal and fair).
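One caveat on option 1: under AMQP 0-9-1 routing, a direct exchange delivers a copy of each message to every queue bound with a matching key, so two queues bound with the same key each receive every message rather than sharing them. This is the point Alexis corrects in his follow-up. A toy model of that routing rule, with hypothetical queue names and no broker involved:

```python
from collections import defaultdict


def route_direct(bindings, queues, routing_key, body):
    """Toy direct-exchange rule: copy body into every queue bound with routing_key."""
    for queue in bindings.get(routing_key, ()):
        queues[queue].append(body)


bindings = {"work": ["queue-node1", "queue-node2"]}  # two queues, same key
queues = defaultdict(list)
for i in range(4):
    route_direct(bindings, queues, "work", i)

print(dict(queues))  # {'queue-node1': [0, 1, 2, 3], 'queue-node2': [0, 1, 2, 3]}
```

Both queues end up holding all four messages, which is why the thread moves on to splitting traffic across distinct keys.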
>>>
>>> 2. If you want to be both fast and 'reliable' (no loss of acked
>>> messages), then add more queues and make them durable, and set
>>> messages to be persistent.
>>>
>>> 3. If you want to preserve ordering, label each message with an ID and
>>> dedup at the endpoints. This does, as you say, add some small overhead
>>> to your producers and consumers, but options 1 and 2 above do
>>> not.
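Option 3 can be sketched as a consumer-side helper that drops duplicate IDs and releases messages in ID order. This assumes a single producer stamping consecutive integer IDs starting at 0; with several producers you would need a per-producer sequence. `OrderedDeduper` is a hypothetical name, not part of any RabbitMQ client.

```python
import heapq


class OrderedDeduper:
    """Hypothetical consumer-side helper: drop duplicate IDs, release messages in ID order."""

    def __init__(self, first_id=0):
        self.next_id = first_id  # lowest ID not yet delivered
        self.pending = []        # min-heap of (id, body) waiting for earlier IDs
        self.buffered = set()    # IDs currently held in self.pending

    def receive(self, msg_id, body):
        """Accept one message and return the messages now deliverable, in order."""
        if msg_id < self.next_id or msg_id in self.buffered:
            return []  # duplicate of something already delivered or buffered
        self.buffered.add(msg_id)
        heapq.heappush(self.pending, (msg_id, body))
        ready = []
        while self.pending and self.pending[0][0] == self.next_id:
            delivered_id, delivered_body = heapq.heappop(self.pending)
            self.buffered.discard(delivered_id)
            ready.append(delivered_body)
            self.next_id += 1
        return ready


# Messages arriving out of order from two queues, with one redelivered duplicate.
arrivals = [(1, "b"), (0, "a"), (1, "b"), (3, "d"), (2, "c")]
deduper = OrderedDeduper()
delivered = []
for msg_id, body in arrivals:
    delivered.extend(deduper.receive(msg_id, body))
print(delivered)  # ['a', 'b', 'c', 'd']
```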
>>>
>>>
>>>> I don't think I understand how using Linux-HA with clustering would
>>>> lead to splitting a single queue across multiple nodes. I'm not
>>>> familiar with HA, but it looked like it was a solution to provide a
>>>> replicated failover.
>>>
>>> You are right that HA techniques, indeed any kind of queue replication
>>> or replicated failover, will not help you here.
>>>
>>> What you want is 'flow over', i.e. "when load is high, make a new node
>>> with the same routing info". This is certainly doable.
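A minimal sketch of the 'flow over' idea as a hypothetical balancing layer: when the observed publish rate exceeds what one flow is assumed to handle, it adds another routing key. The per-flow capacity is an illustrative parameter, not a measured RabbitMQ figure, and the sketch only manages keys; declaring and binding the matching queue on a new node is left out.

```python
import math


class FlowOverBalancer:
    """Hypothetical balancing layer that grows the set of routing keys with load."""

    def __init__(self, max_rate_per_flow):
        self.max_rate_per_flow = max_rate_per_flow  # assumed capacity of one queue/node
        self.keys = ["flow1"]
        self._counter = 0

    def observe_rate(self, msgs_per_sec):
        """Add flow keys until each carries at most max_rate_per_flow; never removes keys."""
        needed = max(1, math.ceil(msgs_per_sec / self.max_rate_per_flow))
        while len(self.keys) < needed:
            # In practice: declare a queue on another node and bind it with this key.
            self.keys.append(f"flow{len(self.keys) + 1}")
        return list(self.keys)

    def next_key(self):
        """Round-robin over the current keys; producers call this once per message."""
        key = self.keys[self._counter % len(self.keys)]
        self._counter += 1
        return key


balancer = FlowOverBalancer(max_rate_per_flow=8000)
print(balancer.observe_rate(20000))             # ['flow1', 'flow2', 'flow3']
print([balancer.next_key() for _ in range(4)])  # ['flow1', 'flow2', 'flow3', 'flow1']
```

With an assumed capacity of 8,000 msg/s per flow, the 20k msg/s from Paul's example calls for three flows (20000 / 8000 = 2.5, rounded up).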
>>>
>>> alexis
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Thanks again,
>>>> Paul
>>>>
>>>> On Mon, Aug 17, 2009 at 10:24 AM, Tony Garnock-Jones <tonyg at lshift.net> wrote:
>>>>> Paul Dix wrote:
>>>>>> Do you have a roadmap for when a scalable queue
>>>>>> will be available?
>>>>>
>>>>> If by "scalable" you mean "replicated", then that's available now, by
>>>>> configuration along the lines I hinted at in my previous message. Adding
>>>>> clustering into the mix can help increase capacity, on top of that (at a
>>>>> certain cost in configuration complexity).
>>>>>
>>>>> If instead you mean "exceeding RAM+swap size", we're hoping to have that
>>>>> for the 1.7 release -- which ought to be out within a month or so.
>>>>>
>>>>>> Just to give you a little more information on what I'm doing, I'm
>>>>>> building a live search/aggregation system. I'm hoping to push updates
>>>>>> of a constant internet crawl through the messaging system so workers
>>>>>> can analyze the content and build indexes as everything comes in.
>>>>>
>>>>> Sounds pretty cool!
>>>>>
>>>>> Tony
>>>>> --
>>>>> [][][] Tony Garnock-Jones | Mob: +44 (0)7905 974 211
>>>>> [][] LShift Ltd | Tel: +44 (0)20 7729 7060
>>>>> [] [] http://www.lshift.net/ | Email: tonyg at lshift.net
>>>>>
>>>>
>>>> _______________________________________________
>>>> rabbitmq-discuss mailing list
>>>> rabbitmq-discuss at lists.rabbitmq.com
>>>> http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>>>
>>>
>>
>