It seems like one of the problems with round robin is that consumers may spend
more time on some messages than others, so you are depending on a random
distribution to even out the load.

To help with load balancing, could the consumers be set up so that, instead of
round robin, each simply tries to read from a common queue, and whoever gets
there first gets the message? This would mean that each consumer only receives
a message when it becomes idle, which seems like what would be wanted.
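That behaviour is roughly what AMQP's basic.qos prefetch setting gives you on a
shared queue: with a prefetch count of 1, the broker won't deliver a new message
to a consumer until it has acknowledged the one in flight, so busy consumers are
skipped over. A minimal sketch, assuming the Python pika client (1.x API) and a
queue name invented for illustration:

    import pika

    # Connect to a local broker; adjust host/credentials as needed.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    channel.queue_declare(queue="work")  # "work" is an assumed name

    # prefetch_count=1: the broker holds back further messages until this
    # consumer acks the current one, so idle consumers get the next message.
    channel.basic_qos(prefetch_count=1)

    def process(body):
        print("working on", body)  # hypothetical application-level work

    def handle(ch, method, properties, body):
        process(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="work", on_message_callback=handle)
    channel.start_consuming()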
On the producer side, if there were multiple queues, the producer would want to
write to the queue with the fewest messages on it.

I'm trying to learn AMQP too, and this has been an interesting discussion to watch.

Thanks,

- Jim

Jim Irrer     irrer@umich.edu     (734) 647-4409
University of Michigan Hospital Radiation Oncology
519 W. William St.     Ann Arbor, MI 48103
On Tue, Aug 18, 2009 at 9:18 AM, Paul Dix <paul@pauldix.net> wrote:
All of that makes sense.

Let me give some more specifics about what I'm building and how I'm
hoping to use the messaging system. I'm doing a constant internet
crawl of sorts; Twitter updates and everything else are in there. When
something gets pulled down, the document gets inserted into a
horizontally scalable key-value store in the sky. I then want to send
a message through the system saying that this key/value has been
inserted/updated. This is being done by 20-100 boxes.

I then want that message to be grabbed by a consumer, where some
processing will happen and probably some ranking, relevance, and other
things get written to an index somewhere (also being done by a large
number of boxes).

So for this specific case I'm using a direct exchange with a single
queue (no message persistence, and ordering doesn't matter). Hundreds
of producers are posting messages to the exchange with the same
routing key, and hundreds of consumers are pulling off the queue.
It's the firehose thing. Each message has to be processed once by any
one of the hundreds of consumers.
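For reference, that firehose topology is only a few declarations. A sketch of
the producer side, again assuming pika; the exchange, queue, and routing key
names are made up for illustration:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # One direct exchange, one shared non-durable queue, one routing key.
    channel.exchange_declare(exchange="crawl", exchange_type="direct")
    channel.queue_declare(queue="firehose", durable=False)  # no persistence
    channel.queue_bind(queue="firehose", exchange="crawl", routing_key="updates")

    # Every producer publishes with the same key; every consumer subscribed
    # to "firehose" competes for the messages.
    channel.basic_publish(exchange="crawl",
                          routing_key="updates",
                          body="key=doc123 op=updated")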
I guess I was hoping for the flow management part to be handled by
Rabbit. It looks to me like if I want to scale past the ingress
capabilities of one queue or exchange, I have to manage that on the
producer and consumer side.

I can create multiple exchanges and bind them to the same queue if the
routing becomes the bottleneck, but then the producers need to round
robin between the exchanges.

I can create multiple queues bound with different routing keys (flow1,
flow2) if the queue becomes the bottleneck, but then the producer
needs to know to round robin between the different routing keys, and
the consumers need to check both queues.
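Checking both queues from one consumer is just two subscriptions on the same
channel. A sketch, with queue names assumed to match the flow1/flow2 bindings:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    def handle(ch, method, properties, body):
        print(method.routing_key, body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    # One consumer draining both halves of the split flow.
    for queue in ("flow1", "flow2"):
        channel.queue_declare(queue=queue)
        channel.basic_consume(queue=queue, on_message_callback=handle)

    channel.start_consuming()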
So in essence, when I mentioned scalability, it was a reference to
being able to transparently scale the messaging system to multiple
boxes. More specifically, I want my hundreds of producers to post
messages to a single exchange with a single routing key. I want my
hundreds of consumers to be able to consume messages off a single
queue. I want the exchange and the queue to be scalable (in the
multi-box, multi-process sense), where the messaging system handles it.
I want the messaging system to be scalable like the key/value store is
scalable: transparently, across many boxes.

There's really only one part of my system that has this requirement.
There are plenty of other places where I'll use messaging and not
have these kinds of insane needs. As I work more with the system it's
likely that I'll want to use more complex routing logic. It's possible
I'll want to break updates from different domains into separate message flows.

Thank you very much for being so helpful. Sorry for the lengthy response.
Paul
On Tue, Aug 18, 2009 at 4:20 AM, Alexis Richardson <alexis.richardson@gmail.com> wrote:
> Paul,
>
> On Mon, Aug 17, 2009 at 8:36 PM, Paul Dix <paul@pauldix.net> wrote:
>> Yeah, that's what I'm talking about. There will probably be upwards of
>> a few hundred producers and a few hundred consumers.
>
> Cool.
>
> So one question you need to answer is: do you want all the consumers
> to receive the same messages? I.e.:
>
> * are you aggregating all the producers into one 'firehose', and then
> sending the whole firehose on to all connected consumers?
>
> OR
>
> * are you planning to in some way share messages out amongst connected
> consumers, e.g. on a round robin basis?
>
> See more below re flow1, flow2...
>
>
>> The total ingress
>> is definitely what I'm most worried about right now.
>
> OK.
>
> Be aware that in high ingress rate cases you may be limited by the
> client egress rate, which is strongly implementation and platform
> dependent. Also, see Matthias' notes on testing performance, which
> are googleable from the rabbitmq archives, if you want to run some
> test cases at any point.
>
>
>> Later, memory may
>> be a concern, but hopefully the consumers are pulling so quickly that
>> the queue never gets extremely large.
>
> Yep.
>
>
>> Can you give me more specific details (or a pointer) to how the flow1,
>> flow2 thing works (both producer and consumer side)?
>
> Sure.
>
> First you need to read up on what 'direct exchanges' are and how they
> work in AMQP. I recommend Jason's intro to get you started:
>
> http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/
>
> More background info can be found here: http://www.rabbitmq.com/how
>
> In a nutshell, RabbitMQ will route any message it receives on to one
> or more queues.
>
> Each queue lives on a node, and nodes are members of a cluster. You
> can have one or more nodes per machine - a good guide is to have one
> per core. You can send messages to any node in the cluster and they
> will get routed to the right places (adding more nodes to a cluster is
> how you scale ingress and availability).
>
> The routing model is based on message routing keys: queues receive
> messages whose routing keys match routing patterns ("bindings"). Note
> that multiple queues can request messages matching the same key,
> giving you 1-many pubsub. This is explained in Jason's article. I
> suggest you use the 'direct exchange' routing model, in which each
> message has one routing key, e.g. "flow1" or "flow2".
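To make the 1-many point concrete: if two queues are bound to a direct exchange
with the same key, each message published with that key is routed to both. A
sketch under the same pika assumption, with all names invented:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    channel.exchange_declare(exchange="flows", exchange_type="direct")

    # Two queues bound with the same routing key: each gets its own copy
    # of every "flow1" message (1-many pubsub).
    for queue in ("indexer", "ranker"):
        channel.queue_declare(queue=queue)
        channel.queue_bind(queue=queue, exchange="flows", routing_key="flow1")

    channel.basic_publish(exchange="flows", routing_key="flow1",
                          body="both queues receive this")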
><br>
> Take a look at the article and let us know if it all makes sense.<br>
><br>
> alexis<br>
><br>
><br>
>> Thanks,<br>
>> Paul<br>
>><br>
>> On Mon, Aug 17, 2009 at 2:32 PM, Alexis<br>
>> Richardson<<a href="mailto:alexis.richardson@gmail.com">alexis.richardson@gmail.com</a>> wrote:<br>
>>> On Mon, Aug 17, 2009 at 5:22 PM, Paul Dix<<a href="mailto:paul@pauldix.net">paul@pauldix.net</a>> wrote:<br>
>>>> So what exactly does option 1 look like?
>>>>
>>>> It sounds like it's possible to have a queue with the same id on two
>>>> different nodes bound to the same exchange.
>>>
>>> Not quite. Same routing - two queues, two ids. Actually, now that I
>>> think about it, that won't give you exactly what you need. More below.
>>>
>>>
>>>> Will the exchange then
>>>> round robin the messages to the two different queues? If so,
>>>> that's exactly what I'm looking for. I don't really care about order
>>>> on this queue.
>>>
>>> No it won't, and that's why my suggestion was wrong.
>>>
>>> Round robin does occur when you have two consumers (clients) connected
>>> to one queue. This WILL help you by draining the queue faster, if
>>> memory is a limitation.
>>>
>>> If total ingress is the limitation, you can increase it by splitting
>>> the flow. Suppose you start with one queue bound once to one exchange
>>> with key "flow1". Then all messages with routing key flow1 will go to
>>> that queue. When load is heavy, add a queue with key "flow2" on a
>>> second node. Then alternate (or, if you prefer, choose randomly) between
>>> routing keys flow1 and flow2. This will spread the load as you
>>> require. And so on, for more queues.
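The producer side of that scheme is a few lines: cycle through the routing keys
as you publish. A sketch, assuming the keys match queues already bound to an
exchange named "flows" (names invented as before):

    import itertools
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # Alternate between the flows; add "flow3", "flow4", ... as load grows.
    keys = itertools.cycle(["flow1", "flow2"])

    for document in ["doc1", "doc2", "doc3", "doc4"]:
        channel.basic_publish(exchange="flows",
                              routing_key=next(keys),
                              body=document)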
>>>
>>> You can make this part of a load balancing layer on the server side,
>>> so that clients don't have to be coded too much.
>>>
>>> Is this along the lines of what you need? Let me know, and I can elaborate.
>>>
>>> alexis
>>>
>>>> Thanks,
>>>> Paul
>>>>
>>>> On Mon, Aug 17, 2009 at 10:55 AM, Alexis Richardson <alexis.richardson@gmail.com> wrote:
>>>>> Paul
>>>>>
>>>>> On Mon, Aug 17, 2009 at 3:34 PM, Paul Dix <paul@pauldix.net> wrote:
>>>>>> Sorry for the confusion. I mean scalability on a single queue. Say I
>>>>>> want to push 20k messages per second through a single queue. If a
>>>>>> single node can't handle that, it seems I'm out of luck. That is, if
>>>>>> I'm understanding how things work.
>>>>>
>>>>> You can in principle just add more nodes to the cluster. More details below.
>>>>>
>>>>>
>>>>>> So I guess I'm not worried about total queue size, but queue
>>>>>> throughput (although size may become an issue, I'm not sure). It seems
>>>>>> the solution is to split out across multiple queues, but I was hoping
>>>>>> to avoid that since it will add a layer of complexity to my producers
>>>>>> and consumers.
>>>>>
>>>>> 1. To maximise throughput, don't use persistence. To make it bigger,
>>>>> forget about ordering. For example, you can easily have two
>>>>> queues, one per node, subscribed to the same direct exchange with the
>>>>> same key, and you ought to double throughput (all other
>>>>> things being equal and fair).
>>>>>
>>>>> 2. If you want to be both fast and 'reliable' (no loss of acked
>>>>> messages), then add more queues, make them durable, and set
>>>>> messages to be persistent.
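Point 2 corresponds to two flags: declare the queue durable, and mark each
message persistent (delivery mode 2 in AMQP). A sketch with pika, queue name
assumed:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # A durable queue survives a broker restart...
    channel.queue_declare(queue="flow1", durable=True)

    # ...and delivery_mode=2 asks the broker to write the message to disk.
    channel.basic_publish(
        exchange="",            # default exchange routes straight to "flow1"
        routing_key="flow1",
        body="important update",
        properties=pika.BasicProperties(delivery_mode=2),
    )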
>>>>>
>>>>> 3. If you want to preserve ordering, label each message with an ID and
>>>>> dedup at the endpoints. This does, as you say, add some small noise to
>>>>> your producers and consumers, but the above two options, 1 and 2, do
>>>>> not.
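One way to read point 3: the producer stamps each message with an ID (e.g. via
BasicProperties(message_id=...)) and consumers drop anything already seen. A toy
sketch of the consumer-side dedup; a real system would bound or persist the
seen-set, and everything here is assumed rather than taken from the thread:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="flow1")

    seen = set()  # toy in-memory dedup: unbounded, per-process only

    def handle(ch, method, properties, body):
        msg_id = properties.message_id  # set by the producer
        if msg_id not in seen:
            seen.add(msg_id)
            print("processing", msg_id, body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="flow1", on_message_callback=handle)
    channel.start_consuming()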
>>>>>
>>>>>
>>>>>> I don't think I understand how using Linux-HA with clustering would
>>>>>> lead to splitting a single queue across multiple nodes. I'm not
>>>>>> familiar with HA, but it looked like a solution for providing
>>>>>> replicated failover.
>>>>>
>>>>> You are right that HA techniques, indeed any kind of queue replication
>>>>> or replicated failover, will not help you here.
>>>>>
>>>>> What you want is 'flow over', i.e. "when load is high, make a new node
>>>>> with the same routing info". This is certainly doable.
>>>>>
>>>>> alexis
>>>>>
>>>>>
>>>>>> Thanks again,
>>>>>> Paul
>>>>>>
>>>>>> On Mon, Aug 17, 2009 at 10:24 AM, Tony Garnock-Jones <tonyg@lshift.net> wrote:
>>>>>>> Paul Dix wrote:
>>>>>>>> Do you have a roadmap for when a scalable queue
>>>>>>>> will be available?
>>>>>>>
>>>>>>> If by "scalable" you mean "replicated", then that's available now, by
>>>>>>> configuration along the lines I hinted at in my previous message. Adding
>>>>>>> clustering into the mix can help increase capacity on top of that (at a
>>>>>>> certain cost in configuration complexity).
>>>>>>>
>>>>>>> If instead you mean "exceeding RAM+swap size", we're hoping to have that
>>>>>>> for the 1.7 release -- which ought to be out within a month or so.
>>>>>>>
>>>>>>>> Just to give you a little more information on what I'm doing: I'm
>>>>>>>> building a live search/aggregation system. I'm hoping to push updates
>>>>>>>> of a constant internet crawl through the messaging system so workers
>>>>>>>> can analyze the content and build indexes as everything comes in.
>>>>>>>
>>>>>>> Sounds pretty cool!
>>>>>>>
>>>>>>> Tony
>>>>>>> --
>>>>>>> [][][] Tony Garnock-Jones    | Mob: +44 (0)7905 974 211
>>>>>>>   [][] LShift Ltd            | Tel: +44 (0)20 7729 7060
>>>>>>>  [] [] http://www.lshift.net/ | Email: tonyg@lshift.net