[rabbitmq-discuss] distributed cluster questions about performance

Wed Mar 13 03:49:31 GMT 2013

I'm running a cluster of 6 servers, all of them disk nodes. I use 
RabbitMQ to collect log files for our website. At peak hours, the 
publish rate is about 30k messages per second. There are 2 main 
consumers (HDFS and Elasticsearch), and each one needs to handle every 
message, so the delivery rate reaches about 60k per second.

In my scenario, a single server can sustain a delivery rate of about 10k 
messages per second, so I use 6 nodes to spread the load. My solution is 
to create 2 queues on each node. Each message is published with a random 
routing key (something like message.0, message.1, etc.) to distribute the 
load across all nodes.
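
A minimal sketch of this routing-key scheme, assuming six keys (one per 
node) named message.0 through message.5. The names NUM_KEYS, 
pick_routing_key and simulate are illustrative only, and the sketch does 
not talk to a broker; it only shows how evenly a uniform random key 
spreads messages.

```python
import random
from collections import Counter

NUM_KEYS = 6  # assumption: one routing key per cluster node


def pick_routing_key(num_keys=NUM_KEYS, rng=random):
    """Return a routing key such as 'message.3', chosen uniformly at random."""
    return f"message.{rng.randrange(num_keys)}"


def simulate(num_messages, num_keys=NUM_KEYS, seed=0):
    """Count how many messages each routing key would receive."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return Counter(pick_routing_key(num_keys, rng) for _ in range(num_messages))


if __name__ == "__main__":
    # Roughly two seconds of peak traffic at 30k messages per second.
    for key, count in sorted(simulate(60_000).items()):
        print(key, count)
```

With a uniform random choice, each key should receive close to one sixth 
of the messages, so the per-node load evens out over time.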

What I'm unsure about:

   1. All messages are currently sent to one node. Should I use HAProxy to 
   load balance the publish traffic?
   2. Is there any performance difference between Durable Queues and 
   Transient Queues?
   3. Is there any performance difference between a memory node and a disk 
   node? As far as I know, they differ only in how they handle metadata 
   such as queue configuration.
   4. How can I improve performance in my publish and delivery code? I've 
   done some research and know of several methods:
      - disable the confirm mechanism (in the publishing code?)
      - enable HiPE (I've done that and it helped a lot)
   5. For example, say the input is 10k mps (messages per second) and there 
   are two consumers that each consume every message, so the output is 
   20k mps. If one server can handle 10k mps, I need two servers to handle 
   the 20k mps load. Now a new consumer needs to consume every message too. 
   As a result, the output reaches 30k mps, so I need one more server. In 
   conclusion, does each additional consumer of all messages require one 
   more server?
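
The arithmetic behind question 5 can be sketched as follows. This is a 
rough model under the assumption stated in the question, namely that a 
server's capacity is limited only by its delivery rate; the function 
names are illustrative.

```python
import math


def delivery_rate(publish_rate, num_consumers):
    """Total deliveries per second when every consumer receives every message."""
    return publish_rate * num_consumers


def servers_needed(publish_rate, num_consumers, per_server_capacity):
    """Servers required if each one sustains per_server_capacity deliveries/s."""
    total = delivery_rate(publish_rate, num_consumers)
    return math.ceil(total / per_server_capacity)


if __name__ == "__main__":
    # Example from question 5: 10k mps in, 10k deliveries/s per server.
    print(servers_needed(10_000, 2, 10_000))  # two consumers
    print(servers_needed(10_000, 3, 10_000))  # three consumers
    # Peak figures from the top of the post: 30k mps in, two consumers.
    print(servers_needed(30_000, 2, 10_000))
```

Under this model the server count grows linearly with the number of 
consumers that each read the full stream.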
