[rabbitmq-discuss] RabbitMQ production setup questions around clustering

Wed Jul 21 21:43:05 BST 2010

Hi,

We are setting up a RabbitMQ cluster as our message broker to serve our
guaranteed-delivery needs. We will have a local data store on our producers for
guaranteeing delivery, but we don't want to resort to that except as a
last-ditch effort in cases of catastrophic failure. I would like to better
understand how the cluster setup in RabbitMQ works.

Our setup:
Two RabbitMQ nodes clustered together, sitting behind a hardware load balancer.
Upon initial release, our message volume will not be that high, but it will grow
quickly once we offload more work to the message broker from our current
non-messaging-based infrastructure. Since the initial volume is not very high,
we do not intend to use the load balancer for actual load balancing, but to
always send connections to a specific RabbitMQ node in the cluster. If that node
does not respond or its port is not open, the load balancer will automatically
switch to the second node for new connections.
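
To make the intended failover behaviour concrete, here is a minimal sketch of
the selection rule in Python. It is only an illustration of the policy, not the
load balancer's actual configuration. The host names are hypothetical, and the
port is assumed to be the default AMQP port, 5672.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_node(nodes, probe=port_is_open):
    """Return the first node, in priority order, that passes the probe.

    nodes is a list of (host, port) tuples, preferred node first.
    Returns None when no node is reachable.
    """
    for host, port in nodes:
        if probe(host, port):
            return (host, port)
    return None


# Hypothetical host names; the first entry is the preferred node.
NODES = [("rabbit1.example.com", 5672), ("rabbit2.example.com", 5672)]
```

New connections would go to pick_node(NODES): the first node while it is
reachable, the second node otherwise. A TCP check like this only shows that the
port is open, not that the broker behind it is healthy.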

Questions:
1. Are there any case studies for setting up clustering in RabbitMQ?
2. When a queue is declared (by one of our producers), is the queue always
created on the RabbitMQ node the producer client is connected to, or is it
created on a randomly selected node within the cluster?
3. If the queue is durable and the messages sent to it are marked persistent, 
will these messages always be persisted to disk and be available after a restart 
of the node that has that queue, regardless of whether the node is a disk node 
or RAM node? (This line in http://www.rabbitmq.com/clustering.html is confusing:
"Should you do this, and suffer a power failure to the entire cluster, the
entire state of the cluster, including all messages, will be lost.")
4. Should I configure both my nodes as disk nodes, or will one disk node be
sufficient? In other words, if the cluster had only one disk node and its hard
drive failed, what could I recover from the RAM node? If nothing can be
recovered from the RAM node, is a RAM node mainly useful for increasing the
number of connections without taking any hit to disk throughput?
5. Are connection redirects actually supported by the current version? The FAQ
and Clustering documents on the site contradict each other. (The FAQ says
"Future releases will support live failover using, for instance, a
combination of the "known hosts" field in connection.open-ok and the
connection.redirect message.")
6. If redirects are supported, and the load balancer sends all connections to a
specific RabbitMQ node in the cluster, will that node still send a redirect
request to the client if it is overloaded? If so, is it possible to limit
redirects to only the disk nodes within the cluster so that we don't lose any
data?
7. Will connections be redirected solely based on a RabbitMQ node's ability to
serve them, or is it more round-robin?
8. Is there a way to verify that the cluster configuration is correct, given
that the 'disc/RAM' status is not reported correctly by rabbitmqctl (as per
http://old.nabble.com/Rabbitmq-v.-1.8.1---Bug-report---Could-not-start-a-node-as-RAM-node-in-cluster-ts29211668.html)?
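
To make question 3 concrete, here is a minimal sketch of what "durable queue
with persistent messages" means on the client side. It assumes the Python pika
client with its current BlockingConnection API, which is not necessarily the
client we will use. The function name is made up for illustration, and the host
and queue names passed in are placeholders.

```python
# AMQP 0-9-1 delivery modes: 1 = transient, 2 = persistent.
PERSISTENT_DELIVERY_MODE = 2


def publish_persistent(host, queue_name, body):
    """Declare a durable queue and publish one persistent message to it.

    Assumes the third-party pika client is installed; it is imported here
    so the rest of the module loads without it.
    """
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        # durable=True: the queue definition survives a broker restart.
        channel.queue_declare(queue=queue_name, durable=True)
        # delivery_mode=2 marks the message itself as persistent.
        channel.basic_publish(
            exchange="",
            routing_key=queue_name,
            body=body,
            properties=pika.BasicProperties(
                delivery_mode=PERSISTENT_DELIVERY_MODE
            ),
        )
    finally:
        connection.close()
```

The sketch only shows what the client asks for. Whether the broker honours it
on a RAM node is exactly what question 3 is asking.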

Thanks so much,
Dave