[rabbitmq-discuss] RabbitMQ application failover/recovery in HA cluster

Wax, Edward edward.wax at troppussoftware.com
Wed Aug 15 14:44:52 BST 2012


I am implementing a .NET WCF ActiveDuplex service that uses RabbitMQ under the covers for messaging. Within the ActiveDuplex service I use the asynchronous publish/subscribe message pattern, which includes callbacks to the originating client.



There are two parts to the equation: (a) our implementation of the WCF ActiveDuplex interface and (b) the RabbitMQ failover/recovery strategy. I was hoping you could help with the RabbitMQ part.



Our previous RabbitMQ deployment consisted of a two-node Pacemaker active/passive HA cluster with a SAN disk for shared storage. We connected to the cluster's virtual IP for all RabbitMQ traffic, and the HA features of Pacemaker handled resource failures automatically, with the net effect that consumers could complete a RabbitMQ session without interruption.



We recently moved to a RabbitMQ cluster as recommended in the Clustering Guide at http://www.rabbitmq.com/clustering.html, where multiple RabbitMQ servers are arranged in an active/active fashion. In doing so we gained the benefits of a true HA environment, but we lost the ability to recover seamlessly from RabbitMQ failures, which the Pacemaker infrastructure (the single virtual IP) previously provided. I've reviewed your .NET documentation and see that it refers to the various shutdown protocols. Do you have more specific documentation that elaborates on these and provides examples of how to manage client connections in this HA environment? For example, if I receive a ModelShutdown event, what steps do I need to go through in order to "transfer" a RabbitMQ session to a new connection? Or are there other "management" APIs that should be used instead?



I've also been looking at the book RabbitMQ in Action, where the section on application failure and recovery (6.2) makes it sound almost as simple as wrapping the consumer/producer code in a try/catch block.
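
To make the question concrete, here is a rough sketch of what I understand that try/catch approach to mean. It is written in Python only for brevity (our real code is .NET), and it is not taken from the book or from any RabbitMQ client library. Every name in it (BrokerUnavailable, run_with_failover, the connect and work callables, the node names) is made up for illustration. I am also assuming that the client knows the list of cluster nodes and that queues and subscriptions have to be set up again on each new connection.

```python
import itertools
import time


class BrokerUnavailable(Exception):
    """Hypothetical stand-in for the error a client library raises on connection loss."""


def run_with_failover(nodes, connect, work, max_attempts=6, delay=0.0):
    """Run `work` against the first cluster node that accepts a connection.

    nodes   -- host names of the cluster nodes (assumed to be known to the client)
    connect -- callable(host) returning a connection, raising BrokerUnavailable
    work    -- callable(connection) holding the consumer/producer code; it is
               expected to re-declare queues and re-subscribe on every call
    """
    last_error = None
    # Walk the node list round-robin, giving up after max_attempts tries in total.
    for _, host in zip(range(max_attempts), itertools.cycle(nodes)):
        try:
            connection = connect(host)
            return work(connection)
        except BrokerUnavailable as exc:
            # Either the connect failed or the connection dropped mid-work:
            # remember the error, pause, and move on to the next node.
            last_error = exc
            time.sleep(delay)
    raise BrokerUnavailable(
        f"no node reachable after {max_attempts} attempts"
    ) from last_error
```

A call would look something like run_with_failover(["rabbit1", "rabbit2"], connect, consume_forever), where connect and consume_forever wrap whatever the real client library provides. Whether something this simple is sufficient in a clustered setup is really what I am asking.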



Any thoughts would be appreciated.