[rabbitmq-discuss] Queue data recovery after master failure

Wed Jul 16 10:48:15 BST 2014

On 15/07/2014 8:34PM, Andrei D. wrote:
> Great, I think we can make that assumption (slaves down) in the scenario I
> described.
> I'm thinking the recovery procedure would look like this:
> 1. power up all nodes without starting rabbit; say node X doesn't come up.
> 2. start rabbit on all the nodes that were not a slave for (any queue on) X
> 3. run the new and improved :) forget_cluster_node X -> this should promote
> some (offline) slave S as master
> 4. start rabbit on S (and the rest of the nodes) which should now be master
> and have all the messages it had when the cluster went down.

Yes, that's correct, except that you don't need to do step 2: the
cluster can be completely down for this to happen.
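
Roughly, on one of the surviving nodes, that could look like the sketch
below. It assumes the fix is exposed as the --offline flag to
forget_cluster_node; rabbit@nodeX is a placeholder for the dead node, and
the run/recover helpers are made up for illustration. By default it only
prints the commands; set APPLY=1 to actually execute them.

```shell
#!/bin/sh
# Sketch only: the node name is a placeholder, substitute your own.
DEAD_NODE="rabbit@nodeX"   # the node that did not come back up

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

recover() {
    # Rabbit must be stopped on this node (the rest of the cluster can be
    # down too). Removing X should promote an offline slave to master.
    run rabbitmqctl forget_cluster_node --offline "$DEAD_NODE"

    # Then start rabbit here; repeat the start on the remaining nodes.
    run rabbitmq-server -detached
}

recover
```

Run it once without APPLY to check the commands, then again with APPLY=1
on the node where you want the removal to happen.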

> Assuming the above should work (could you kindly confirm?), what do you
> think the ETA would be for the next official release that would include the
> required forget_cluster_node fix? (the one that's already in the nightly
> build)

It will be in 3.4.0. We usually make two feature releases per year, in
spring and autumn, so I would guess that would mean September-ish.

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, Pivotal