[rabbitmq-discuss] Queue data recovery after master failure
simon at rabbitmq.com
Wed Jun 18 16:42:07 BST 2014
On 18/06/14 16:21, Andrei D. wrote:
> Thanks for the quick response Simon.
> I assume there's no easy workaround?
I can't think of an easy one. If I was desperate then I would try the
following, assuming we start from a completely stopped cluster:
0) Back up Mnesia dirs on all machines, obviously.
1) Start a slave node with RABBITMQ_NODE_ONLY set. Make sure it is set,
or the slave will start the rabbit app, which will clear out the slave's
persistent storage, and you will have to restore from 0).
2) Run "rabbitmqctl forget_cluster_node --offline <dead-master>"
3) Start the mnesia app on the slave.
4) Update the rabbit_durable_queue records for queues that need
recovering from this slave, moving the slave pid for the appropriate
node() from the 'slave_pids' field to the 'pid' field.
5) Start the rabbit app on the slave.
I think that stands a decent chance of working, but obviously the
usability of such a solution is exceptionally poor. Step 4) in
particular would require some Erlang programming.
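
To make the record change in step 4) concrete, here is a minimal model of
it in Python rather than Erlang. It does not touch Mnesia. The dict keys
mirror the 'pid' and 'slave_pids' fields named above, but the (node, id)
tuples standing in for Erlang pids, the function name and the sample node
names are all assumptions made for illustration. The real fix would be an
Erlang function applied to each record inside a Mnesia transaction.

```python
def promote_local_slave(record, local_node):
    """Return a copy of `record` with the slave pid that lives on
    `local_node` moved from 'slave_pids' into 'pid'.

    Pids are modelled as (node, id) tuples; this is a stand-in for
    Erlang's node(Pid), not a real pid representation.
    """
    local = [p for p in record["slave_pids"] if p[0] == local_node]
    if not local:
        # No slave for this queue on this node: nothing to recover here.
        return dict(record)
    promoted = local[0]
    updated = dict(record)
    updated["pid"] = promoted
    updated["slave_pids"] = [p for p in record["slave_pids"] if p != promoted]
    return updated


# Hypothetical queue record as it might look after the master died.
example = {
    "name": "orders",
    "pid": ("rabbit@dead-master", "<0.250.0>"),
    "slave_pids": [("rabbit@slave1", "<0.311.0>"),
                   ("rabbit@slave2", "<0.298.0>")],
}

recovered = promote_local_slave(example, "rabbit@slave1")
print(recovered["pid"])
print(recovered["slave_pids"])
```

Records for queues with no slave on the chosen node come back unchanged,
which matches the "queues that need recovering from this slave" wording
in step 4).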
> (such as manually extracting the queue
> data from the slave and copying it to the new master before it rejoins the
> cluster; I'm not familiar with the rabbit queue data storage format so I'm
> not sure if that's feasible - probably not since you haven't mentioned it ;)
I'm not sure how well that would work. You'd have the problem that you
need not just the individual queue's index files but also the files
containing that queue's messages from the message store. I can't see
that being fun to sort out.
> ps: couldn't access the 26191 in bugzilla, I assume it's private to
Yes. But you can look out for it in future release notes, and as a
branch in hg.