[rabbitmq-discuss] Repairing a crashed cluster
Simon MacMullen
simon at rabbitmq.com
Wed Oct 10 16:53:19 BST 2012
On 10/10/12 14:49, Dave Seltzer wrote:
> My instinct, for the sake of uptime, is to say: "okay, forget node2,
> let's break the cluster and bring node1 online".
>
> My problem is that according to the docs I need to issue a "rabbitmqctl
> force_reset", which I can't do unless the server is running.
>
> I tried starting it using "rabbitmq-server -detached" but the server
> just exited after loading plugins.
>
> Does anyone know the right course of action in this scenario?
Hi. The bad news is that the released versions of RabbitMQ do not handle
this situation well. The good news is that the next release will do better.
However, in your situation it is possible to break the cluster and bring
node1 up. But it's a bit fiddly.
First of all, you will need to start node1 with the environment variable
RABBITMQ_NODE_ONLY set to some value. This will start the Erlang VM
without attempting to start RabbitMQ or Mnesia. Exactly how you do this
depends on how you have RabbitMQ installed, but on Unix you would
typically set it in /etc/rabbitmq/rabbitmq-env.conf. Note that our
init scripts wait for RabbitMQ itself to start, so /etc/init.d/rabbitmq-server
will appear to hang, but the Erlang node will be running.
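If you set the variable in rabbitmq-env.conf, it is easy to forget to take it out again once the repair is done. Below is a minimal sketch of two hypothetical shell helpers, enable_node_only and disable_node_only (not part of RabbitMQ), that add and remove the setting. It assumes the file lives at /etc/rabbitmq/rabbitmq-env.conf and that a plain RABBITMQ_NODE_ONLY=true line is an acceptable way to set the variable in that file; check both against your installation before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with RabbitMQ. The default path is an
# assumption; pass a different file as the first argument if yours differs.
CONF_DEFAULT=/etc/rabbitmq/rabbitmq-env.conf

# Add RABBITMQ_NODE_ONLY=true unless a RABBITMQ_NODE_ONLY line already exists.
enable_node_only() {
    conf=${1:-$CONF_DEFAULT}
    if grep -q '^RABBITMQ_NODE_ONLY=' "$conf" 2>/dev/null; then
        return 0
    fi
    # If the file is non-empty and lacks a trailing newline, add one first so
    # the new setting lands on its own line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi
    echo 'RABBITMQ_NODE_ONLY=true' >> "$conf"
}

# Remove any RABBITMQ_NODE_ONLY line so the next start brings RabbitMQ up.
disable_node_only() {
    conf=${1:-$CONF_DEFAULT}
    [ -f "$conf" ] || return 0
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when it selects no lines (e.g. the file held only
    # this setting); that is fine here, so its status is deliberately ignored.
    grep -v '^RABBITMQ_NODE_ONLY=' "$conf" > "$tmp" || true
    # Write back with cat rather than mv so the file keeps its owner and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

The idea is to run enable_node_only before starting node1 for the repair and disable_node_only after the final "rabbitmqctl stop", so the node does not come back up in node-only mode by accident.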
Once node1 is running, you can then invoke:
rabbitmqctl eval 'mnesia:start(),[mnesia:force_load_table(T) || T <- rabbit_mnesia:table_names()],mnesia:del_table_copy(schema, rabbit@node2).'
Enter it as a single line, replacing rabbit@node2 with the full name of
the failed node. This should respond with:
{atomic,ok}
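Substituting the node name by hand into a long one-liner is error-prone. A minimal sketch of a hypothetical shell helper, forget_node_expr (not part of RabbitMQ), that prints the same Erlang expression with the failed node's name filled in. One deliberate difference: it wraps the name in single quotes, because an unquoted Erlang atom may only contain letters, digits, underscores and @, so a node such as rabbit@my-host would otherwise fail to parse.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with RabbitMQ: print the eval expression
# with the failed node's full name quoted as an Erlang atom.
# Usage on node1: rabbitmqctl eval "$(forget_node_expr rabbit@node2)"
forget_node_expr() {
    node=$1
    # Refuse empty names and names containing a single quote, which would
    # break the quoted atom.
    case $node in
        ''|*"'"*)
            echo "usage: forget_node_expr NODE (e.g. rabbit@node2)" >&2
            return 1
            ;;
    esac
    printf '%s\n' "mnesia:start(),[mnesia:force_load_table(T) || T <- rabbit_mnesia:table_names()],mnesia:del_table_copy(schema, '$node')."
}
```

The double quotes around "$(...)" in the usage line keep the expression as a single argument to rabbitmqctl eval, single quotes included.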
Then you can invoke
rabbitmqctl stop
to stop node1 again. At this point node1 should have forgotten node2.
Remove RABBITMQ_NODE_ONLY from rabbitmq-env.conf (or wherever you set
it), and node1 should then start again normally.
Note that we don't use "rabbitmqctl force_reset" since that would reset
node1, and the point is to make it forget node2.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware