[rabbitmq-discuss] Repairing a crashed cluster

Dave Seltzer dseltzer at tveyes.com
Wed Oct 10 19:23:11 BST 2012


Thank you so much for such a thorough response!

I followed your instructions and everything came back up.

Hopefully this helps someone else!

-Dave

On Wed, Oct 10, 2012 at 11:53 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 10/10/12 14:49, Dave Seltzer wrote:
>
>> My instinct, for the sake of uptime, is to say: "okay, forget node2,
>> let's break the cluster and bring node1 online".
>>
>> My problem is that according to the docs I need to issue a "rabbitmqctl
>> force_reset", which I can't do unless the server is running.
>>
>> I tried starting it using "rabbitmq-server -detached" but the server
>> just exited after loading plugins.
>>
>> Does anyone know the right course of action in this scenario?
>>
>
> Hi. The bad news is that the released versions of RabbitMQ do not handle
> this situation well. The good news is that the next release will do better.
>
> However, in your situation it is possible to break the cluster and bring
> node1 up. But it's a bit fiddly.
>
> First of all, you will need to start node1 with the environment variable
> RABBITMQ_NODE_ONLY set to some value. This will start the Erlang VM without
> attempting to start RabbitMQ or Mnesia. Exactly how you do this depends on
> how you have RabbitMQ installed, but on Unix you would typically add that
> to /etc/rabbitmq/rabbitmq-env.conf. Note that our init scripts wait for
> RabbitMQ to start, so /etc/init.d/rabbitmq-server will hang, but the node
> will start.
>
> Once you have the node running, you can then invoke:
>
>   rabbitmqctl eval 'mnesia:start(),[mnesia:force_load_table(T) || T <- rabbit_mnesia:table_names()],mnesia:del_table_copy(schema, rabbit@node2).'
>
> (all as one line), with node2 substituted in. This should respond with:
>
>   {atomic,ok}
>
> Then you can invoke
>
>   rabbitmqctl stop
>
> to stop node1 again. At this point it should have forgotten node2 and be
> able to start again normally.
>
> Note that we don't use "rabbitmqctl force_reset" since that would reset
> node1, and the point is to make it forget node2.
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
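
For anyone finding this thread later, the steps quoted above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names build_forget_expr and forget_node are hypothetical, node1 is assumed to be already running with RABBITMQ_NODE_ONLY set as described, and the node name is wrapped in single quotes inside the Erlang expression (an addition not in the original command) so that host names containing hyphens or dots still parse as atoms.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of RabbitMQ.
# Assumes node1 was already started with RABBITMQ_NODE_ONLY set, so the
# Erlang VM is up but RabbitMQ and Mnesia have not been started.

# Print the Erlang expression from the quoted message for the node to forget.
# The node name is single-quoted (our assumption) so it is always a valid atom.
build_forget_expr() {
    printf "mnesia:start(),[mnesia:force_load_table(T) || T <- rabbit_mnesia:table_names()],mnesia:del_table_copy(schema, '%s')." "$1"
}

# Run the expression on the local node, then stop the node so that it can
# be started again normally. rabbitmqctl stop only runs if eval succeeded.
forget_node() {
    rabbitmqctl eval "$(build_forget_expr "$1")" && rabbitmqctl stop
}

# Example (run on node1, substituting the real name of the lost node):
#   forget_node rabbit@node2
```

The eval step should print {atomic,ok} before the node is stopped; if it prints anything else, it is probably safer not to restart the node until the output has been understood.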



-- 
Dave Seltzer <dseltzer at tveyes.com>
Chief Systems Architect
TVEyes
(203) 254-3600 x222