[rabbitmq-discuss] can't restart rabbit cluster after power outage

Ben Hsu ben.hsu at criticalmedia.com
Fri Jun 20 19:00:53 BST 2014


Hello Simon,


On Fri, Jun 20, 2014 at 12:45 PM, Simon MacMullen <simon at rabbitmq.com>
wrote:

> On 20/06/14 16:55, Ben Hsu wrote:
>
>>
>>  So I tried restarting the ram backed node, and it started fine. But when
>> I tried to start the disk backed node, it gave me a different error,
>> basically saying “inconsistent_cluster, thinks its clustered with node3,
>> but node3 disagrees”.
>>
>
> The ram node should have refused to start alone. We had a bug where it
> would start in older versions and then get confused - which RabbitMQ
> version are you running?
>
>
That's interesting. It looks like we are running an older version of RabbitMQ:

      {rabbit,"RabbitMQ","3.0.0"},


>> What I would love to do is take one of the disk nodes, start it as the
>> master, and tell the other nodes to join its cluster. Is that possible?
>> Right now I cannot even run “rabbitmqctl cluster_status” because the
>> node won’t start
>>
>
> What you want is "rabbitmqctl forget_cluster_node --offline". This will:
>
> 1) Allow you to tell node1 or node2 that node3 has left the cluster
> (you'll need to re-add it later).
>
> 2) Reset node1 or node2's idea of which nodes were the last to shut down,
> allowing the cluster to start again.
>
> "rabbitmqctl forget_cluster_node --offline" is currently a bit of a pain
> to use, since you have to start an Erlang node without booting RabbitMQ.
>
> You can do this by adding "NODE_ONLY=true" to /etc/rabbitmq/rabbitmq-env.conf
> on node1 or node2. Attempting to start RabbitMQ in whatever's the normal
> way for you will get an Erlang node started without RabbitMQ (i.e. as if
> you'd successfully booted the server then invoked "rabbitmqctl stop_app").
>
> You can now invoke "rabbitmqctl forget_cluster_node --offline node3"
>
> Once you've done that, you can stop your node, remove NODE_ONLY=true and
> it should start correctly. The other disc node should then be able to start
> up and join the cluster without further fiddling.
>
>
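The procedure quoted above can be sketched as a short shell script. This is
only a sketch under stated assumptions, not a tested recovery tool: the
"service rabbitmq-server start" invocation stands in for "whatever's the
normal way for you", the node name rabbit@node3 and the GNU "sed -i" cleanup
are assumptions, and the helper names (run, forget_offline) are made up for
illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the offline "forget node" steps described in the quoted reply.
# Assumptions (not from the thread): service management via "service",
# GNU sed for in-place editing, and rabbit@node3 as the node to forget.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

DRY_RUN="${DRY_RUN:-1}"
ENV_CONF="${ENV_CONF:-/etc/rabbitmq/rabbitmq-env.conf}"
FORGET_NODE="${FORGET_NODE:-rabbit@node3}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

forget_offline() {
    # 1. Ask the start script to bring up only the Erlang node.
    run sh -c "echo 'NODE_ONLY=true' >> $ENV_CONF"
    # 2. Start the node; RabbitMQ itself is not booted.
    run service rabbitmq-server start
    # 3. Tell this node that the other node has left the cluster.
    run rabbitmqctl forget_cluster_node --offline "$FORGET_NODE"
    # 4. Stop the Erlang node, drop NODE_ONLY, and start normally.
    run rabbitmqctl stop
    run sed -i '/^NODE_ONLY=true$/d' "$ENV_CONF"
    run service rabbitmq-server start
}

forget_offline
```

Run as is, it prints the six commands; setting DRY_RUN=0 (as root) would
execute them on node1 or node2. Step 3 can still be refused with
not_last_node_to_go_down, as the reply below shows.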
Now that's interesting! Having an Erlang node up without having to start up
rabbit sounds very useful. I tried forgetting the cluster node and
got this:

sudo rabbitmqctl forget_cluster_node --offline rabbit@node3

Error: not_last_node_to_go_down: The node you are trying to remove from was
not the last to go down (excluding the node you are removing). Please use
the the last node to go down to remove nodes when the cluster is offline.

I got the same error when I tried the same command with each of the other
nodes. So it looks like rabbit has no idea which node was the last to die.
Is it possible to ask rabbit for this information, possibly by groveling
around mnesia with "rabbitmqctl eval"?

I'm also okay with a drastic solution like "blow away the mnesia database
and recover the data in some other way". Part of the reason I'm asking all
these questions is that I want to learn more about RabbitMQ.