[rabbitmq-discuss] crash in a two node RabbitMQ cluster

Aravindh S aravindh86 at gmail.com
Sat Dec 15 20:22:05 GMT 2012


We are running RabbitMQ v2.8.4 in a two-node cluster configuration.

We had an unplanned power outage and both servers went down. When we 
tried to restart the RabbitMQ servers, only the rabbit2 node started; the 
rabbit1 node crashes on start.
We are running several mirrored queues between these nodes. One such queue, 
"Aiken", contained more than 65K messages before the outage. Now rabbit1 won't 
start, and rabbit2 starts fine but shows only 109 old messages in the 
"Aiken" queue. We are afraid that we have lost the messages because of the 
rabbit1 crash.
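
For reference, the per-queue count on rabbit2 can be read with 
`rabbitmqctl list_queues name messages`. Below is a minimal Python sketch 
for recording those counts, assuming the output is one tab-separated 
name/count pair per line surrounded by banner lines. The helper names 
`parse_queue_depths` and `queue_depths` are hypothetical, not part of RabbitMQ.

```python
import subprocess


def parse_queue_depths(output):
    """Parse `rabbitmqctl list_queues name messages` output into a dict.

    Assumes each data line is "<queue name>\t<message count>"; any other
    line (for example a "Listing queues ..." banner) is skipped.
    """
    depths = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # banner, footer, or blank line
        name, count = parts
        if count.strip().isdigit():
            depths[name] = int(count)
    return depths


def queue_depths(node=None):
    """Run rabbitmqctl (optionally against a named node) and parse the result."""
    cmd = ["rabbitmqctl"]
    if node:
        cmd += ["-n", node]  # e.g. "rabbit@rabbit2"
    cmd += ["list_queues", "name", "messages"]
    return parse_queue_depths(subprocess.check_output(cmd).decode())


if __name__ == "__main__":
    for name, count in sorted(queue_depths().items()):
        print(name, count)
```

This only reports what the running node currently holds; it does not say 
anything about messages that may still be on disk on rabbit1.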

The rabbit1 node crashes on startup in both cases: when rabbit2 is down 
and when rabbit2 is up.

We can see the following message in the startup log:


Error description:
 {badmatch,{error,{"/var/lib/rabbitmq/mnesia/rabbit@rabbit1/queues/1NGZF3JZJR0SU2C0VE2S25JRP/clean.dot",

I could not understand what it actually means, but I am guessing that 
rabbit1 and rabbit2 went out of sync.

Running rabbitmqctl status yields the following message:
[root@rabbit1 ~]# rabbitmqctl status
Status of node rabbit@rabbit1 ...
Error: unable to connect to node rabbit@rabbit1: nodedown


nodes in question: [rabbit@rabbit1]

hosts, their running nodes and ports:
- rabbit1: [{rabbitmqctl7856,46808}]

current node details:
- node name: rabbitmqctl7856@rabbit1
- home dir: /var/lib/rabbitmq
- cookie hash: WYsTAr/DZ8KD7QQhMu5SSg==

logs are available here: 
rabbit@rabbit1-sasl.log --> 
rabbit@rabbit1.log --> 
startup_log --> https://docs.google.com/open?id=0B2mCr6qtz2xOOHI2bXQ5OWw4TUE

Can anyone help me with ideas to recover rabbit1?
Is there a way to tweak the startup of rabbit1 so that it starts as an 
independent node?

The data at stake is really important. I would appreciate any help.

- Aravindh

