[rabbitmq-discuss] Startup problems after network issues

Wed Sep 26 17:04:37 BST 2012

Hi, thanks for the reply. There shouldn't be any reason for interruptions between the nodes, so hopefully that won't turn into a problem.

I'm still having some trouble bringing the node up, however, even after resetting the database (I assume you mean the /var/lib/rabbitmq/mnesia directory).
Even after moving the directory, I keep getting the exact same message, which you can see at the end of the log file I attached to my last email (from the rabbitmq_management plugin).
I've also tried commenting out cluster_nodes in rabbitmq.config and following the clustering guide (http://www.rabbitmq.com/clustering.html), but still no luck.
(Note that I haven't made any changes to rmq-002, which is still running.)

Jon

________________________________________
From: Emile Joubert [emile at rabbitmq.com]
Sent: Wednesday, September 26, 2012 16:33
To: Discussions about RabbitMQ
Cc: Jon Bergli Heier
Subject: Re: [rabbitmq-discuss] Startup problems after network issues

Hi Jon,

On 24/09/12 13:17, Jon Bergli Heier wrote:
> During the weekend we had some minor network issues, after which one
> of the cluster nodes crashed and won't start up again.

The error in the logfile {mnesia_locker, ... ,granted} appears as the
result of a netsplit (nodes being unable to communicate). This is not a
situation the broker makes any attempt to deal with. If your cluster
will be subject to network interruptions on a regular basis then
consider some of the other distribution strategies:
http://www.rabbitmq.com/distributed.html

The simplest way of starting the failed node is to reset its database
and rejoin the cluster. The database can be reset by moving the database
directory out of the way. Mirrored queues should not be affected by the
loss of one node (as long as the remaining nodes were synchronised), but
any non-mirrored queues (together with their contents) that were defined
on the failing node will be lost.
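
As a rough illustration of the reset-and-rejoin procedure described above, here is
a minimal Python sketch that only builds and prints the commands in order; it
does not run anything. The helper name rejoin_plan, the ".bak" backup suffix
and the node name rabbit@rmq-002 are assumptions made for the example (only
the host name rmq-002 and the mnesia path appear in this thread). It also
assumes the failed node is fully stopped before the directory is moved, and
that the broker is started with "rabbitmq-server -detached". The subcommand
for joining depends on the release: "join_cluster" is the 3.0 and later form,
while 2.x releases used "cluster", so check the clustering guide for the
version in use.

```python
import shlex


def rejoin_plan(peer_node, mnesia_dir="/var/lib/rabbitmq/mnesia",
                join_command="join_cluster"):
    """Return the reset-and-rejoin commands, in order, as argv lists.

    Nothing is executed here. peer_node is a node that is still running
    in the cluster; the name used below is a hypothetical example.
    """
    # Assumed backup location: the same path with ".bak" appended.
    backup_dir = mnesia_dir.rstrip("/") + ".bak"
    return [
        # Reset the database by moving it out of the way (node stopped).
        ["mv", mnesia_dir, backup_dir],
        # Start the broker again; it comes up with an empty database.
        ["rabbitmq-server", "-detached"],
        # Stop the RabbitMQ application but keep the Erlang node running.
        ["rabbitmqctl", "stop_app"],
        # Rejoin the cluster through a node that is still up.
        ["rabbitmqctl", join_command, peer_node],
        ["rabbitmqctl", "start_app"],
        # Confirm that both nodes are listed.
        ["rabbitmqctl", "cluster_status"],
    ]


if __name__ == "__main__":
    for argv in rejoin_plan("rabbit@rmq-002"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing join_command="cluster" gives the 2.x form of the same plan. Keeping
the steps as data makes it easy to review them before running anything by hand.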

-Emile