[rabbitmq-discuss] Mnesia corrupting after node joining cluster

Matthias Radestock matthias at rabbitmq.com
Wed Aug 1 18:46:25 BST 2012


David,

On 01/08/12 18:20, David Brown wrote:
> has there been any work on this issue (i.e. errors when doing admin work
> on a cluster)?

yes, but it will likely be a few months before these changes make it 
into a release.

> I've got a tiny, two node development cluster.  Removing
> the ram node caused the remaining (disc) node to fail on startup with a
> mnesia related error when I restarted it.  Eventually, the remaining
> (disc) node started up, but it still thinks the other node is clustered
> with it.  I've tried everything to try and get this node to realize it
> is the only node left in the cluster, nothing works.  FWIW the removed
> node does realize it is no longer part of the two node cluster.
>
> I was very careful in terms of following the exact steps in the
> 'Breaking up a cluster' section on the rabbitmq web site.

Hmm. It certainly looks like the disc node still thinks it is clustered 
with the ram node, and consequently it fails to merge its schema with 
that node when starting. That really shouldn't happen if you followed 
the documented steps when breaking up the cluster, in particular if the 
disc node was up and running when the ram node was removed.

If you can reproduce the problem then please post a transcript of the 
commands.

> At this point, I'm a bit concerned about basing our production
> systems around rabbitmq (we're a small hedge fund) when it seems to
> fail on the simplest of tasks.

Clustering is hardly the "simplest of tasks" - sending and receiving 
messages is ;)

The clustering code hasn't changed much in 4+ years. It is stable but 
suffers from little mistakes resulting in situations that are hard to 
recover from - that's what Francesco is addressing.

I suggest you conduct some more experiments and keep transcripts of 
everything you are doing. Then, if you do encounter a weird situation it 
will be much easier to reproduce and diagnose the problem.
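
One way to keep such a transcript is sketched below. It is only an
illustration, not an official tool: the run_logged function, the
TRANSCRIPT variable and the log file name are made up for this example,
and the node name in the usage comments is hypothetical. The rabbitmqctl
subcommands shown in the comments (stop_app, reset, start_app,
cluster_status) are the ones the clustering guide uses for removing a
node; adjust them to whatever steps you actually perform.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): run a command, show its
# output, and append the command line, output and exit status to a
# transcript file so the session can be posted later.
TRANSCRIPT="${TRANSCRIPT:-./cluster-transcript.log}"

run_logged() {
    # Record a timestamp and the exact command line.
    printf '[%s] $ %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$TRANSCRIPT"
    # Capture stdout and stderr together; keep the command's exit status
    # even if the calling shell runs with "set -e".
    output=$("$@" 2>&1) && status=0 || status=$?
    # Show the output on the terminal and append it to the transcript.
    printf '%s\n' "$output" | tee -a "$TRANSCRIPT"
    printf '(exit status %s)\n' "$status" >> "$TRANSCRIPT"
    return "$status"
}

# Example usage (assumed node name; run each command on the node noted):
#   run_logged rabbitmqctl -n rabbit@ramnode stop_app    # node being removed
#   run_logged rabbitmqctl -n rabbit@ramnode reset
#   run_logged rabbitmqctl -n rabbit@ramnode start_app
#   run_logged rabbitmqctl cluster_status                # remaining node
```

Logging the exit status alongside the output makes it clear which step
failed first, which is usually the part that is missing from a problem
report written from memory.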

Also, are you sure you actually *need* clustering? It does, by 
necessity, add a significant amount of complexity and possible failure 
modes, so only use it if you have to.

Regards,

Matthias.
