[rabbitmq-discuss] Recipe for corrupting mnesia in a cluster

Matthias Radestock matthias at rabbitmq.com
Wed Jul 24 15:51:56 BST 2013


Chris,

On 23/07/13 15:39, Chris wrote:
> We are using RabbitMQ 3.1.1 / Erlang R16B on Redhat EL 6.2.  We've
> discovered a scenario that can corrupt the RabbitMQ databases pretty
> consistently, and are wondering if you might have some suggestions for
> prevention (or might want to consider a fix if possible).
>
> In short, if you are running two nodes in a cluster, and there are
> active connections, cutting the power to both nodes in short succession
> can corrupt both databases.
> [...]
>     =INFO REPORT==== 23-Jul-2013::09:44:26 ===
>     Timeout contacting cluster nodes: ['rabbit@node2'].

The issue here is that the 2nd node did not come back up within 30s of
the first. If it had, everything would have been fine.

No db corruption has occurred. This is simply a case of both nodes 
thinking they weren't the last to shut down and waiting for the other to 
come up.

> The only way I've been able to fix this is by deleting the contents of
> mnesia on both nodes and re-clustering them.

Starting rabbit on both nodes within 30 seconds of each other should
resolve the problem.
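
As a rough sketch of one way to get both nodes started close together,
the script below issues the start command on each host in parallel.
The host names, the use of passwordless ssh, and the
"service rabbitmq-server start" command are all assumptions about the
setup, not something taken from the report; adjust them to match how
the broker is actually installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; substitute the machines running the two nodes.
NODES = ["node1", "node2"]

# Assumed start command for an init-script install; change if yours differs.
START_CMD = ["service", "rabbitmq-server", "start"]


def ssh_start(host):
    """Run the start command on one host over ssh and return its exit code.

    Assumes passwordless ssh as a user allowed to start the service.
    """
    return subprocess.call(["ssh", host] + START_CMD)


def start_all(hosts, starter=ssh_start):
    """Issue the start command on every host at the same time.

    Returns a dict mapping each host to the value returned by `starter`.
    """
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        # pool.map preserves input order, so zip pairs hosts with results.
        return dict(zip(hosts, pool.map(starter, hosts)))


if __name__ == "__main__":
    for host, code in start_all(NODES).items():
        print(host, "exit code", code)
```

Running the starts in parallel rather than one after the other keeps
the gap between the two nodes coming up as small as possible; it does
not by itself guarantee that both are up inside the 30s window.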

Regards,

Matthias.

