[rabbitmq-discuss] Recipe for corrupting mnesia in a cluster
Chris
stuff at moesel.net
Wed Jul 24 17:40:04 BST 2013
Ah-ha! You are right! Whenever I did my testing on this, I would start
one node and wait for the status to come back "OK" or "FAILED" before
starting the other. Now if I start both at the same time, it works
splendidly! Thank you for that.
I have a couple of follow-up questions, if you don't mind:
- Is it possible to configure RabbitMQ to wait longer than 30 seconds
before timing out? I looked in the docs and couldn't find anything that
seemed to address this.
- If for some reason one of the nodes cannot be brought back online,
would we then need to "forget" it on the other node (as described below)?
    - export RABBITMQ_NODE_ONLY=true
    - rabbitmq-server &
    - rabbitmqctl forget_cluster_node --offline rabbit@node1
Thanks again for the reply! I feel a lot better about things now. ;-)
-Chris
On Wed, Jul 24, 2013 at 10:51 AM, Matthias Radestock
<matthias at rabbitmq.com> wrote:
> Chris,
>
>
> On 23/07/13 15:39, Chris wrote:
>
>> We are using RabbitMQ 3.1.1 / Erlang R16B on Redhat EL 6.2. We've
>> discovered a scenario that can corrupt the RabbitMQ databases pretty
>> consistently, and are wondering if you might have some suggestions for
>> prevention (or might want to consider a fix if possible).
>>
>> In short, if you are running two nodes in a cluster, and there are
>> active connections, cutting the power to both nodes in short succession
>> can corrupt both databases.
>> [...]
>>
>> =INFO REPORT==== 23-Jul-2013::09:44:26 ===
>> Timeout contacting cluster nodes: ['rabbit@node2'].
>>
>
> The issue here is that the 2nd node did not come back up within 30s of the
> first. If it had, everything would have been fine.
>
> No db corruption has occurred. This is simply a case of both nodes
> thinking they weren't the last to shut down and waiting for the other to
> come up.
>
>
>> The only way I've been able to fix this is by deleting the contents of
>> mnesia on both nodes and re-clustering them.
>>
>
> Starting rabbit on both nodes inside 30 seconds should resolve the problem.
>
> Regards,
>
> Matthias.
>
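
For reference, the advice above (start rabbit on both nodes inside 30 seconds) could be sketched as a small shell helper that launches the start command on every node in parallel. This is only a sketch under assumptions: the hostnames node1 and node2 come from the thread, but the names start_cluster, START_CMD, and REMOTE, the use of passwordless ssh, and the "service rabbitmq-server start" init command are hypothetical and were not described by either poster.

```shell
#!/bin/sh
# Hypothetical helper: start RabbitMQ on all cluster nodes at roughly the
# same time, so that no node waits past the 30-second window for its peer.
# Assumptions (not from the thread): passwordless ssh to each host, and an
# init script that is started with "service rabbitmq-server start".
START_CMD=${START_CMD:-"service rabbitmq-server start"}
REMOTE=${REMOTE:-ssh}

start_cluster() {
    pids=""
    for host in "$@"; do
        # Run each start in the background so the nodes boot in parallel.
        $REMOTE "$host" "$START_CMD" &
        pids="$pids $!"
    done

    # Wait for every start command; report failure if any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return $status
}

# Usage (not executed here): start_cluster node1 node2
```

Running the starts in the background rather than one after another is the point of the sketch: waiting for the first node to report "OK" or "FAILED" before starting the second is what let the 30-second window expire.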