<div dir="ltr">Ah-ha!  You are right!  In my testing, I always started one node and waited for its status to come back &quot;OK&quot; or &quot;FAILED&quot; before starting the other.  Now if I start both at the same time, it works splendidly!  Thank you for that.<div>

<br></div><div>I have a couple of follow-up questions, if you don&#39;t mind:</div><div><div><ul><li>Is it possible to configure RabbitMQ to wait longer than 30 seconds before timing out?  I looked in the docs and couldn&#39;t find anything that seemed to address this.</li>

</ul><ul><li>If for some reason one of the nodes cannot be brought back online, would we then need to &quot;forget&quot; it on the other node (as described below)?</li><ul><li><font face="courier new, monospace"><span style="font-size:13px">export RABBITMQ_NODE_ONLY=true</span><br>

</font></li><li><span style="font-size:13px"><font face="courier new, monospace">rabbitmq-server &amp;<br></font></span></li><li><span style="font-size:13px"><span style="font-weight:bold"><span style="font-weight:normal"><font face="courier new, monospace">rabbitmqctl forget_cluster_node --offline rabbit@node1</font></span><br>

</span></span></li></ul></ul></div><div><br></div>Thanks again for the reply!  I feel a lot better about things now. ;-)</div><div><br></div><div>-Chris<br><div><br></div><div><br></div></div></div><div class="gmail_extra">

<br><br><div class="gmail_quote">On Wed, Jul 24, 2013 at 10:51 AM, Matthias Radestock <span dir="ltr">&lt;<a href="mailto:matthias@rabbitmq.com" target="_blank">matthias@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Chris,<div class="im"><br>

<br>

On 23/07/13 15:39, Chris wrote:<br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

We are using RabbitMQ 3.1.1 / Erlang R16B on Redhat EL 6.2.  We&#39;ve<br>

discovered a scenario that can corrupt the RabbitMQ databases pretty<br>

consistently, and are wondering if you might have some suggestions for<br>

prevention (or might want to consider a fix if possible).<br>

<br>

In short, if you are running two nodes in a cluster, and there are<br>

active connections, cutting the power to both nodes in short succession<br>

can corrupt both databases.<br></div>

[...]<div class="im"><br>

    =INFO REPORT==== 23-Jul-2013::09:44:26 ===<br>

    Timeout contacting cluster nodes: [&#39;rabbit@node2&#39;].<br>

</div></blockquote>

<br>

The issue here is that the 2nd node did not come back up within 30s of the first. If it had, everything would have been fine.<br>

<br>

No db corruption has occurred. This is simply a case of both nodes thinking they weren&#39;t the last to shut down and waiting for the other to come up.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The only way I&#39;ve been able to fix this is by deleting the contents of<br>

mnesia on both nodes and re-clustering them.<br>

</blockquote>

<br></div>

Starting rabbit on both nodes within 30 seconds of each other should resolve the problem.<br>

<br>

Regards,<br>

<br>

Matthias.<br>

</blockquote></div><br></div>