<div dir="ltr">Hi Simon,<div><br></div><div>Upgrading to 3.1.5 seems to have made things better. So either one of the other bug fixes in 3.1.2 through 3.1.5 helped, or I was just unlucky the couple of times we tried it with 3.1.1. ;-)</div>

<div><br></div><div>Unfortunately I don&#39;t have the logs from those previous failed attempts.</div><div><br></div><div>Thanks for the response,</div><div>Chris</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

On Fri, Aug 30, 2013 at 12:53 PM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

There was definitely a bug in autoheal fixed in 3.1.1, but I&#39;m not aware of anything since then. However, it&#39;s possible that some other bug we have since fixed is causing your problems with autoheal.<br>

<br>

So:<br>

<br>

1) You might as well try 3.1.5.<br>

2) Are there any crashes in the logs on the minority node?<br>

<br>

Cheers, Simon<div><div class="h5"><br>

<br>

On 30/08/2013 4:26PM, Chris wrote:<br>

</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

Hi All,<br>

<br>

As part of our testing of failovers, we yank the network cable on a<br>

machine (to simulate a switch going down).  When we plug it back in,<br>

RabbitMQ goes into the network partition mode.  At first we were using<br>

the default (&#39;ignore&#39;) option for dealing with partitions, but it caused<br>

problems.<br>

<br>

After that we put the nodes into &#39;autoheal&#39; mode.  This did not improve<br>

things.  Not only did the minority node not rejoin the partition, but it<br>

refused to restart without manually killing the process.  It also caused<br>

problems on the other nodes (in the majority).  They stopped accepting<br>

connections and I couldn&#39;t even log into the web UI.  So clearly,<br>

&#39;autoheal&#39; didn&#39;t seem to work as intended.<br>

<br>

We&#39;re using RabbitMQ 3.1.1.  Is there anything fixed since then that<br>

might help with our situation?  Our end goal is to have everything<br>

working again without intervention.  I understand that this could cause<br>

*some* data loss during the autoheal process, but this is probably OK.<br>

  We&#39;d love just to get all three nodes happy again without having to<br>

manually restart any nodes.<br>

<br>

Thanks,<br>

Chris<br>

<br>

<br></div></div>

_______________________________________________<br>

rabbitmq-discuss mailing list<br>

<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>

<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>

<br><span class="HOEnZb"><font color="#888888">

</font></span></blockquote><span class="HOEnZb"><font color="#888888">

<br>

-- <br>

Simon MacMullen<br>

RabbitMQ, Pivotal<br>

</font></span></blockquote></div><br></div>
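<div>For a failover test like the one described in this thread, it can help to check automatically whether any node still reports a partition after the cable is plugged back in. The following is only a minimal sketch. It assumes the node list has already been fetched and parsed as JSON from the management plugin (for example from its <code>/api/nodes</code> endpoint), and that each node object carries a <code>name</code> string and a <code>partitions</code> list. Those field names are assumptions to verify against your RabbitMQ version, and the function name is hypothetical.</div>

```python
def nodes_reporting_partitions(nodes):
    """Map each node name to the list of peers it reports as partitioned.

    `nodes` is a list of dicts, assumed to be the parsed JSON node list
    from the management plugin. The "name" and "partitions" keys are
    assumptions; check them against your RabbitMQ version.
    """
    report = {}
    for node in nodes:
        # Treat a missing or null "partitions" field as "no partitions seen".
        partitions = node.get("partitions") or []
        if partitions:
            report[node.get("name", "unknown")] = list(partitions)
    return report
```

<div>An empty result would suggest that no node currently sees a partition. A non-empty result names the nodes that still need attention, which is the situation the original poster hoped autoheal would resolve without manual restarts.</div>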