<div dir="ltr">I can reproduce this very reliably. <div><br></div><div>How can I run an instrumented version?</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Aug 7, 2013 at 5:32 AM, Emile Joubert <span dir="ltr">&lt;<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 06/08/13 17:53, Yamil Einar Asusta Santos wrote:<br>
<br>
> I have put the log in<br>
> here: <a href="https://gist.github.com/elbuo8/e5171ec85608b7bd7842" target="_blank">https://gist.github.com/elbuo8/e5171ec85608b7bd7842</a><br>
<br>
The error recorded on 6-Aug-2013::16:29:44 looks like it might be part<br>
of the problem. Can you tell by looking at the other logfiles whether<br>
anything noteworthy happened at that time? Is anything recorded in the<br>
sasl log?<br>
<br>
Are you able to reproduce this issue reliably? And does the last<br>
remaining node always have a log entry containing<br>
"{rabbit_node_monitor,handle_info,2}"?<br>
<br>
Would it be possible for you to run an instrumented version of the<br>
broker to help discover the cause of the problem? It will help a lot if<br>
you can reproduce the problem with instrumented code in a test environment.<br>
<br>
You may wish to configure "cluster_partition_handling" to "ignore" as a<br>
temporary workaround. This should allow the last node to keep running<br>
while the others are off-line.<br>
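<br>
As a rough sketch of that workaround, the setting could look like this in the<br>
classic Erlang-term rabbitmq.config file. This assumes no other "rabbit"<br>
settings are already present (if there are, add the tuple to the existing<br>
list instead); the file location depends on the installation, and the broker<br>
needs a restart to pick up the change.<br>
<pre>
%% rabbitmq.config (sketch; merge with any existing settings)
[
  {rabbit, [
    %% temporary workaround: do not act on detected partitions
    {cluster_partition_handling, ignore}
  ]}
].
</pre>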
<span class="HOEnZb"><font color="#888888"><br>
-Emile<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><span style="background-color:rgb(255,255,255)"><span style="background-repeat:repeat repeat">Yamil</span> <span style="background-repeat:repeat repeat">Asusta</span><br>
<br></span>
</div>