<div dir="ltr">Thanks for your quick response.<div><br></div><div><blockquote class="gmail_quote" style="font-family:Arial,Helvetica,sans-serif;font-size:13px;margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif">"Down" as in the Erlang VM had exited, or "down" as in the Erlang VM was still running but the RabbitMQ application had stopped?</span></blockquote>
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px"><br></span></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px">That's a good question. I didn't check. I only looked at the service in the Windows Services console and restarted it from there. But both nodes are definitely running 3.2.2.</span></div>
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px"><br></span></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px">I don't see anything about autoheal in the logs. The first log I sent was from the node that went down, right at the time I lost connection to that node. The log below is from the other node in the cluster. So a partition definitely occurred, but there's nothing about autohealing. Maybe I should upgrade to 3.2.4 and see if it happens again?</span><br>
</div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><span style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px"><br></span></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><br></font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">=ERROR REPORT==== 6-Mar-2014::09:49:24 ===</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">AMQP connection &lt;0.63.92&gt; (running), channel 0 - error:</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">{amqp_error,connection_forced,</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline"> "broker forced connection closure with reason 'shutdown'",none}</font></div>
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><br></font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">=INFO REPORT==== 6-Mar-2014::09:49:24 ===</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline">
<font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">Halting Erlang VM</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><br>
</font></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">=ERROR REPORT==== 6-Mar-2014::09:49:30 ===</font></div>
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline"><font face="arial, sans-serif" style="margin:0px;padding:0px;border:0px;vertical-align:baseline">Mnesia(rabbit@BOSRISDEVFAB201): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@MARRISDEVFAB201}</font></div>
<div style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px"><br></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-family:arial,sans-serif;font-size:13px">
<br></div></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-size:13px;font-family:arial,sans-serif">Thanks,</div></div><div style="margin:0px;padding:0px;border:0px;vertical-align:baseline;font-size:13px;font-family:arial,sans-serif">
Andy</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 6, 2014 at 10:19 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 06/03/14 15:04, Andy Martin wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
** {{badmatch,{error,not_found}},<br>
[{rabbit_amqqueue_process,i,2,[]},<br>
{rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},<br>
{rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},<br>
{rabbit_amqqueue_process,emit_stats,2,[]},<br>
</blockquote>
<br>
That's the exact stack trace of a bug we fixed in 3.2.1. Are you completely sure that both nodes are running 3.2.2?<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I have a two-node autoheal cluster. Yesterday I came in to find out<br>
that neither server in the cluster was seeing the other, but there<br>
was no message about a partition having occurred.<br>
</blockquote>
<br>
You will only be notified about the partition once the servers can see each other again - until that happens, A thinks B is down, and B thinks A is down; they have no way to know it's a partition.<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Today I came in to<br>
find one of the servers down. I can't find anything in the logs that<br>
would indicate why this is happening. What should I be looking for in<br>
the logs?<br>
</blockquote>
<br>
"Down" as in the Erlang VM had exited, or "down" as in the Erlang VM was still running but the RabbitMQ application had stopped?<br>
<br>
If the former, then it can't be anything to do with autoheal since that doesn't attempt to stop the node completely. You might want to look for "Halting Erlang VM" in the logs; that would imply the broker was shut down deliberately. If that's not there then please post the logs somewhere with an indication of when you became aware the server was down.<br>
<br>
If the latter, it's possible that you ran into one of the bugs we fixed in 3.2.4, where an attempt at autoheal can get stuck. There should still be messages about autoheal in the logs in that case though.<br>
<br>
Cheers, Simon<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
</font></span></blockquote></div><br></div>
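<div dir="ltr">For anyone following the thread, the log checks discussed above can be roughly sketched as a small script. It scans log lines for "Halting Erlang VM" and "inconsistent_database", both of which appear in the excerpt above, plus any line mentioning "autoheal". The "autoheal" pattern is only an assumed catch-all, not a confirmed RabbitMQ log format, and the names scan_log, MARKERS and REPORT_HEADER are made up for this illustration.</div>

```python
import re

# Markers taken from the thread. The "autoheal" substring is an assumed
# catch-all, not a confirmed RabbitMQ log message format.
MARKERS = {
    "vm_halted": re.compile(r"Halting Erlang VM"),
    "partition": re.compile(r"inconsistent_database"),
    "autoheal": re.compile(r"autoheal", re.IGNORECASE),
}

# Matches report headers such as "=INFO REPORT==== 6-Mar-2014::09:49:24 ===".
REPORT_HEADER = re.compile(r"^=(\w+) REPORT==== (.+?) ===\s*$")


def scan_log(lines):
    """Return {marker_name: [timestamp, ...]} for every marker found.

    Each hit is tagged with the timestamp of the most recent report
    header, or None if no header has been seen yet.
    """
    hits = {name: [] for name in MARKERS}
    timestamp = None
    for line in lines:
        header = REPORT_HEADER.match(line)
        if header:
            timestamp = header.group(2)
            continue
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                hits[name].append(timestamp)
    return hits
```

<div dir="ltr">To run it against a real log, pass an open file, for example scan_log(open(path)), where path points at the node's log file (its location depends on the installation). A "vm_halted" hit suggests the broker was shut down deliberately; an empty "autoheal" list only means no line matched the assumed pattern.</div>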