[rabbitmq-discuss] RabbitMQ 3.2.2 cluster shutting down

Andy Martin akmartin at gmail.com
Thu Mar 6 15:47:12 GMT 2014

Thanks for your quick response.

> "Down" as in the Erlang VM had exited, or "down" as in the Erlang VM was
> still running but the RabbitMQ application had stopped?

That's a good question. I didn't even check. I only looked at it from
Windows services and restarted from there.  But both nodes are definitely
running 3.2.2.

I don't see anything about autoheal in the logs. The first log I sent was
from the node that went down, right at the time I lost connection to that
node. This is from the other node in the cluster. So a partition definitely
occurred, but there's nothing about autohealing. Maybe I should upgrade to
3.2.4 and see if it happens again?

=ERROR REPORT==== 6-Mar-2014::09:49:24 ===
AMQP connection <0.63.92> (running), channel 0 - error:
            "broker forced connection closure with reason 'shutdown'",none}

=INFO REPORT==== 6-Mar-2014::09:49:24 ===
Halting Erlang VM

=ERROR REPORT==== 6-Mar-2014::09:49:30 ===
Mnesia(rabbit at BOSRISDEVFAB201): ** ERROR ** mnesia_event got
{inconsistent_database, starting_partitioned_network, rabbit at MARRISDEVFAB201


On Thu, Mar 6, 2014 at 10:19 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 06/03/14 15:04, Andy Martin wrote:
>> ** {{badmatch,{error,not_found}},
>>      [{rabbit_amqqueue_process,i,2,[]},
>>       {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>>       {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>>       {rabbit_amqqueue_process,emit_stats,2,[]},
> That's the exact stack trace of a bug we fixed in 3.2.1. Are you
> completely sure that both nodes are running 3.2.2?
>> I have a two-node autoheal cluster. Yesterday I came in to find out
>> that neither server in the cluster was seeing the other, but there
>> was no message about a partition having occurred.
> You will only be notified about the partition once the servers can see
> each other again - until that happens, A thinks B is down, and B thinks A
> is down; they have no way to know it's a partition.
>> Today I came in to
>> find one of the servers down. I can't find anything in the logs that
>> would indicate why this is happening. What should I be looking for in
>> the logs?
> "Down" as in the Erlang VM had exited, or "down" as in the Erlang VM was
> still running but the RabbitMQ application had stopped?
> If the former, then it can't be anything to do with autoheal since that
> doesn't attempt to stop the node completely. You might want to look for
> "Halting Erlang VM" in the logs; that would imply the broker was shut down
> deliberately. If that's not there then please post the logs somewhere with
> an indication of when you became aware the server was down.
> If the latter, it's possible that you ran into one of the bugs we fixed in
> 3.2.4, where an attempt at autoheal can get stuck. There should still be
> messages about autoheal in the logs in that case though.
> Cheers, Simon
