[rabbitmq-discuss] RabbitMQ 3.2.2 cluster shutting down
Simon MacMullen
simon at rabbitmq.com
Thu Mar 6 15:19:30 GMT 2014
On 06/03/14 15:04, Andy Martin wrote:
> ** {{badmatch,{error,not_found}},
> [{rabbit_amqqueue_process,i,2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,emit_stats,2,[]},
That's the exact stack trace of a bug we fixed in 3.2.1. Are you
completely sure that both nodes are running 3.2.2?
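One way to double-check this is to pull the version out of `rabbitmqctl status` on each node. The following is only a minimal sketch: it assumes the status output lists the broker as `{rabbit,"RabbitMQ","<version>"}` under running_applications, and the node names in the usage note are hypothetical.

```python
import re
import subprocess

# Assumed shape of the rabbit entry in `rabbitmqctl status` output:
#   {rabbit,"RabbitMQ","3.2.2"}
RABBIT_APP = re.compile(r'\{rabbit,\s*"RabbitMQ",\s*"([^"]+)"\}')


def parse_rabbit_version(status_output):
    """Return the RabbitMQ version found in status output, or None."""
    match = RABBIT_APP.search(status_output)
    return match.group(1) if match else None


def node_version(node):
    """Run `rabbitmqctl -n <node> status` and extract the broker version."""
    result = subprocess.run(
        ["rabbitmqctl", "-n", node, "status"],
        capture_output=True, text=True, check=True,
    )
    return parse_rabbit_version(result.stdout)
```

Calling `node_version("rabbit@host1")` and `node_version("rabbit@host2")` (placeholder node names) and comparing the results would show whether one node is still running an older release.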
> I have a two-node autoheal cluster. Yesterday I came in to find out
> that neither server in the cluster was seeing the other, but there
> was no message about a partition having occurred.
You will only be notified about the partition once the servers can see
each other again - until that happens, A thinks B is down, and B thinks
A is down; they have no way to know it's a partition.
> Today I came in to
> find one of the servers down. I can't find anything in the logs that
> would indicate why this is happening. What should I be looking for in
> the logs?
"Down" as in the Erlang VM had exited, or "down" as in the Erlang VM was
still running but the RabbitMQ application had stopped?
If the former, then it can't be anything to do with autoheal since that
doesn't attempt to stop the node completely. You might want to look for
"Halting Erlang VM" in the logs; that would imply the broker was shut
down deliberately. If that's not there then please post the logs
somewhere with an indication of when you became aware the server was down.
If the latter, it's possible that you ran into one of the bugs we fixed
in 3.2.4, where an attempt at autoheal can get stuck. There should still
be messages about autoheal in the logs in that case though.
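To tell the two cases apart from the logs, something along these lines could help. It is a rough sketch, not a definitive tool: "Halting Erlang VM" is the marker mentioned above, while matching the word "autoheal" case-insensitively is an assumption about how the autoheal messages are worded, and the log path in the usage note is hypothetical.

```python
import re

HALT_MARKER = "Halting Erlang VM"
# Assumption: autoheal log entries contain the word "autoheal" in some casing.
AUTOHEAL = re.compile(r"autoheal", re.IGNORECASE)


def classify_log(lines):
    """Collect (line_number, text) pairs for halt and autoheal entries."""
    halted = []
    autoheal = []
    for number, line in enumerate(lines, start=1):
        if HALT_MARKER in line:
            halted.append((number, line.rstrip()))
        if AUTOHEAL.search(line):
            autoheal.append((number, line.rstrip()))
    return {"halt": halted, "autoheal": autoheal}
```

For example, `with open("/var/log/rabbitmq/rabbit@host1.log") as f: hits = classify_log(f)` (placeholder path). A non-empty `hits["halt"]` would point to a deliberate shutdown, while entries only in `hits["autoheal"]` would fit the stuck-autoheal case.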
Cheers, Simon
--
Simon MacMullen
RabbitMQ, Pivotal