[rabbitmq-discuss] Autoheal failure
simon at rabbitmq.com
Tue Feb 11 10:15:00 GMT 2014
On 11/02/14 01:33, Matt Pietrek wrote:
> Recently we started running a two node HA cluster of Rabbit 3.2.2, with
> autoheal enabled.
> After a network partition, I noticed that autoheal didn't appear to
> work, although the logs indicate it was tried. The first time it
> happened, the UI in both brokers indicated the other broker was missing
> from the cluster.
So the log indicates that the winning node ignored a request to start
autohealing because it was already autohealing.
It's possible that there is a problem if a network partition occurs
while autoheal is already happening. I'll file a bug to look into this,
but it would help if you can show me any previous logs from this node -
I assume that earlier (probably not much earlier) in the logs there were
some more partition warnings and autoheal events?
> The second time this happened, the management plugin seemed to not
> function afterwards. Most of the Web UI was unusable, i.e it wouldn't
> tell me which nodes were running, what queues were declared, and so forth.
Separately there is an issue where the management database might fail to
recover after a network partition. I just replicated that yesterday;
note that it's not connected to autoheal.
More information about the rabbitmq-discuss