[rabbitmq-discuss] Autoheal failure

Simon MacMullen simon at rabbitmq.com
Tue Feb 11 10:15:00 GMT 2014


On 11/02/14 01:33, Matt Pietrek wrote:
> Recently we started running a two node HA cluster of Rabbit 3.2.2, with
> autoheal enabled.
>
> After a network partition, I noticed that autoheal didn't appear to
> work, although the logs indicate it was tried. The first time it
> happened, the UI in both brokers indicated the other broker was missing
> from the cluster.

So the log indicates that the winning node ignored a request to start 
autohealing because it was already autohealing.

It's possible that there is a problem if a network partition occurs 
while autoheal is already happening. I'll file a bug to look into this, 
but it would help if you can show me any previous logs from this node - 
I assume that earlier (probably not much earlier) in the logs there were 
some more partition warnings and autoheal events?
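
For reference, a minimal sketch of how autoheal is usually selected in 
rabbitmq.config (classic Erlang-term format). This assumes a stock setup 
and is not taken from your configuration or logs:

```erlang
%% rabbitmq.config -- sketch only; merge with any settings you already have.
[
  {rabbit, [
    %% Partition handling mode: ignore | pause_minority | autoheal
    {cluster_partition_handling, autoheal}
  ]}
].
```

Running "rabbitmqctl cluster_status" on each node lists the partitions 
that node currently sees, which can be lined up against the timestamps 
of the partition warnings in the logs.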

> The second time this happened, the management plugin seemed to not
> function afterwards. Most of the Web UI was unusable, i.e. it wouldn't
> tell me which nodes were running, what queues were declared, and so forth.

Separately there is an issue where the management database might fail to 
recover after a network partition. I just replicated that yesterday; 
note that it's not connected to autoheal.

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, Pivotal

