[rabbitmq-discuss] Autoheal failure

Matt Pietrek mpietrek at skytap.com
Tue Feb 11 01:33:53 GMT 2014


Recently we started running a two node HA cluster of Rabbit 3.2.2, with
autoheal enabled.

After a network partition, I noticed that autoheal didn't appear to work,
although the logs indicate it was tried. The first time it happened, the UI
in both brokers indicated the other broker was missing from the cluster.

The second time this happened, the management plugin seemed to not function
afterwards. Most of the Web UI was unusable, i.e., it wouldn't tell me which
nodes were running, what queues were declared, and so forth.


I'm wondering if what I'm seeing below is a known issue or rings any bells.
Also, is there any other log output I should look at to determine
success/failure?

On the "winning" side, the logs look like this. The "ignoring" part in
particular is suspicious.

--------

=ERROR REPORT==== 3-Feb-2014::09:48:56 ===

Mnesia(rabbit at goodnessmq1): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, rabbit at goodnessmq2}


=INFO REPORT==== 3-Feb-2014::09:48:56 ===

Autoheal request received from rabbit at goodnessmq2 when in state
{winner_waiting,[rabbit at goodnessmq2],[rabbit at goodnessmq2]}; ignoring


=INFO REPORT==== 3-Feb-2014::09:48:56 ===

global: Name conflict terminating {rabbit_mgmt_db,<2783.10073.5>}

--------


On the "losing" side, the logs look like this:

--------

=ERROR REPORT==== 3-Feb-2014::09:48:56 ===

Mnesia(rabbit at goodnessmq2): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, rabbit at goodnessmq1}


=INFO REPORT==== 3-Feb-2014::09:48:56 ===

Autoheal request sent to rabbit at goodnessmq1


=WARNING REPORT==== 3-Feb-2014::09:48:56 ===

Federation exchange 'skytap' in vhost '/' did not connect to exchange
'skytap' in vhost '/' on amqp://somethingelse.foo.bar.com:5672

{error,unknown_host}

=INFO REPORT==== 3-Feb-2014::09:48:56 ===

Statistics database started.


=WARNING REPORT==== 3-Feb-2014::09:48:58 ===

Federation exchange 'skytap' in vhost '/' did not connect to exchange
'skytap' in vhost '/' on amqp://somethingelse.foo.bar.com:5672

{error,unknown_host}

--------
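
For anyone who wants to pull the same kind of entries out of their own broker logs for comparison, here is a minimal Python sketch. It assumes the report-header format shown in the excerpts above. The function name partition_reports and the keyword list are illustrative only: the keywords are simply the strings visible in these excerpts, not a complete list of what RabbitMQ logs around a partition.

```python
import re

# Header format assumed from the excerpts above, e.g.
# "=INFO REPORT==== 3-Feb-2014::09:48:56 ==="
HEADER = re.compile(r"^=(\w+) REPORT==== (\S+) ===$")

# Keywords taken only from the excerpts in this message (an assumption,
# not an exhaustive list of partition-related log messages).
KEYWORDS = ("Autoheal", "inconsistent_database", "Name conflict")


def partition_reports(text):
    """Return (level, timestamp, body) for each report mentioning a keyword."""
    reports = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            # A new report starts: remember its level and timestamp.
            current = [match.group(1), match.group(2), []]
            reports.append(current)
        elif current is not None and stripped:
            # Non-blank lines belong to the body of the current report.
            current[2].append(stripped)

    selected = []
    for level, stamp, body_lines in reports:
        body = " ".join(body_lines)
        if any(keyword in body for keyword in KEYWORDS):
            selected.append((level, stamp, body))
    return selected
```

Passing the contents of a broker log file to partition_reports gives the partition-related reports from that node in order, which makes it easier to line up the "winning" and "losing" sides by timestamp.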