[rabbitmq-discuss] Diagnosing Network partition false positives

Patrick Long pat at munkiisoft.com
Mon Feb 10 11:36:18 GMT 2014


When I checked our testing environments this morning I saw that one of them
was reporting a Suspected Network Partition.

Both nodes are virtual machines on the same network so I don't think
"network partition" is a valid error.

I have the log files from both nodes. Nothing had happened on either
server all weekend, and then on Sunday morning:

*NODE1 log*

01:00:04 NODE1 logged that NODE2 was down
01:00:14 NODE1 logged
Mnesia(rabbit@NODE1): ** ERROR ** mnesia_event got {inconsistent_database,
running_partitioned_network, rabbit@NODE2}
global: Name conflict terminating {rabbit_mgmt_db,<8059.336.0>}


*NODE2 log*

01:00:56 NODE2 logged that NODE1 was down
01:01:01 NODE2 logged
Mnesia(rabbit@NODE2): ** ERROR ** mnesia_event got {inconsistent_database,
running_partitioned_network, rabbit@NODE1}

NODE1 then went on to log a full error report.
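
To make the timeline above easier to read, here is a minimal Python sketch
that works out the gaps between the four logged events. The event labels are
made up for illustration, and it assumes the clocks on the two VMs are in
sync, which the logs do not confirm. The gaps may be worth comparing against
whatever net_ticktime the nodes run with (Erlang's default is 60 seconds).

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the two log excerpts; the keys are illustrative labels.
events = {
    "node1_saw_node2_down": "01:00:04",
    "node1_inconsistent_db": "01:00:14",
    "node2_saw_node1_down": "01:00:56",
    "node2_inconsistent_db": "01:01:01",
}


def seconds_between(start, end):
    """Whole seconds from start to end, both given as HH:MM:SS on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# How long after NODE1 noticed the problem did NODE2 notice it?
gap_between_down_reports = seconds_between(
    events["node1_saw_node2_down"], events["node2_saw_node1_down"]
)
# Delay on each node between "peer down" and the Mnesia partition error.
node1_down_to_error = seconds_between(
    events["node1_saw_node2_down"], events["node1_inconsistent_db"]
)
node2_down_to_error = seconds_between(
    events["node2_saw_node1_down"], events["node2_inconsistent_db"]
)

print(gap_between_down_reports, node1_down_to_error, node2_down_to_error)
```

So, if the clocks agree, the two nodes noticed each other going away 52
seconds apart, and each logged the Mnesia error within 10 seconds of that.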

I tried stop_app and start_app on NODE2 and both commands errored. Then I
ran the same thing on NODE1. Both commands succeeded and the cluster was no
longer reporting a suspected network partition.

Any suggestions on how best to look into this?

Shouldn't the aliveness test on at least one of the nodes flag that there is
a problem? During this time both nodes reported {200:OK}.
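
One way to check for the condition directly, rather than relying on the
aliveness test, might be to ask the management plugin which peers each node
believes it is partitioned from. The sketch below assumes the management
plugin is enabled and that each entry returned by GET /api/nodes carries a
"partitions" list. The function names, host, port and credentials are
placeholders, not anything taken from this environment.

```python
import base64
import json
import urllib.request


def fetch_nodes(base_url, user, password):
    """GET /api/nodes from the management plugin and return the parsed JSON list."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def partitioned_nodes(nodes):
    """Map each node name to the peers it reports being partitioned from.

    Nodes with an empty or missing "partitions" entry are left out, so an
    empty result means no node is reporting a partition.
    """
    return {n["name"]: n["partitions"] for n in nodes if n.get("partitions")}
```

With assumed values it could be called as
partitioned_nodes(fetch_nodes("http://NODE1:15672", "guest", "guest")), and a
monitoring check could alert whenever the result is not empty. The aliveness
test only exercises publish and consume on the node being queried, so it can
plausibly return OK on both sides of a partition.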

Thanks


-- 
Patrick Long - Munkiisoft Ltd