[rabbitmq-discuss] Diagnosing Network partition false positives
Patrick Long
pat at munkiisoft.com
Mon Feb 10 11:36:18 GMT 2014
When I checked our testing environments this morning I saw that one of them
was reporting a Suspected Network Partition.
Both nodes are virtual machines on the same network so I don't think
"network partition" is a valid error.
I have the log files from both nodes. Nothing had happened on either
server all weekend, and then on Sunday morning the following was logged:
*NODE1 log*
01:00:04 NODE1 logged that NODE2 was down
01:00:14 NODE1 logged
Mnesia(rabbit@NODE1): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@NODE2}
global: Name conflict terminating {rabbit_mgmt_db,<8059.336.0>}
*NODE2 log*
01:00:56 NODE2 logged that NODE1 was down
01:01:01 NODE2 logged
Mnesia(rabbit@NODE2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@NODE1}
NODE1 then went on to log a full error report.
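When lining up the two nodes' logs, it can help to pull out just the Mnesia partition events and see which peer each node blamed. The following is a minimal sketch, assuming the raw log files are available locally and use `rabbit@NODE` style node names as in the event text quoted above; the function names `partition_peers` and `scan_logs` are hypothetical helpers, not part of any RabbitMQ tooling. The pattern also tolerates the event being wrapped across two lines.

```python
import re

# Matches the Mnesia partition event quoted in the logs, e.g.
#   {inconsistent_database, running_partitioned_network, rabbit@NODE2}
# \s* also absorbs a line break, since the event can wrap across lines.
PARTITION_EVENT = re.compile(
    r"\{inconsistent_database,\s*running_partitioned_network,\s*([^}\s]+)\}"
)


def partition_peers(log_text):
    """Return the peer node named in each partition event, in log order."""
    return PARTITION_EVENT.findall(log_text)


def scan_logs(paths):
    """Map each log file path to the peers it reports as partitioned."""
    result = {}
    for path in paths:
        # errors="replace" keeps a stray undecodable byte from aborting the scan.
        with open(path, encoding="utf-8", errors="replace") as f:
            result[path] = partition_peers(f.read())
    return result
```

Called as `scan_logs([path_to_node1_log, path_to_node2_log])` (paths hypothetical), each node's log should list the other node as its partitioned peer, which matches the symmetric errors above.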
I ran rabbitmqctl stop_app and start_app on NODE2, and both commands failed
with errors. I then ran the same commands on NODE1; both succeeded, and the
cluster stopped reporting a suspected network partition.
Any suggestions on how best to look into this?
Shouldn't the aliveness test have flagged a problem on at least one of the
nodes? During this time both reported {200:OK}.
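As far as I understand it, the management plugin's aliveness test (`/api/aliveness-test/<vhost>`) only checks that the node answering the request can declare a test queue and publish and consume a message on that vhost; it does not look at cluster partition state, which would be consistent with both nodes returning OK. A check that does look at partitions could instead read the `partitions` list the management plugin reports for each node in `GET /api/nodes`. The following is a minimal sketch, assuming the management plugin is enabled on its usual port 15672 and that each node object carries `name` and `partitions` fields; the host name, credentials, and the function names `fetch_nodes` and `partitioned_nodes` are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_nodes(base_url, user, password):
    """Fetch the decoded JSON from GET /api/nodes using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/nodes",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def partitioned_nodes(nodes):
    """Return {node_name: [peers]} for every node reporting a partition.

    Nodes whose "partitions" list is empty or missing are left out, so an
    empty dict means no node currently reports a partition.
    """
    return {
        n["name"]: list(n.get("partitions", []))
        for n in nodes
        if n.get("partitions")
    }
```

For example, `partitioned_nodes(fetch_nodes("http://NODE1:15672", "guest", "guest"))` would return an empty dict on a healthy cluster. Since each side of a partition may have its own view, it is probably worth running the check against each node rather than just one.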
Thanks
--
Patrick Long - Munkiisoft Ltd