<div dir="ltr">When I checked our testing environments this morning, I saw that one of them was reporting a Suspected Network Partition.<div><br></div><div>Both nodes are virtual machines on the same network, so I don't think "network partition" is an accurate diagnosis.</div>
<div><br></div><div>I have the log files from both nodes. Nothing had happened on either server all weekend, and then on Sunday morning the following was logged:</div><div><br></div><div><b>NODE1 log</b></div><div><br></div><div>01:00:04 NODE1 logged that NODE2 was down</div>
<div>01:00:14 NODE1 logged </div><div>Mnesia(rabbit@NODE1): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@NODE2}</div><div>global: Name conflict terminating {rabbit_mgmt_db,&lt;8059.336.0&gt;}<br>
</div><div><br clear="all"><div><br></div><div><b>NODE2 log</b></div><div><b><br></b></div><div>01:00:56 NODE2 logged that NODE1 was down</div><div>01:01:01 NODE2 logged</div><div>Mnesia(rabbit@NODE2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@NODE1}</div>
<div><br></div><div>NODE1 went on to log a full error report.</div><div><br></div><div>I tried stop_app and start_app on NODE2, and both commands failed with errors. I then ran the same commands on NODE1. Both succeeded, and the cluster stopped reporting a suspected network partition.</div>
<div><br></div><div>Any suggestions on how best to look into this?</div><div><br></div><div>Shouldn't the aliveness test on at least one of the nodes flag that there is a problem? Throughout this period both nodes reported {200:OK}.</div>
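<div><br></div><div>For context, this is roughly the kind of check I was expecting to flag the problem. It is only a sketch: I am assuming that each node entry returned by the management API's /api/nodes endpoint carries a "partitions" list, and the payload below is a hypothetical example I made up, not output captured from our cluster. The function name partitioned_nodes is my own.</div>

```python
import json


def partitioned_nodes(nodes):
    """Map each node name to the peers it reports being partitioned from.

    `nodes` is assumed to be the parsed JSON list from GET /api/nodes,
    where each entry has a "name" and (assumption) a "partitions" list.
    Nodes reporting no partitions are left out of the result.
    """
    report = {}
    for node in nodes:
        peers = node.get("partitions") or []
        if peers:
            report[node["name"]] = list(peers)
    return report


# Hypothetical payload shaped like an /api/nodes response (not real output).
sample = json.loads("""[
  {"name": "rabbit@NODE1", "running": true, "partitions": ["rabbit@NODE2"]},
  {"name": "rabbit@NODE2", "running": true, "partitions": []}
]""")

print(partitioned_nodes(sample))
```

<div>As far as I understand it, the aliveness test only exercises a queue on the node it is sent to, so a check along these lines would have to run separately from it.</div>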
<div><br></div><div>Thanks</div><div><br></div><div><br></div>-- <br>Patrick Long - Munkiisoft Ltd
</div></div>