We have a pair of RabbitMQ servers. The 1st server periodically does a lot of intense I/O, copying data<br>out to the 2nd server. This apparently causes timeouts between the nodes, which in turn cause the cluster to partition.<br><br>My main question is this: can we set the timeout value higher, and if so, how? I found nothing in the<br>
manual pages about a timeout setting between cluster nodes.<br><br>When we see this problem, the quickest fix is to shut down the 2nd server, delete its mnesia directory, and<br>rebuild it with a cluster file pointing to the 1st server. Then we start it up again and all is well. (It even picks up the users and vhosts from the 1st server.)<br>
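For reference, the rebuild on the 2nd server amounts to roughly the following. This is only a sketch: the mnesia directory, the cluster file path, and the start command are assumptions based on a default Linux package install, and the `run` and `rebuild_node` helpers are hypothetical names used for illustration. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Sketch of rebuilding the 2nd node (rabbit@rq102) from the 1st (rabbit@rq101).
# ASSUMPTIONS: every path below is a guess at a default install; adjust as needed.
MNESIA_DIR="${MNESIA_DIR:-/var/lib/rabbitmq/mnesia}"                   # assumed location
CLUSTER_FILE="${CLUSTER_FILE:-/etc/rabbitmq/rabbitmq_cluster.config}"  # assumed location
PRIMARY_NODE="${PRIMARY_NODE:-rabbit@rq101}"                           # the 1st server
DRY_RUN="${DRY_RUN:-1}"                                                # 1 = only print

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the four steps described above, in order.
rebuild_node() {
    # 1. Stop the broker on this node.
    run rabbitmqctl stop
    # 2. Throw away the local mnesia state (destructive).
    run rm -rf "$MNESIA_DIR"
    # 3. Point the cluster file at the 1st server (an Erlang list of node names).
    if [ "$DRY_RUN" = 1 ]; then
        echo "would write: [$PRIMARY_NODE]. to $CLUSTER_FILE"
    else
        printf '[%s].\n' "$PRIMARY_NODE" > "$CLUSTER_FILE"
    fi
    # 4. Start the broker again; it rejoins and copies users and vhosts.
    run rabbitmq-server -detached
}
```

Calling `rebuild_node` prints the steps; setting `DRY_RUN=0` first makes it actually execute them, so it should only be run that way on the node being rebuilt.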
<br><br>Below are the errors we see when this problem happens.<br><br><br>1st server sees this error:<br>===<br>[=ERROR REPORT==== 28-Feb-2010::16:28:52 ===<br>Mnesia(rabbit@rq101): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rq102}]<br>
<br>===<br><br><br>2nd server sees these errors:<br>===<br>[=ERROR REPORT==== 28-Feb-2010::16:27:08 ===<br>** Node rabbit@rq101 not responding **<br>** Removing (timedout) connection **]<br>[=ERROR REPORT==== 28-Feb-2010::16:28:38 ===<br>
** Node rabbit@rq101 not responding **<br>** Removing (timedout) connection **]<br>=INFO REPORT==== 28-Feb-2010::16:28:52 ===<br>node rabbit@rq101 up<br>=WARNING REPORT==== 28-Feb-2010::16:28:52 ===<br>The global_name_server locker process received an unexpected message:<br>
{{#Ref<0.0.0.186122>,rabbit@rq101},true}<br>=WARNING REPORT==== 28-Feb-2010::16:28:52 ===<br>The global_name_server locker process received an unexpected message:<br>{{#Ref<0.0.0.186227>,rabbit@rq101},true}<br>
[=ERROR REPORT==== 28-Feb-2010::16:28:52 ===<br>Mnesia(rabbit@rq102): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rq101}]<br><br>===<br><br><br>thanks,<br>Allan<br><br>