[rabbitmq-discuss] problem with an HA pair of rabbitmq servers
Matthew Sackman
matthew at lshift.net
Tue Mar 2 12:19:29 GMT 2010
Hi Allan,
On Sun, Feb 28, 2010 at 06:41:19PM -0800, allan bailey wrote:
> We have a pair of rabbitmq servers. The 1st server periodically does a lot
> of intense I/O copying data out to the 2nd server. This apparently causes
> timeouts that then cause a partitioning of the cluster.
Very interesting problem - this isn't something we've come across.
I think the fix is to increase the net ticktime. The default is
60 seconds, and the diff below changes that to 500 seconds.
With a larger value, the cluster will take correspondingly longer to
notice when a node really has gone down. Please do let us know how you
get on. Note that this will require building from source.
--- a/src/rabbit_node_monitor.erl Sun Feb 28 10:25:51 2010 +0000
+++ b/src/rabbit_node_monitor.erl Tue Mar 02 12:15:45 2010 +0000
@@ -48,6 +48,7 @@
 %%--------------------------------------------------------------------
 init([]) ->
+    net_kernel:set_net_ticktime(500),
     ok = net_kernel:monitor_nodes(true),
     {ok, no_state}.
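
It may also be possible to avoid the rebuild: net_ticktime is a
parameter of the Erlang kernel application, so it can be set when the
node starts. A minimal sketch, assuming the node is started with an
Erlang config file (the file itself and how it is passed to the node
are assumptions and depend on your installation):

```erlang
%% Hypothetical Erlang config file, e.g. passed with `erl -config <name>`.
%% Sets the kernel application's net_ticktime to 500 seconds,
%% the same value as in the diff above.
%% Every node in the cluster should be given the same value.
[
  {kernel, [{net_ticktime, 500}]}
].
```

The command-line equivalent is `-kernel net_ticktime 500`. Either way,
the value in effect can be checked from an Erlang shell on the node
with net_kernel:get_net_ticktime().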
Matthew