[rabbitmq-discuss] When rabbitmq is clustered with one other node we see a very slow dequeue of messages
james.eddy at us.thalesgroup.com
Wed Dec 4 18:39:15 GMT 2013
Simon MacMullen <simon at ...> writes:
> The issue is that you are getting into a situation where node B is down,
> but node A is not aware of this (probably because from a TCP level it's
> not aware that the connection has been closed). Node A therefore has to
> wait a considerable time (the net_ticktime) trying to send packets to
> node B before giving up and treating the node as down.
> If node A can tell at the TCP level that the connection to node B has
> gone down, then you won't have this wait, it'll just mark the node as
> down immediately and carry on.
> To some extent you can tweak this behaviour by reducing net_ticktime -
> but a short net_ticktime makes it plausible that a node will be
> considered down when it isn't.
> See http://www.rabbitmq.com/partitions.html for more.
> Cheers, Simon
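Some context on Simon's point about tuning net_ticktime. According to the Erlang kernel documentation, connected nodes are ticked every net_ticktime/4 seconds. An unresponsive node is therefore declared down somewhere between 3/4 and 5/4 of net_ticktime. The Python sketch below computes that window for a few settings, including 60 s, the Erlang default. The function name `detection_window` is hypothetical and used only for illustration.

```python
from typing import Tuple


def detection_window(net_ticktime: float) -> Tuple[float, float]:
    """Return (min, max) seconds before an unresponsive peer is marked down.

    Assumes the Erlang kernel rule: peers are ticked every net_ticktime/4
    seconds, so detection happens between 3/4 and 5/4 of net_ticktime.
    """
    quarter = net_ticktime / 4
    return net_ticktime - quarter, net_ticktime + quarter


# 20 s and 600 s match the settings tried in this thread; 60 s is the default.
for t in (20, 60, 600):
    lo, hi = detection_window(t)
    print(f"net_ticktime={t}s -> node marked down after {lo:.0f}-{hi:.0f}s")
```

This window only describes when the node is dropped. As the reply below points out, it says nothing about how long any slowdown lasts afterwards.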
The issue is not that RabbitMQ fails to detect a node going down in a timely
fashion; it does what I expect. The behavior in question is what happens
after RabbitMQ removes the node once net_ticktime expires. If I set
net_ticktime to 20 seconds, 20 seconds go by, Node B is removed, and then
the slow message delivery starts. Likewise, if I set it to 10 minutes, Node B
is removed after 10 minutes and the slowness starts then. Five to ten minutes
after Node B is removed, the server catches up. So we are seeing degraded
performance for up to 10 minutes *after* Node B has been removed from the
cluster. The effect is large enough that even with a light load of 1 msg/sec,
the consumer falls more than 100 messages behind after about 5 minutes.
net_ticktime only affects when the server becomes degraded, not for how long.
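The backlog figure above can be turned into a rough delivery rate. This is a minimal sketch under two assumptions: publishing runs at a steady 1 msg/sec, and the queue was empty when Node B was removed. `effective_consume_rate` is a hypothetical helper and not part of any RabbitMQ client. The backlog was *over* 100 messages, so the result is an upper bound on the consumer's rate.

```python
def effective_consume_rate(publish_rate: float, elapsed_s: float, backlog: float) -> float:
    """Messages/sec actually delivered over the interval.

    Assumes a constant publish rate and an initially empty queue
    (assumptions, not measurements from the broker).
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    published = publish_rate * elapsed_s   # total messages sent in the interval
    delivered = published - backlog        # whatever is still queued was not delivered
    return delivered / elapsed_s


publish_rate = 1.0
rate = effective_consume_rate(publish_rate, elapsed_s=5 * 60, backlog=100)
shortfall = 1 - rate / publish_rate
print(f"delivery rate <= {rate:.2f} msg/s ({shortfall:.0%} or more below publish rate)")
```

With the numbers from this post, the consumer received at most about 0.67 msg/s during the degraded period. That is at least a third slower than the publish rate.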