[rabbitmq-discuss] Can a downed node affect responsiveness of HTTP queries to other nodes?
simon at rabbitmq.com
Thu Oct 25 11:30:59 BST 2012
In 2.8.x /api/nodes will make RPC calls to each node, but no other paths
will. In 3.0 that's going to be removed as well.
However, /api/queues, like a lot of other paths, makes calls into Mnesia
to figure out what things exist. When a node is unresponsive, calls into
Mnesia can hang...
... but only for long enough for the VM to declare that the node is down
(again configured by net_ticktime, defaulting to about a minute). After
that, the node should be declared down and Mnesia should ignore it.
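
For the monitoring side, here is a minimal sketch (not from this thread) of a poller that puts its own timeout on /api/queues and only alerts once timeouts have lasted longer than the node-down detection window described above. The names check_queues and should_alert, the 5 second request timeout, and the 90 second grace period are all assumptions chosen for illustration; the base URL and credentials are whatever your management plugin is configured with.

```python
import base64
import json
import socket
import urllib.error
import urllib.request

# Assumption: net_ticktime is at its default (about a minute), so we allow
# some slack beyond it before treating repeated timeouts as a real problem.
GRACE_SECONDS = 90.0


def build_request(base_url, user, password):
    """Build a GET request for /api/queues using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    req.add_header("Authorization", "Basic " + token)
    return req


def check_queues(base_url, user, password, timeout=5.0,
                 urlopen=urllib.request.urlopen):
    """Return ("ok", queues), ("timeout", None) or ("error", None).

    `urlopen` is injectable so the function can be exercised without a broker.
    """
    req = build_request(base_url, user, password)
    try:
        with urlopen(req, timeout=timeout) as resp:
            return "ok", json.loads(resp.read().decode("utf-8"))
    except (socket.timeout, TimeoutError):
        # Read-phase timeout: the request was sent but no reply arrived in time.
        return "timeout", None
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout", None
        return "error", None


def should_alert(first_timeout_at, now, grace=GRACE_SECONDS):
    """True once timeouts have persisted longer than the grace period.

    `first_timeout_at` is the time of the first timeout in the current run of
    consecutive timeouts, or None if the last check succeeded.
    """
    return first_timeout_at is not None and (now - first_timeout_at) > grace
```

The idea is that a short run of timeouts right after a node disappears may just be Mnesia waiting for the VM to declare that node down, whereas timeouts that continue well past that window suggest the node is not being declared down at all.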
So when you say the node was "stuck", was it completely unresponsive? I
am wondering if it could be just responsive enough to prevent Erlang
from considering it down, while still being unresponsive enough to... be
On 25/10/12 01:48, Matt Pietrek wrote:
> As part of our production monitoring support, we have a script that runs
> every five seconds and checks some information about the queues. In
> particular, it uses the "/api/queues/..." URL to query info about them.
> All of our queues are declared as HA. Recently we had some problems
> where a node just got stuck for 30+ minutes (a known Linux kernel bug).
> However, on the monitoring running on the healthy node, I was seeing my
> /api/queues queries timing out.
> I'm guessing that there's some set of the HTTP APIs that when invoked,
> may cause network traffic to other nodes. And if those nodes are down,
> the HTTP API is essentially useless as it eventually times out waiting
> for communication with the downed node.
> Can you always-helpful RabbitMQ folks tell me if this is indeed the
> case, and if there's anything else useful to know when planning a
> monitoring strategy using the HTTP API?