[rabbitmq-discuss] Validity of /api/aliveness-test in a clustered environment

Fri Oct 26 01:18:12 BST 2012

In our platform, we're using keepalive to manage a VIP that transfers
between multiple nodes that are clustered with all my queues declared HA.

On each node, keepalive runs a script every five seconds (I call it
rabbitmq_alive) which connects specifically to the local rabbitmq node on
that box and determines if the that box's rabbitmq node is responsive.

So for example, with two nodes (mq1 and mq2), each node has:

A rabbitmq instance
A keepalive instance
A rabbitmq_alive instance which connects to the local rabbitmq instance to
check its health.

So far, I've been using /api/aliveness-test/. However, I've noticed that
the /api/aliveness-test queue actually lives on ONLY mq1.  Now imagine the
case where the mq1 node goes highly unresponsive. I'm wondering if my
rabbitmq_alive check on mq2 is actually timing out if mq1 is not alive and
responding in a timely manner.

If this is an issue, I'm wondering if there are alternative, "approved"
manners of checking the health of an individual broker within a cluster?

Thanks,

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121025/b550f1c1/attachment.htm>