In our platform, we're using keepalive to manage a VIP that transfers between multiple nodes that are clustered with all my queues declared HA.<br><br>On each node, keepalive runs a script every five seconds (I call it rabbitmq_alive) which connects specifically to the local rabbitmq node on that box and determines if the that box's rabbitmq node is responsive.<br>
<br>So for example, with two nodes (mq1 and mq2), each node has:<br><br>A rabbitmq instance<br>A keepalive instance<br>A rabbitmq_alive instance which connects to the local rabbitmq instance to check its health.<br><br>So far, I've been using /api/aliveness-test/. However, I've noticed that the /api/aliveness-test queue actually lives on ONLY mq1. Now imagine the case where the mq1 node goes highly unresponsive. I'm wondering if my rabbitmq_alive check on mq2 is actually timing out if mq1 is not alive and responding in a timely manner.<br>
<br>If this is an issue, I'm wondering if there are alternative, "approved" manners of checking the health of an individual broker within a cluster?<br><br>Thanks,<br><br>Matt<br><br>