[rabbitmq-discuss] Towards better handling of RabbitMQ connection/channel failures

Tue Sep 17 01:13:34 BST 2013

I've been experimenting with various sorts of RabbitMQ failures that result
in connections and channels being shut down, with the goal of being able to
re-establish connections, channels, and consumers whenever a failure
occurs. In particular, I've been forcing network partitions on a cluster
configured with pause_minority, with a client connected to what will
become the minority node, to see how things behave. The results are a
bit inconsistent.

For a simple test, I created 2 connections and 6 channels, then partitioned
the cluster. Within a minute or so the minority node (to which my client is
connected) shuts itself down. What happens next varies a bit
with each test run:

Outcome 1: Immediately the shutdown listeners for my 2 connections and all
6 channels are called.

Outcome 2: Immediately 2 of my 6 channels' shutdown listeners are called.
None of the connection shutdown listeners are called. After waiting a few
minutes I heal the partition and the shutdown listeners for the 2
connections and the remaining 4 channels are immediately called.

Outcome 3: Immediately 2 of my 6 channels' shutdown listeners are called.
None of the connection shutdown listeners are called. After about 30
seconds, with the cluster still partitioned, the shutdown listeners for the
2 connections and the remaining 4 channels are immediately called.
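
To compare runs like these, it can help to record which shutdown callbacks
have fired and which are still outstanding. The following is a minimal
sketch, assuming a Java client. ShutdownTracker and its method names are
hypothetical and not part of any RabbitMQ library, and wiring markShutdown
into a real shutdown listener is assumed rather than shown. The main method
only simulates the first stage of Outcome 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of any RabbitMQ client library): records which
// resources have reported shutdown so that missing callbacks can be spotted.
class ShutdownTracker {
    // Names of the connections and channels we expect to hear about.
    private final Set<String> expected = new LinkedHashSet<>();
    // Resource name -> time (ms) at which its shutdown callback first fired.
    private final Map<String, Long> firedAt = new LinkedHashMap<>();

    // Register a connection or channel when it is created.
    synchronized void expect(String name) {
        expected.add(name);
    }

    // Call this from the resource's shutdown callback.
    // Only the first notification per resource is recorded.
    synchronized void markShutdown(String name, long nowMillis) {
        firedAt.putIfAbsent(name, nowMillis);
    }

    // Resources whose shutdown callback has not fired yet, in registration order.
    synchronized List<String> pending() {
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (!firedAt.containsKey(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // True once every registered resource has reported shutdown.
    synchronized boolean allShutDown() {
        return pending().isEmpty();
    }

    public static void main(String[] args) {
        ShutdownTracker tracker = new ShutdownTracker();
        for (int i = 1; i <= 2; i++) tracker.expect("connection-" + i);
        for (int i = 1; i <= 6; i++) tracker.expect("channel-" + i);

        // Simulate the start of Outcome 2: only two channel callbacks arrive.
        long now = System.currentTimeMillis();
        tracker.markShutdown("channel-1", now);
        tracker.markShutdown("channel-2", now);

        System.out.println("Still waiting on: " + tracker.pending());
        System.out.println("All shut down: " + tracker.allShutDown());
    }
}
```

A recovery routine could poll pending() after the first callback arrives and
treat a non-empty result after some timeout as a sign that the remaining
resources should be checked or closed explicitly. That timeout policy is a
design choice left open here.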

---

I'm interested to learn more about when and why certain shutdown listeners
might or might not be invoked so I can do a better job of re-establishing
resources after a failure. Any input is appreciated.

Cheers,
Jonathan