<div dir="ltr">Thanks for the response Tim.<div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 18, 2013 at 2:59 AM, Tim Watson <span dir="ltr"><<a href="mailto:tim@rabbitmq.com" target="_blank">tim@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Jonathan,<div><div><div><div>On 17 Sep 2013, at 19:34, Jonathan Halterman wrote:</div>
<blockquote type="cite"><div dir="ltr"><div>Why are the shutdown listeners for only some of my channels/connections called when a rabbit server shuts down?</div>
<div><br></div></div></blockquote><div><br></div></div><div>All connection/channel shutdown listeners are triggered when the client detects the shutdown. When the shutdown originates at the server, this activity is mediated on the client side in one of two ways: either (a) the client receives a `connection.close' AMQP method from the broker, or (b) the OS networking layer signals to the JVM that the socket has closed, at which point the listening/reading thread handles the relevant exception, tears down any associated local resources and fires the shutdown listeners. In the latter case, there can be a significant time delay before the operating system "notices" that the peer socket has closed/disappeared. Having said all that....</div>
<div><br><blockquote type="cite"><div dir="ltr"><div>On Mon, Sep 16, 2013 at 5:13 PM, Jonathan Halterman <span dir="ltr"><<a href="mailto:jhalterman@gmail.com" target="_blank">jhalterman@gmail.com</a>></span> wrote:</div>
</div><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">
I've been experimenting with various sorts of RabbitMQ failures that result in connections and channels being shut down, with the goal of being able to re-establish connections, channels, and consumers whenever a failure occurs. In particular, I've been forcing network partitions on a pause_minority configured cluster with a client connected to what will become the minority node, to see how things behave, and the results are a bit inconsistent.<div>
<br></div></div></blockquote></div></div></blockquote><div><br></div></div><div>How exactly are you forcing network partitions? Are you causing packet loss (using pf or iptables) or doing something else?</div></div></div>
</div></blockquote><div><br></div><div>iptables</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div>
<div><br></div><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>For a simple test, I created 2 connections and 6 channels then partitioned the cluster.</div></div></blockquote></div></div></blockquote><div><br></div></div><div>How did you partition the cluster?</div>
</div></div></div></blockquote><div><br></div><div>Tweaking iptables to drop traffic to/from other nodes in the cluster.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div>
<div><br><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div> Within a minute or so the minority node (to which my client is connected) shuts itself down.</div></div></blockquote></div></div></blockquote><div><br></div></div><div>If a RabbitMQ node decides to undertake an orderly shutdown, then all AMQP connections should be explicitly closed (as per method "a" listed above) before the network connection is severed. Where this might not work as expected is if the network connection between client and server is unavailable and/or subject to packet loss. If the `connection.close' signal the broker sends doesn't make it to the client, then the shutdown listeners won't fire until the client's (OS) network stack detects the problem, which can take up to 30 mins depending on environment configuration.</div>
<div><div><br></div><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>What happens next varies a bit with each test run:</div>
<div><br></div><div>Outcome 1: Immediately the shutdown listeners for my 2 connections and all 6 channels are called.</div><div><br></div></div></blockquote></div></div></blockquote><div><br></div></div><div>That is what I'd expect to happen if:</div>
<div><br></div><div>1. both connections are between the client and the broker that is shutting down</div><div>2. the network link between the broker that is shutting down and the client is in good condition (no packet loss, etc.) such that the connection.close from the broker arrives at the client as expected</div>
<div><br><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>Outcome 2: Immediately 2 of my 6 channels' shutdown listeners are called. None of the connection shutdown listeners are called. After waiting a few minutes I heal the partition and the shutdown listeners for the 2 connections and the remaining 4 channels are immediately called.</div>
<div><br></div></div></blockquote></div></div></blockquote><div><br></div></div><div>That doesn't sound right. If both connections are between the client and the server that is shutting down, and there is a bug in the shutdown listener handling code, then this problem would be showing up all the time (and we'd have fixed it). It is also unnecessary to consider clustering/partitions if the behaviour you describe is happening for two connections between one client and one broker.</div>
<div><br></div><div>Can you share a minimal example of the code you're using please.</div><div><br><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>Outcome 3: Immediately 2 of my 6 channels' shutdown listeners are called. None of the connection shutdown listeners are called. After about 30 seconds, with the cluster still partitioned, the shutdown listeners for the 2 connections and the remaining 4 channels are immediately called.<br>
<div><br></div></div></div></blockquote></div></div></blockquote><div><br></div></div><div>There are no timing guarantees about when shutdown listeners will fire. As I mentioned, these events are only triggered when either the client "sees" a `connection.close' from the broker or detects a network failure whilst listening/sending. Since both of these factors are entirely dependent on the network between client and server, and on the networking layers of the various participating operating systems, the `connection.close' and/or socket closed exception will be detected when the client's OS delivers the relevant signal to the JVM and up into the client library's application code, at which point it is handled immediately.</div>
<div><br></div><div>In both of these cases, if some channels are being used to `send' data, and the disconnection between client and server involves loss of network connectivity, then the "sending" channels are most likely to "see" IOExceptions before the "listening" channels. Modern OS networking stacks are often configured with lower retry thresholds for sending than they are for receiving, thus detection of network failures will likely vary considerably depending on what you're doing in a particular channel over a particular connection.</div>
</div></div></div></blockquote><div><br></div><div>I think you've basically hit on what I'm experiencing. The client in question is serving as a consumer only.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div>
<div><br><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>I'm interested to learn more about when and why certain shutdown listeners might or might not be invoked so I can do a better job of re-establishing resources after a failure. Any input is appreciated.</div>
<div><br></div></div></blockquote></div></div></blockquote><div><br></div></div><div>If you can share an example of your client code, boiled down to the minimal details, that would help. Please also confirm exactly what your setup looks like, viz:</div>
</div></div></div></blockquote><div><br></div><div>I wrote a test attempting to reproduce what my actual client is experiencing, but I was only able to come close to reproducing my client's results when pushing a lot of volume down to the consumers, and even then it was not consistent enough to draw any conclusions. At this point I'm satisfied to simply tweak my client to account for potential delays in ShutdownListener calls and move on. I just wanted to be sure that there were no mechanisms introduced by amqp-client which could be causing any additional ShutdownListener delays, and it sounds like there are not.</div>
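<div><br></div><div>A minimal sketch of one way a client might account for shutdown listeners that fire at different times (as in outcomes 2 and 3 above): route every connection and channel shutdown signal through a single guard, so that recovery runs once per connection "generation" no matter how many signals arrive or how late they are. RecoveryGuard, onShutdownSignal and the epoch counter are hypothetical names used for illustration and are not part of amqp-client; the assumption is that each ShutdownListener would call the guard with the epoch captured when its connection or channel was created.</div><div><br></div>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of amqp-client: collapses many shutdown
// signals (early or late) into a single recovery per connection generation.
final class RecoveryGuard {
    private final AtomicInteger epoch = new AtomicInteger(0);
    private final Runnable recoverAction;

    RecoveryGuard(Runnable recoverAction) {
        this.recoverAction = recoverAction;
    }

    // Epoch to capture when a connection or channel is created.
    int currentEpoch() {
        return epoch.get();
    }

    // Call from every shutdown listener with the epoch captured at creation.
    // Returns true only for the first signal of the current epoch.
    boolean onShutdownSignal(int signalEpoch) {
        if (!epoch.compareAndSet(signalEpoch, signalEpoch + 1)) {
            return false; // duplicate or stale signal: recovery already started
        }
        // Assumed to re-create connections, channels and consumers.
        recoverAction.run();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger recoveries = new AtomicInteger();
        RecoveryGuard guard = new RecoveryGuard(recoveries::incrementAndGet);
        int epochAtCreation = guard.currentEpoch();
        // 2 connections + 6 channels, all created in the same epoch.
        for (int i = 0; i < 8; i++) {
            guard.onShutdownSignal(epochAtCreation);
        }
        System.out.println("recoveries run: " + recoveries.get()); // prints "recoveries run: 1"
    }
}
```

<div><br></div><div>The epoch is advanced before the recovery action runs, so resources created during recovery capture the new epoch, and a listener for an old channel that fires 30 seconds later is ignored. In this sketch the recovery action runs on the thread that delivers the signal; a real client would probably hand it off to a separate thread.</div>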
<div><br></div><div>Cheers,</div><div>Jonathan</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div>
<div><br></div><div>1. are both connections made between the client and exactly one server</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><div>2. how are you "partitioning" the server from the rest of the cluster</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><div>3. are you sending or receiving on the various channels that we're talking about</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div><span><font color="#888888"><div><br></div><div>Tim</div><div><br></div></font></span></div></div></div><br>_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
<br></blockquote></div><br></div></div>