[rabbitmq-discuss] Recovery after heartbeat error

Tim Watson tim at rabbitmq.com
Thu Mar 7 13:59:32 GMT 2013


And... looking at the title of the post (I got lost in the details and forgot it):

If a heartbeat error/timeout has occurred, then definitely just throw away all the channels for the connection - they're done and attempting to close them is the wrong thing to do in that situation.

Cheers,
Tim 

On 7 Mar 2013, at 13:58, Tim Watson wrote:

> On 7 Mar 2013, at 13:28, Ioannis Foukarakis wrote:
>> [snip]
>> However, if I disconnect from the network, channel.close() blocks.
> 
> Yes, this is *normal* behaviour for a networked application. When you pull the network cable (or otherwise disconnect from the network), the AMQP channel.close protocol has a problem: the client sends 'close' and waits for the server to respond with 'close-ok', but if the server is gone, what then? Usually the client blocks until the operating system 'notices' that the peer socket is gone (because transmission retry limits have been exhausted, TCP keep-alives are enabled, etc.). Various conditions determine this delay, and with some operating systems' default TCP configuration it can take up to 30 minutes before the application is notified with ETIMEDOUT. The error surfaces on the connection object, i.e., when the network failure is finally noticed it forces the connection to close. The client also closes the channels when it notices that the connection has been terminated.
> 
>> Should I avoid trying to close channels? Is there a better solution for handling network errors?
>> 
> 
> Yes, if you know for certain that there has been a network failure, then you should skip closing channels. The shutdown signal means that the 'other end' has closed, so calling a synchronous AMQP method like channel.close after that is always going to block.
> 
> If you have a very unreliable network then just rely on the heartbeats to terminate the connection (and therefore the channels) in a timely fashion. If you *need* to close channels more often than that, then you might need to put some kind of timeout around the channel.close() call.
> 
> Cheers,
> Tim
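
For the "timeout around the channel.close() call" suggestion, here is a minimal sketch of one way it might be done with plain java.util.concurrent. The class name BoundedClose and the method closeWithTimeout are made up for illustration and are not part of the RabbitMQ client; the sketch only assumes that the blocking call can be wrapped as an AutoCloseable.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of any RabbitMQ API): bounds how long the
// caller waits for a potentially blocking close() call.
final class BoundedClose {

    // Runs resource.close() on a daemon thread and waits at most `timeout`.
    // Returns true if close() completed in time, false if it timed out,
    // threw an exception, or the waiting thread was interrupted.
    static boolean closeWithTimeout(AutoCloseable resource, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close() must not keep the JVM alive
            return t;
        });
        Future<?> pending = executor.submit(() -> {
            resource.close();
            return null;
        });
        try {
            pending.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // Request interruption; a call blocked on socket I/O may ignore it,
            // in which case the daemon thread is simply abandoned.
            pending.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // close() itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            pending.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the Java client this might be called as BoundedClose.closeWithTimeout(() -> channel.close(), 5, TimeUnit.SECONDS), where the 5-second limit is an arbitrary choice. A false result only means the caller stopped waiting; the channel is still cleaned up later, when the connection itself is torn down by the heartbeat or the TCP timeout.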
