[rabbitmq-discuss] Option for surviving connection failure feature suggestion

Fri Jan 3 13:41:41 GMT 2014

On 5 Dec 2013, at 18:17, Wayne Brantley wrote:
> Disclaimer:  I am new to RabbitMQ.  :-)
> 

Welcome to the party! ;)

> I would propose a new option on a queue.
> 
> If the connection is dropped, that would not make the message available to another consumer.
> The message would have a 'processing timeout'.  Only after that time has expired would the message become available to other consumers.
> 
> There would be several different options that could be set on the message for how to determine to deliver the message to another consumer.
> 
> 1)  You could have an 'absolute processing timeout'.  
>     -  RabbitMQ would keep track of how long the consumer has had the message.
>     -  The client could send 'keep alive' acknowledgements to reset the time RabbitMQ is tracking to zero.
>     -  All that matters is has the message been processing longer than the timeout value.  (where timeout can be reset by client).  A dropped connection would not matter. 
> 

Not sure I follow this properly, so to clarify: you're suggesting that the broker should track the message and, if the client doesn't ACK within a specified time frame, consider the message REJECTed and give it to another client? What if the client did send an ACK within the requisite time frame, but the network was congested and the ACK didn't reach the broker in time? You'd then be in an inconsistent state.

Clients already send 'keep-alive' messages for connections (AMQP heartbeats), so are you suggesting that on receipt of a heartbeat, the broker should reset this timer? If so, that would intertwine connection management with queue processing in a fiddly way that would likely harm performance. If not, then what you're suggesting is an extension of the protocol, since AMQP doesn't have this concept at the moment. Protocol enhancements have been introduced by RabbitMQ in the past, so that's not out of the question, but I'd suggest the semantics need to be formally identified before pursuing that kind of thing.
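
To make the late-ACK race concrete, here is a toy model of the proposed 'absolute processing timeout'. It is only a sketch, not RabbitMQ code: the `LeaseQueue` class, its method names, and the explicit `now` clock argument are all hypothetical, and delivery tags are simplified to a per-queue counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LeaseQueue:
    """Toy queue with a per-delivery processing timeout (hypothetical model)."""
    timeout: float
    ready: List[str] = field(default_factory=list)
    # delivery tag -> (message, deadline)
    leased: Dict[int, Tuple[str, float]] = field(default_factory=dict)
    next_tag: int = 1

    def expire(self, now: float) -> None:
        # Return every message whose deadline has passed to the ready list.
        for tag, (msg, deadline) in list(self.leased.items()):
            if now >= deadline:
                del self.leased[tag]
                self.ready.append(msg)

    def deliver(self, now: float) -> Tuple[int, str]:
        self.expire(now)
        msg = self.ready.pop(0)
        tag = self.next_tag
        self.next_tag += 1
        self.leased[tag] = (msg, now + self.timeout)
        return tag, msg

    def keep_alive(self, tag: int, now: float) -> bool:
        # Reset the timer for a delivery; fails if the lease already expired.
        self.expire(now)
        if tag not in self.leased:
            return False
        msg, _ = self.leased[tag]
        self.leased[tag] = (msg, now + self.timeout)
        return True

    def ack(self, tag: int, now: float) -> bool:
        # True only if the ACK reaches the broker before the deadline.
        self.expire(now)
        return self.leased.pop(tag, None) is not None


def late_ack_scenario() -> Tuple[str, bool]:
    q = LeaseQueue(timeout=30.0, ready=["job-1"])
    tag_a, _ = q.deliver(now=0.0)        # consumer A receives job-1
    # A finishes and sends its ACK in time, but congestion delays it.
    _, msg_b = q.deliver(now=30.5)       # lease expired, consumer B gets job-1
    ack_accepted = q.ack(tag_a, now=31.0)  # A's ACK finally arrives, too late
    return msg_b, ack_accepted
```

In `late_ack_scenario`, consumer A has completed the work, yet the broker has already handed the same message to consumer B and has nothing left to match A's ACK against. That is the inconsistent state described above.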

You might be able to approximate this with multiple queues and TTL, though I'm not sure.
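
One possible reading of that idea, offered as an untested sketch rather than a recommendation: the consumer acks the original delivery straight away and parks a copy in a second "hold" queue whose messages expire after the processing timeout and are dead-lettered back to the work queue. The queue names and the helper below are hypothetical; the `x-message-ttl`, `x-dead-letter-exchange` and `x-dead-letter-routing-key` queue arguments are RabbitMQ's own.

```python
from typing import Any, Dict

PROCESSING_TIMEOUT_MS = 30_000  # hypothetical processing timeout
WORK_QUEUE = "work"             # hypothetical queue name
HOLD_QUEUE = "work.in-progress" # hypothetical queue name


def hold_queue_arguments(timeout_ms: int, target_queue: str) -> Dict[str, Any]:
    """Arguments for a queue that nobody consumes from.

    Messages sit here until the TTL expires, then are dead-lettered via the
    default exchange ("") to the queue named by the routing key.
    """
    return {
        "x-message-ttl": timeout_ms,
        "x-dead-letter-exchange": "",
        "x-dead-letter-routing-key": target_queue,
    }


hold_args = hold_queue_arguments(PROCESSING_TIMEOUT_MS, WORK_QUEUE)
```

The obvious gap is that a consumer which finishes in time cannot withdraw its parked copy, so the copy still returns to the work queue and consumers would have to recognise and discard work that has already been completed.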

> 2)  You could have a 'broken connection timeout'
>     -  RabbitMQ would keep track of how long the consumer has had the message, AFTER the connection is dropped.
>     -  The client would NOT ever need to send 'keep alive' acknowledgements.
>     -  If a consumer connection is re-established - the consumer should somehow indicate they are still working on the message.
>     -  Potentially - when the consumer is working on the message, they send 'keep alive' every so often.  This way if the connection broke and was reconnected, the next time a 'keep alive' is sent, it would let RabbitMQ know you are still working on the message.
> 

What would the server do if the connection never came back? Presumably you're suggesting that the queue would consider the message still "out on loan" until the timeout expires?

> The above options would probably have default values at the queue level, and each message could override (for example if you know the message takes a long time to process.)
> 
> This makes it where any intermittent connection from consumer to publisher would not necessarily mean the consumer 'failed' and the message should be passed to another consumer.

There are a number of potential races with both (1) and (2) that could be problematic. There's also a lot of overhead for the queues to take on, just to support a fairly narrow use case - i.e., when a consumer dies you wish to wait a while before considering un-acked messages associated with that consumer ready for re-queuing. There's also the problem of "how to identify that it is the same consumer", since the AMQP specification dictates the following: "The consumer tag is local to a channel, so two clients can use the same consumer tags". Since the client will "return" on a new connection and new channel, how does the broker know that this is the same (previously handled) client of a specific queue?
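
The identity problem can be illustrated with a small model. This is a sketch under simplifying assumptions, not broker code: the `Broker` class and the connection labels are hypothetical, and the only rule taken from the specification is that a consumer tag is scoped to its channel.

```python
from typing import Dict, List, Tuple

# (connection id, channel number, consumer tag); hypothetical representation
ConsumerKey = Tuple[str, int, str]


class Broker:
    """Toy registry of consumers, keyed the way consumer tags are scoped."""

    def __init__(self) -> None:
        self.consumers: Dict[ConsumerKey, str] = {}

    def consume(self, connection: str, channel: int, tag: str, queue: str) -> ConsumerKey:
        key = (connection, channel, tag)
        if key in self.consumers:
            # A tag only has to be unique within its own channel.
            raise ValueError("consumer tag already in use on this channel")
        self.consumers[key] = queue
        return key

    def drop_connection(self, connection: str) -> None:
        # Forget every consumer that lived on the dropped connection.
        for key in [k for k in self.consumers if k[0] == connection]:
            del self.consumers[key]

    def find_by_tag(self, tag: str, queue: str) -> List[ConsumerKey]:
        return [k for k, q in self.consumers.items() if k[2] == tag and q == queue]


broker = Broker()
broker.consume("conn-1", 1, "worker", "tasks")  # client A
broker.consume("conn-2", 1, "worker", "tasks")  # client B, same tag, allowed
broker.drop_connection("conn-1")                # A's connection fails
broker.consume("conn-3", 1, "worker", "tasks")  # somebody connects again
candidates = broker.find_by_tag("worker", "tasks")
```

After the reconnect, `candidates` holds two consumers with the tag "worker" on the same queue, and nothing in the tag tells the broker which of them, if either, is the returning client A.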

Hopefully those are some useful things to think about whilst pondering how this might work. It's also worth considering whether or not there are other designs that might achieve what you want, based on existing features. We might be able to help with that too.

Cheers,
Tim