[rabbitmq-discuss] Queue delete causes transaction errors

Wed Jul 14 16:31:12 BST 2010

Hi Aaron,

Thanks for the follow-up.

> It didn't occur to me that the failure could have arisen after the
> routing but during the consumption stage of my passive listener.  I
> don't think that's the case, but here's what the broker reported which
> I intended to include in my first email:
> 
>   class_id: 90
>   method_id: 20
>   reply_code: 541
>   reply_text: "INTERNAL_ERROR - commit failed:
> [{<0.25498.0>,{exit,{queue_disappeared,<0.25498.0>},[{rabbit_amqqueue,'-commit_all/3-fun-0-',1},{delegate,safe_invoke,2},{delegate,'-safe_invoke/2-lc$^0/1-0-',2},{delegate,'-safe_invoke/2-lc$^0/1-0-',2},{delegate,delegat..."

Right, yes, it looks like the queue has disappeared in between routing 
the message and committing the transaction.

I'm puzzled about why this would happen when you are not deleting the 
queue, though.  Are you declaring the queues as exclusive?  That would 
delete them with the connection.

(The proposed "new" semantics for transactions would obviate the 
queue-going-away problem, by the way.)
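To make the difference concrete, here is a toy in-memory model, written in Python purely for illustration. It is not RabbitMQ's code, and the names `Broker`, `Channel`, `QueueDisappeared`, `run` and the `route_at_commit` flag are all hypothetical. With `route_at_commit=False` a publish is routed to concrete queues immediately, as RabbitMQ does today, so deleting a bound queue before `tx.commit` makes the commit fail. With `route_at_commit=True` the channel acts as though routing happened at commit time, so a queue deleted in the meantime is simply skipped.

```python
class QueueDisappeared(Exception):
    """Raised when a commit targets a queue that no longer exists."""


class Broker:
    def __init__(self):
        self.queues = {}    # queue name -> list of delivered message bodies
        self.bindings = {}  # routing key -> set of bound queue names

    def declare(self, name, key):
        self.queues[name] = []
        self.bindings.setdefault(key, set()).add(name)

    def delete(self, name):
        # However the queue goes away (explicit delete, auto-delete,
        # exclusive), its bindings go with it.
        del self.queues[name]
        for names in self.bindings.values():
            names.discard(name)

    def route(self, key):
        return sorted(self.bindings.get(key, ()))


class Channel:
    def __init__(self, broker, route_at_commit):
        self.broker = broker
        self.route_at_commit = route_at_commit
        self.pending = []   # (routing key, resolved queue names or None, body)

    def publish(self, key, body):
        # Route-at-publish: fix the target queues now.
        # Route-at-commit: leave them unresolved (None) until commit.
        targets = None if self.route_at_commit else self.broker.route(key)
        self.pending.append((key, targets, body))

    def commit(self):
        pending, self.pending = self.pending, []
        resolved = []
        for key, targets, body in pending:
            if targets is None:
                # Deferred routing only sees queues that still exist.
                targets = self.broker.route(key)
            resolved.append((targets, body))
        # Check every target first, so a failed commit delivers nothing.
        for targets, _ in resolved:
            for name in targets:
                if name not in self.broker.queues:
                    raise QueueDisappeared(name)
        for targets, body in resolved:
            for name in targets:
                self.broker.queues[name].append(body)


def run(route_at_commit):
    broker = Broker()
    broker.declare("work", "ping")
    broker.declare("listener", "ping")  # the passive listener's queue
    ch = Channel(broker, route_at_commit)
    ch.publish("ping", "msg-1")
    broker.delete("listener")           # listener goes away before the commit
    try:
        ch.commit()
        return "committed", broker.queues
    except QueueDisappeared as exc:
        return "failed: queue %s disappeared" % exc, broker.queues
```

In this model `run(False)` reports that the listener's queue disappeared and delivers nothing, while `run(True)` commits and delivers `msg-1` to the `work` queue only.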

>>> I see that 0.9.1 of the spec adds queue.unbind().  Is that the only
>>> way to avoid this problem, or is there another approach that we can
>>> take?
>> I don't see how queue.unbind would help -- would you explain?
> 
> I imagine that I could unbind my listener's queue, wait for a bit,
> then safely delete the queue and the broker would no longer be trying
> to route to it.

There would still be a race between the delete and the transaction 
committing, unless those are co-ordinated somehow (which I doubt they 
are).  I.e., if you publish then commit in one process, and unbind 
then delete in another process, the publish and commit could still 
fall on either side of the unbind and delete.
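Because each process runs its own steps in order, the possible outcomes can be enumerated rather than left to timing. The sketch below assumes the route-at-publish behaviour described earlier in this thread: a message reaches the listener's queue only if the publish precedes the unbind, and the commit fails if that queue has been deleted before the commit. The step names and the helpers `interleavings` and `commit_fails` are hypothetical.

```python
# Each process runs its own steps in program order; only the interleaving
# between the two processes is left to chance.
PUBLISHER = ("publish", "commit")   # client channel using transactions
DELETER = ("unbind", "delete")      # listener tearing down its queue


def interleavings(a, b):
    """Yield every merge of tuples a and b that preserves each one's order."""
    if not a:
        yield b
        return
    if not b:
        yield a
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def commit_fails(order):
    """Route-at-publish model of the listener's queue."""
    routed_to_queue = order.index("publish") < order.index("unbind")
    queue_gone_at_commit = order.index("delete") < order.index("commit")
    return routed_to_queue and queue_gone_at_commit


for order in interleavings(PUBLISHER, DELETER):
    print(" -> ".join(order), "FAILS" if commit_fails(order) else "ok")

failing = [o for o in interleavings(PUBLISHER, DELETER) if commit_fails(o)]
```

Exactly one of the six orderings fails: publish, unbind, delete, commit. Waiting between the unbind and the delete makes that ordering less likely but cannot rule it out, since nothing stops the publisher's commit from being delayed for longer still.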

Michael

> On Tue, Jul 13, 2010 at 11:29 AM, Michael Bridgen <mikeb at rabbitmq.com> wrote:
>> Hi Aaron,
>>
>>> I've confirmed that transactions are being aborted due to a queue
>>> being deleted using an easily-reproducible test case.
>>>
>>> The base case is an exchange and a 1-1 mapping of a routing key to a
>>> destination queue.  The client is simply consuming a published message
>>> and then publishing the next one.  A single client as a consumer is
>>> enough to reproduce the bug.
>>>
>>> We then have a test client which we can use to attach to an exchange
>>> using a supplied routing key.  It will create its own queue and then
>>> act as a passive listener, for easy monitoring of traffic.  The queue
>>> it uses is set to auto-delete. In high-traffic situations, I would
>>> occasionally see a transaction error in the client.
>>>
>>> I set up a test case today where a listener would open a connection and
>>> queue, listen for 2 seconds, then disconnect.  I tried combinations of
>>> auto_delete enabled and disabled, both with and without an explicit
>>> queue delete call, as well as using transactions and re-using a
>>> connection versus closing and reconnecting.  I would run this test
>>> listener while a test client simply published a message to itself
>>> every time it received one.  The client is using transactions,
>>> committing after each publish call.  Within a few minutes, no matter
>>> how my listener was configured, the client would receive a transaction
>>> error.  I repeated this with the 1.8.0 release.
>> It looks like there are two races going on here:
>>  - the queue being autodeleted, and the transaction committing; and
>>  - the connection dropping, and the transaction committing.
>>
>> In the first case, the transaction commit fails because the queue has gone
>> away and it can no longer route the message to it.  I'm less certain about
>> the second; I think it may be because the queue tries to deliver the message
>> on tx.commit, and the connection drops while that's happening.
>>
>> The AMQP spec doesn't say a lot about the properties of transactions, and in
>> particular, whether routing "happens" before or after tx.commit.  RabbitMQ
>> routes before the tx.commit, mainly so that persistent messages will land on
>> disk.
>>
>> It would be well within the spec to *act* as though routing happened after
>> tx.commit; e.g., the transaction wouldn't fail because your autodelete queue
>> has gone away.  We'd also have to be careful of the second case, that
>> failing to deliver the message didn't cause the tx.commit to fail.  That's
>> probably a more useful semantics overall, anyway.
>>
>> (We actually already have a bug filed to look into this -- thanks for
>> bringing it back to our attention!)
>>
>>> I see that 0.9.1 of the spec adds queue.unbind().  Is that the only
>>> way to avoid this problem, or is there another approach that we can
>>> take?
>> I don't see how queue.unbind would help -- would you explain?
>>
>>
>> Cheers,
>> Michael
>>