[rabbitmq-discuss] A weird case of "PRECONDITION_FAILED - unknown delivery tag"

Wed Aug 28 13:30:32 BST 2013

Hello guys,

I've been using RabbitMQ for a while, but this time I have an error which I 
did not manage to track down:  "PRECONDITION_FAILED - unknown delivery tag 
XXX"  (where XXX varies).

Now, I've done my research, and I know this error is usually caused by
acking the same message twice, acking on the wrong channel, or acking
messages that should not be acked. I've checked at least a dozen times,
and none of these seems to apply to my case.

I've reduced the scenario to the minimum required to reproduce the error
(I'm running RabbitMQ 3.0.3 on Erlang R15B01; the code is written in Python
using pika 0.9.10p0).

I have one worker agent with two threads:
 - thread one that fetches tasks continuously and puts them in a list
 - a worker thread that continuously takes tasks from the list and performs 
them
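
For readers without access to the pastebin, the two-thread layout above can
be sketched with the standard library alone. This is a minimal illustration
under stated assumptions, not the actual agent code: `fetcher`, `worker`
and `run` are hypothetical names, a list of `(delivery_tag, body)` pairs
stands in for deliveries from pika, the "work" is a placeholder, and a
`queue.Queue` replaces the plain list because it is safe to share between
threads.

```python
import queue
import threading


def fetcher(deliveries, tasks):
    """Thread one: fetch tasks and put them in the shared buffer.

    `deliveries` is a hypothetical stand-in for messages arriving from
    pika; in the real agent, prefetch_count=10 bounds how many are
    unacknowledged at once.
    """
    for delivery_tag, body in deliveries:
        tasks.put((delivery_tag, body))
    tasks.put(None)  # sentinel: no more tasks


def worker(tasks, done):
    """Worker thread: take tasks from the buffer one by one and perform them."""
    while True:
        item = tasks.get()
        if item is None:
            break
        delivery_tag, body = item
        # Placeholder "work"; the real agent would ack delivery_tag here.
        done.append((delivery_tag, body.upper()))


def run(deliveries):
    tasks = queue.Queue()  # thread-safe replacement for a plain list
    done = []
    threads = [
        threading.Thread(target=fetcher, args=(deliveries, tasks)),
        threading.Thread(target=worker, args=(tasks, done)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

For example, `run([(1, "a"), (2, "b")])` processes both tasks in delivery
order. Note that this sketch deliberately leaves out the ack, which is
exactly the step the rest of this post is about.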

See code here: http://pastebin.com/PG8quVSw

Here's what happens (see example output here: http://pastebin.com/6f1sWsYa ):
the agent starts by initializing, prefetches 10 tasks (prefetch_count is
set to 10), and then executes the tasks one by one. As soon as one task is
ACK-ed, a new one is fetched, so the prefetch queue stays at 10 items.
However, at line 52 of the example output, the error (406,
'PRECONDITION_FAILED - unknown delivery tag 12') is returned. The channel
is closed, and when the next 10 tasks from the prefetch queue try to send
their ACKs, they get an error saying the channel is closed. The tasks are
then prefetched again on the new channel, which is opened automatically,
and everything continues fine until the end (there were 60 tasks in total
in the queue).

THE PROBLEM: why did ACK 12 fail?

I can reproduce this consistently, but always at a different message:
sometimes 42, sometimes 34, etc. One weird thing is that the unknown
delivery tag error is not raised by the call to basic_ack (line 13 in the
code), but in the code that consumes the queue (line 45). My guess is that
there's a race somewhere, but I can't figure out where. In pika, maybe? If
I uncomment lines 34 and 35 of the code, it happens much less often, but
it still happens once or twice per 1000 messages. I think it has something
to do with ACK-ing from one thread on the same channel that another thread
uses to listen for messages. But I see no other way to implement this
scenario, with an internal prefetch queue on the consumer side. And to
answer the obvious question of why I need this: the worker must be able to
take a peek at some of the tasks in the queues.
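
One way to test the cross-thread hypothesis is to confine every channel
operation to the thread that owns the connection, since pika connections
and channels are not meant to be shared between threads. The sketch below
assumes that design: the worker never calls basic_ack itself; it only
enqueues the delivery tag, and the consumer thread sends the queued acks
between iterations of its own I/O loop. `AckRouter`, `request_ack`,
`flush` and `demo` are hypothetical names, and `ack_fn` is a stand-in for
a call such as `channel.basic_ack(delivery_tag=tag)` made on the owning
thread.

```python
import queue
import threading


class AckRouter:
    """Collects ack requests from worker threads so that only the thread
    owning the channel ever sends them (hypothetical helper)."""

    def __init__(self, ack_fn):
        # ack_fn stands in for e.g. channel.basic_ack(delivery_tag=tag);
        # it must only be called from the channel-owning thread.
        self._ack_fn = ack_fn
        self._pending = queue.Queue()

    def request_ack(self, delivery_tag):
        # Called from the worker thread; never touches the channel.
        self._pending.put(delivery_tag)

    def flush(self):
        # Called from the owning (consumer) thread between I/O loop
        # iterations; sends every ack queued so far and returns the count.
        sent = 0
        while True:
            try:
                tag = self._pending.get_nowait()
            except queue.Empty:
                return sent
            self._ack_fn(tag)
            sent += 1


def demo():
    """A worker thread requests acks; the calling thread sends them."""
    owner = threading.get_ident()
    acked = []

    def fake_ack(tag):
        # Record the tag and whether it was acked on the owning thread.
        acked.append((tag, threading.get_ident() == owner))

    router = AckRouter(fake_ack)

    def work():
        for tag in (1, 2, 3):
            router.request_ack(tag)

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    router.flush()
    return acked
```

In `demo()`, a worker thread requests three acks and the owning thread
sends them, in order; each recorded entry confirms the ack ran on the owner
thread. How `flush()` is hooked into the consumer loop depends on the pika
version and connection adapter, so that part is left open here.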

Any ideas? 

Thanks,
Raz
