[rabbitmq-discuss] difficulties with prefetch count [reposted]
james anderson
james.anderson at setf.de
Fri May 6 19:42:54 BST 2011
good evening;
[reposted anew on its own thread]
we find that a qos request with a prefetch count of 1 does not reliably
achieve "fair dispatch", and we seek advice on how to achieve it.
the specification page[1] indicates that rmq supports local prefetch
limits, but not global ones. the BasicQosDesign[2] wiki entry
describes some restrictions, in particular the qos/consume ordering. the
work queue tutorial[3] describes how to use prefetch constraints to
achieve "fair dispatch".
despite adhering to this guidance, we observe the following in a
running application with both rmq 2.1.1 and 2.4.1:
a server process establishes four worker threads (sketched below), each of which
- creates a connection to the rmq broker
- creates a shared work queue (which, in this case, remains unused)
- binds the work queue to a request exchange
- creates a private queue for responses to delegated requests
  (which, in this case, also remains unused)
- creates two channels on its connection:
  - one channel is for task messages; on it the thread requests qos
    (prefetch=1) and then consumes from the work queue.
  - one channel is used to delegate tasks; on this one it just
    consumes (on the private response queue).
- accepts delivery of task messages, processes them, and publishes
  results to a task-identified response queue.
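in outline, the worker setup corresponds to the following python/pika
sketch (a transliteration, not our lisp code; the exchange and queue
names, the process stand-in, and routing replies via reply_to are
assumptions):

    import pika

    def process(body):
        return body                        # stand-in for the task proper

    def on_delegated_response(ch, method, properties, body):
        pass                               # unused here: no delegation occurs

    def on_task(ch, method, properties, body):
        result = process(body)
        # publish the result to the task-identified response queue
        ch.basic_publish(exchange='', routing_key=properties.reply_to,
                         body=result)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    def run_worker():
        # one connection per worker thread
        conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))

        # task channel: the qos request (prefetch=1) precedes the consume
        task_ch = conn.channel()
        task_ch.exchange_declare(exchange='request-exchange',
                                 exchange_type='direct')
        task_ch.queue_declare(queue='work-queue')
        task_ch.queue_bind(queue='work-queue', exchange='request-exchange',
                           routing_key='work-queue')
        task_ch.basic_qos(prefetch_count=1)
        task_ch.basic_consume(queue='work-queue', on_message_callback=on_task)

        # delegation channel: just consumes the private response queue
        delegate_ch = conn.channel()
        private_q = delegate_ch.queue_declare(queue='', exclusive=True).method.queue
        delegate_ch.basic_consume(queue=private_q,
                                  on_message_callback=on_delegated_response)

        task_ch.start_consuming()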
a front-end process establishes equivalent threads, each of which
services http requests and mediates them to the server.
for each front-end request, a thread (sketched below)
- creates a connection to the rmq broker
- creates a task-specific queue (as per routing) for the eventual
  response
- subscribes to the response queue
- publishes a task message to the request exchange, routed to
  the work queue
- accepts delivery of the task response
- tears down the task response subscription and queue.
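the corresponding per-request sequence, in the same sketch (again,
reply_to stands in for our task-identified routing):

    import pika

    def front_end_request(task_body):
        conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        ch = conn.channel()
        # task-specific queue for the eventual response
        resp_q = ch.queue_declare(queue='', exclusive=True).method.queue
        ch.basic_publish(exchange='request-exchange', routing_key='work-queue',
                         properties=pika.BasicProperties(reply_to=resp_q),
                         body=task_body)
        # subscribe and block until the single response arrives
        for method, properties, body in ch.consume(queue=resp_q):
            ch.basic_ack(method.delivery_tag)
            break
        # tear down the subscription, the queue, and the connection
        ch.cancel()
        ch.queue_delete(queue=resp_q)
        conn.close()
        return body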
in this particular situation, no delegation occurs. that is, no
messages pass through the delegated request queue.
we observe that, if a posted task takes a "long" time, not only does
its front-end thread wait until that processing completes, but one
additional front-end request hangs as well.
while the long task is in progress, other front-end requests are
processed without delay. that is, their setup, request, subscription,
delivery, and tear-down all complete as normal. their task messages
are delivered to one of the three unoccupied server threads, which
does the work and produces the response.
whether the front-end leaves the hung task waiting for a response or
aborts it (by canceling the subscription, deleting the queue, and
closing the connection), once the long-running server thread completes
its task, the next message delivered to it is the message from that
waiting-or-aborted front-end thread.
if we use rabbitmqctl to display the connection/subscription/queue
state while the long task is in progress, we observe (via the queries
shown below) that
- the work queue has one unacknowledged message, but zero ready messages
- the server task channels have a prefetch window of 1
- no connection has a send pending
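for the record, those observations come from queries of this form:

    rabbitmqctl list_queues name messages_ready messages_unacknowledged
    rabbitmqctl list_channels connection prefetch_count messages_unacknowledged
    rabbitmqctl list_connections name send_pend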
it appears as if one single message is held back until the long task
completes, but it is nowhere to be seen. in order to isolate the
problem, i enclose simple client and server implementations which can
be used to demonstrate it. they are intended to be run with
de.setf.amqp[4], but the amqp operation sequence is language-
independent. when run against a rmq broker @2.1.1 (that is, the
version which we have in production), one observes that each time a
subscriber delays acknowledgment of one message, one additional
message is delayed by being queued for delivery to that subscriber
despite the pending unacknowledged message. this happens even though
the subscriber has a prefetch limit of 1 and the held message appears
nowhere in the queue lists produced by rabbitmqctl.
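for readers without a lisp environment, the delaying server in the
enclosed code amounts to this variation of the worker sketch above,
where a sleep before the acknowledgment models the long task (the
DELAY argument and the names remain illustrative):

    import sys, time, pika

    DELAY = float(sys.argv[1]) if len(sys.argv) > 1 else 0.0

    def on_task(ch, method, properties, body):
        time.sleep(DELAY)                      # hold the message unacknowledged
        ch.basic_publish(exchange='', routing_key=properties.reply_to, body=body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()
    ch.queue_declare(queue='work-queue')
    ch.basic_qos(prefetch_count=1)             # should limit this consumer to one message
    ch.basic_consume(queue='work-queue', on_message_callback=on_task)
    ch.start_consuming()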
this can be observed in two configurations (a sample run sequence
follows scenario 2).
1. with two clients and two servers.
a. start a server which runs without delay.
b. start two clients.
one observes that the server receives and replies to alternating
messages from each client.
c. start a second server, with a delay.
one observes that first one client and then the second hangs until
the message to the first client has been acknowledged.
2. with three clients and two servers.
a. start a server which runs without delay.
b. start three clients.
one observes that the server receives and replies to alternating
messages from each client in turn.
c. start a second server, with a delay.
one observes that first one client and then a second hangs until
the message to the first client has been acknowledged, while the third
client's messages are delivered to the undelayed server without delay.
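with the sketches above saved as server.py and client.py (names
hypothetical; the client loops over front_end_request), scenario 2
would run roughly as:

    python server.py 0  &     # a. a server without delay
    python client.py    &     # b. three clients
    python client.py    &
    python client.py    &
    python server.py 30 &     # c. a second server with a 30-second delay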
that is, one gets the distinct impression that rmq does not
consistently honor the prefetch count constraint.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fair-allocation-1-1.lisp
Type: application/octet-stream
Size: 6981 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20110506/4c9810ea/attachment-0001.obj>
-------------- next part --------------
-------
[1] http://www.rabbitmq.com/specification.html
[2] https://dev.rabbitmq.com/wiki/BasicQosDesign
[3] http://www.rabbitmq.com/tutorials/tutorial-two-python.html
[4] https://github.com/lisp/de.setf.amqp