[rabbitmq-discuss] Occasional slow message on a local machine

Brennan Sellner bsellner at seegrid.com
Fri Sep 14 21:07:54 BST 2012

On 09/10/2012 03:52 PM, Matthias Radestock wrote:
> On 10/09/12 20:42, Brennan Sellner wrote:
>> Well, there goes that theory.  Thanks for the info!
>> I'll see if I can reproduce this in a more controlled environment.  If
>> anyone has further ideas in the meantime, I'm all ears.
> Set up some syscall tracing on the rabbit process to see what it's up to
> when you encounter the delay. That would reveal any unexpected fsync's,
> for example.

Short version

It appears that cleaning up a queue (e.g. auto-delete on channel close) 
requires Rabbit to fsync, even if the queue is non-durable, contains 
only non-persistent messages, and is bound to a non-durable exchange.  A 
portion of our system is still using a query-response architecture, so 
short-lived queues are quite common.  Is there any way to declare a 
queue such that Rabbit doesn't touch the disk at any point in the 
queue's life cycle?

Long version / background

All righty, I think I have a root cause.  In a previous thread 
we reported that amqp_channel_close was occasionally taking a long time. 
As a result of that conversation, we implemented an alternate version 
that didn't wait for close-ok, and threw away the close-ok frame when it 
eventually arrived.
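
A minimal, library-free sketch of the bookkeeping such an asynchronous close needs: remember which channels have an unanswered channel.close, and drop the matching close-ok when it shows up later. The struct and function names here (frame_info, pending_closes, discard_late_close_ok) are hypothetical and are not part of librabbitmq-c; only the frame type and method numbers come from AMQP 0-9-1. The real code presumably works on the library's own frame type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* AMQP 0-9-1 numbering: method frames have type 1, and channel.close-ok is
 * class 20, method 41, packed here as (class << 16) | method. */
#define FRAME_METHOD 1u
#define CHANNEL_CLOSE_OK ((20u << 16) | 41u)

/* Hypothetical, simplified view of a decoded frame (not a library type). */
struct frame_info {
    uint8_t type;
    uint16_t channel;
    uint32_t method_id; /* only meaningful when type == FRAME_METHOD */
};

/* One bit per channel number, set while a close-ok is still outstanding. */
struct pending_closes {
    uint8_t bits[65536 / 8];
};

static void pending_init(struct pending_closes *p) {
    memset(p->bits, 0, sizeof p->bits);
}

/* Record that channel.close was sent on ch without waiting for the reply. */
static void mark_close_sent(struct pending_closes *p, uint16_t ch) {
    assert(ch != 0); /* channel 0 carries connection-level traffic */
    p->bits[ch / 8] |= (uint8_t)(1u << (ch % 8));
}

static bool close_pending(const struct pending_closes *p, uint16_t ch) {
    return ((p->bits[ch / 8] >> (ch % 8)) & 1u) != 0;
}

/* Returns true when f is the late close-ok for a channel closed this way;
 * the caller drops the frame and the pending bit is cleared. */
static bool discard_late_close_ok(struct pending_closes *p,
                                  const struct frame_info *f) {
    if (f->type != FRAME_METHOD || f->method_id != CHANNEL_CLOSE_OK)
        return false;
    if (!close_pending(p, f->channel))
        return false;
    p->bits[f->channel / 8] &= (uint8_t)~(1u << (f->channel % 8));
    return true;
}
```

The read loop would call discard_late_close_ok on every incoming frame before handing it to the normal reply handling, so a stale close-ok is never mistaken for part of a reply.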

It appears that the long-running cleanup Rabbit does in response to 
amqp_channel_close is at fault here.  With the asynchronous close, the 
delay is instead attributed to one of the subsequent calls, with the 
specific one depending on when the fsync falls.

I've implemented a pair of self-contained test programs that replicate 
the problem, using only librabbitmq-c (the source is available if anyone 
wants it):

The first program, the responder:

   - Declare a queue and bind it to testSendExchange
   - Consume from the queue infinitely
   - Send a reply to each arriving message
     - To the default exchange
     - Route with the reply-to header of the received message

The second program, amqp-test-send:

   - Loop infinitely:
     - Open a channel
     - Declare a server-named queue
     - Consume from it
     - Send a message to testSendExchange, filling in reply-to
     - Wait for the method, header, and body frames of the reply
     - Close the channel *asynchronously*
     - Increment the channel number to be used
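
The last step deserves a note: AMQP channel numbers are 16 bits wide and channel 0 is reserved for the connection itself, so at 80 - 220 iterations per second a plain increment runs out of numbers after roughly 5 to 14 minutes (65535 / 220 and 65535 / 80 seconds). A minimal sketch of one way to wrap around follows. The helper name next_channel and the wrap-to-1 policy are assumptions; the test program may handle this differently, or not run long enough to need it.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the channel number to use after `current`.
 * channel_max is the limit negotiated with the broker; in AMQP 0-9-1 a
 * value of 0 means "no specific limit", so the full 16-bit range applies.
 * Channel 0 is never returned because it is reserved for the connection. */
static uint16_t next_channel(uint16_t current, uint16_t channel_max) {
    uint16_t limit = (channel_max == 0) ? UINT16_MAX : channel_max;
    assert(limit >= 1);
    if (current >= limit)
        return 1; /* wrap around, skipping channel 0 */
    return (uint16_t)(current + 1);
}
```

With an asynchronous close, a number should only be reused once its close-ok has actually arrived, so a wrapped counter still needs some record of which channels are pending.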

amqp-test-send's loop runs at 80 - 220 Hz, depending on system load.

If these are run on the same host as Rabbit, and we hit the hard drive 
hard (e.g. continuously copy large files), we see a slow (0.5 - 7 
seconds) response every few seconds.
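
A minimal sketch of how a test loop could flag these slow iterations, assuming a POSIX system with clock_gettime. The names time_step and is_slow are hypothetical, the 0.5 second threshold is simply the low end of the range above, and `step` stands in for one request/reply round trip.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Threshold taken from the low end of the slow responses reported above. */
#define SLOW_THRESHOLD_SEC 0.5

/* Elapsed time from start to end, in seconds. */
static double seconds_between(const struct timespec *start,
                              const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

static bool is_slow(double elapsed_sec) {
    return elapsed_sec >= SLOW_THRESHOLD_SEC;
}

/* Times a single call of step(arg) on the monotonic clock, so wall-clock
 * adjustments do not show up as false slow responses. */
static double time_step(void (*step)(void *), void *arg) {
    struct timespec t0, t1;
    assert(step != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    step(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return seconds_between(&t0, &t1);
}
```

Wrapping each individual library call this way, rather than the whole iteration, is what shows which call the delay lands on.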

If we instead close the channel synchronously in amqp-test-send, the 
slow responses are exclusively associated with amqp_channel_close.

I've experimented with a variety of parameters, and it's pretty 
conclusive that the problem is with cleaning up the server-named reply 
queues.  If I declare them as non-auto-delete, Rabbit doesn't hit the 
disk, and we don't see any timeouts.  However, the queues accumulate, 
and when amqp-test-send is killed, Rabbit is hugely slowed while 
performing all the cleanup.  Ditto if I operate in a single channel, 
rather than closing and recreating the channel on each loop.

So.  The question is: is it possible to declare a queue that's entirely 
transitory, and doesn't result in Rabbit having to hit the disk at any 
point in the queue's life cycle?  I'm currently declaring non-passive, 
non-durable, exclusive, auto-delete queues.  Switching to non-exclusive 
has no effect.  It really seems like there should be *some* way to have 
a lightweight, ephemeral queue...


