[rabbitmq-discuss] Occasional slow message on a local machine

Mon Jul 23 20:57:24 BST 2012

Hi all,

We're seeing the occasional slow transmission of a message (where 'slow' 
is defined as ~2.8 seconds between transmission and receipt).  It 
doesn't happen that often (perhaps once a week or so), but when it does 
it triggers some of our failsafe code.

We're using librabbitmq-c in a C++ program, and the message in question 
is being sent from one thread in a process to another thread in the same 
process (we're in the midst of refactoring; they'll eventually be 
separate processes on separate machines).  Each thread establishes and 
maintains its own isolated AMQP connection.

The consumer thread is consuming from a single durable queue, bound to a 
durable topic exchange with a wildcard binding.  The producer is 
publishing non-persistent messages (and expecting a reply via a separate 
reply exchange, but the problem occurs before we get to that point).

The flow of events is as follows (times in seconds, relative to the 
publish call):

Time -0.003: The consumer receives, processes, and replies to an 
unrelated message.  It then waits for a frame by calling 
amqp_simple_wait_frame.

Time 0: The producer transmits a message via amqp_basic_publish.  Our 
logging bounds the time for the function to return at <= 38 microseconds.

Time 0.90: The consumer receives and responds to an AMQP heartbeat from 
the server, and acks the delivery of a prior message.  It again drops 
into an amqp_simple_wait_frame.

Time 1.90: The consumer handles another AMQP heartbeat in the same fashion.

Time 2.74: The consumer (finally!) sees the deliver frame for the 
producer's message, followed shortly by the header and body frames.  All 
frames arrive within a 364 microsecond window.
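
For reference, timestamps like the ones above can be gathered with something 
along these lines.  This is only a minimal sketch, not our actual logging 
code: timed_call_us, latency_seconds and is_slow are hypothetical helper 
names, and the threshold passed to is_slow is an illustrative parameter.  It 
assumes a C++11 compiler, and it relies on producer and consumer living in 
the same process, so that both read the same monotonic clock; that 
assumption stops holding once they move to separate machines.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of librabbitmq-c): runs `fn` and returns an
// upper bound on how long the call took, in microseconds, measured with a
// monotonic clock so wall-clock adjustments cannot skew the result.
template <typename Fn>
std::int64_t timed_call_us(Fn&& fn) {
    const auto before = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto after = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               after - before).count();
}

// Hypothetical helper: seconds elapsed between the producer's publish
// timestamp and the consumer's receipt timestamp. Only meaningful while
// both timestamps come from the same clock (i.e. the same machine).
inline double latency_seconds(std::chrono::steady_clock::time_point sent,
                              std::chrono::steady_clock::time_point received) {
    return std::chrono::duration<double>(received - sent).count();
}

// Hypothetical failsafe check: flags a delivery whose latency exceeds a
// caller-chosen threshold (the threshold value is an assumption).
inline bool is_slow(double latency_s, double threshold_s) {
    return latency_s > threshold_s;
}
```

The publish would be wrapped in a lambda passed to timed_call_us, and the 
consumer would call latency_seconds with the publish timestamp and the time 
at which amqp_simple_wait_frame returned the deliver frame.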

The RabbitMQ server (v2.8.2) is running on the same machine, so there 
aren't any external networks involved.  We're still running Fedora 11 
(kernel 2.6.29, Erlang R12B-5.8) for the moment, if that's relevant.

The machine itself is somewhat old (Core 2 Duo @ 2.66 GHz, 2 GB RAM, 
single traditional HDD), but it doesn't seem like we're CPU- or 
RAM-bound, though I haven't been able to capture a resource snapshot at 
the time of the error.  We have seen occasional slow disk I/O in other 
parts of the system.  Message load is extremely low at the moment: on 
the order of 100 messages / second, none of them larger than a single 
body frame.
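
One way such a snapshot might be captured is to dump a few /proc files the 
moment the failsafe fires.  The following is a rough sketch rather than 
anything we have running: read_file and resource_snapshot are hypothetical 
helpers, and the choice of /proc/loadavg, /proc/meminfo and /proc/diskstats 
is an assumption about what would be informative on a Linux box.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the whole file as a string, or an empty
// string if the file cannot be opened.
inline std::string read_file(const std::string& path) {
    std::ifstream in(path.c_str());
    if (!in) return std::string();
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}

// Hypothetical helper: concatenates a few Linux /proc files, each under a
// small header, so the result can be written to the log in a single call.
// A file that is missing simply contributes an empty section.
inline std::string resource_snapshot() {
    static const char* const kPaths[] = {
        "/proc/loadavg",    // run-queue load averages
        "/proc/meminfo",    // free memory, cache, swap usage
        "/proc/diskstats"   // per-device I/O counters
    };
    std::string snapshot;
    for (std::size_t i = 0; i < sizeof(kPaths) / sizeof(kPaths[0]); ++i) {
        snapshot += "== ";
        snapshot += kPaths[i];
        snapshot += " ==\n";
        snapshot += read_file(kPaths[i]);
    }
    return snapshot;
}
```

Calling resource_snapshot() from the failsafe path would record the state 
shortly after the slow delivery is noticed, which is later than the delay 
itself, so it could still miss a short-lived spike.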

The only things in the Rabbit logs around then are accepting/closing 
AMQP connection notices (8 are opened and 6 are closed in approximately 
the right time range).

I'm baffled: the amqp_basic_publish call doesn't hang, there's no 
network to induce latency of any sort, the consumer is blocked in 
amqp_simple_wait_frame, and the server's clearly talking to the consumer!

My understanding is that when dealing solely with non-persistent 
messages, Rabbit won't hit the disk until it runs out of RAM.  Is that 
accurate?  Any ideas as to what else might cause this sort of delay, or 
suggestions on how to get further useful information?

Thanks,

-Brennan