[rabbitmq-discuss] Occasional slow message on a local machine
Brennan Sellner
bsellner at seegrid.com
Mon Jul 23 20:57:24 BST 2012
Hi all,
We're seeing the occasional slow delivery of a message (where 'slow'
is defined as ~2.8 seconds between publish and receipt). It doesn't
happen often (perhaps once a week or so), but when it does, it
triggers some of our failsafe code.
We're using librabbitmq-c in a C++ program, and the message in question
is being sent from one thread in a process to another thread in the same
process (we're in the midst of refactoring; they'll eventually be
separate processes on separate machines). Each thread establishes and
maintains its own isolated AMQP connection.
The consumer thread is consuming from a single durable queue, bound to a
durable topic exchange with the wildcard binding. The producer is
publishing non-persistent messages (and expecting a reply via a separate
reply exchange, but the problem occurs before we get to that point).
The flow of events is thus:

Time -0.003: The consumer receives, processes, and replies to an
    unrelated message. It then waits for a frame by calling
    amqp_simple_wait_frame.

Time 0: The producer transmits a message via amqp_basic_publish. Our
    logging bounds the time for the function to return at <= 38
    microseconds.

Time 0.90: The consumer receives and responds to an AMQP heartbeat from
    the server, and acks the delivery of a prior message. It again
    drops into an amqp_simple_wait_frame.

Time 1.90: The consumer handles another AMQP heartbeat in the same
    fashion.

Time 2.74: The consumer (finally!) sees the deliver frame for the
    producer's message, followed shortly by the header and body frames.
    All frames arrive within a 364 microsecond window.
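One way to pin down timings like these per message, rather than
reconstructing them from logs, would be to stamp each payload with a
monotonic send time and compare it on receipt. What follows is only a
minimal sketch under assumptions: the helper names (now_us,
stamp_payload, read_stamp, is_slow) are hypothetical and not part of
rabbitmq-c, the 1-second threshold is arbitrary, and the stamp is only
meaningful while producer and consumer live in the same process and
therefore share a clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helpers for illustration; none of this is rabbitmq-c API.

// Monotonic time in microseconds. Only comparable within one process.
std::int64_t now_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Prepend an 8-byte send stamp (host byte order) to the payload.
std::string stamp_payload(const std::string& payload, std::int64_t sent_us) {
    std::string out(sizeof(sent_us), '\0');
    std::memcpy(&out[0], &sent_us, sizeof(sent_us));
    out += payload;
    return out;
}

// Recover the send stamp from a received body.
std::int64_t read_stamp(const std::string& body) {
    assert(body.size() >= sizeof(std::int64_t));
    std::int64_t sent_us = 0;
    std::memcpy(&sent_us, body.data(), sizeof(sent_us));
    return sent_us;
}

// True when the publish-to-receipt gap exceeds the threshold
// (default 1 second, an arbitrary assumption).
bool is_slow(std::int64_t sent_us, std::int64_t received_us,
             std::int64_t threshold_us = 1000000) {
    return received_us - sent_us > threshold_us;
}
```

The producer would call stamp_payload(msg, now_us()) just before
amqp_basic_publish, and the consumer would call
is_slow(read_stamp(body), now_us()) once the body frame has been read.
Once the two threads become separate processes on separate machines,
the stamp would need a synchronized wall clock instead.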
The rabbitmq server (v2.8.2) is running on the same machine, so there
aren't any external networks involved. We're still running Fedora 11
(kernel 2.6.29, Erlang R12B-5.8) for the moment, if that's relevant.
The machine itself is somewhat old (Core 2 Duo @ 2.66 GHz, 2 GB RAM,
single traditional HDD), but it doesn't seem like we're CPU- or
RAM-bound, though I haven't been able to capture a resource snapshot at
the time of the error. We have seen occasional slow disk I/O in other
parts of the system. Message load is extremely low at the moment: on
the order of 100 messages / second, none of them larger than a single
body frame.
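A resource snapshot could perhaps be captured automatically whenever
the failsafe code fires. The sketch below is an assumption-laden
illustration, not tested against our system: the function names slurp
and resource_snapshot are hypothetical, and it simply dumps a few Linux
/proc files (load average, memory, and per-disk I/O counters) into a
string that could be written to the log.

```cpp
#include <cassert>
#include <fstream>
#include <initializer_list>
#include <sstream>
#include <string>

// Hypothetical helper: whole file as a string, or "" if it cannot be opened.
std::string slurp(const std::string& path) {
    assert(!path.empty());  // passing no path is a programming error
    std::ifstream in(path);
    if (!in) return "";
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: concatenate a few /proc files under labeled headers.
// Files that are missing contribute only their header.
std::string resource_snapshot() {
    std::string out;
    for (const char* path :
         {"/proc/loadavg", "/proc/meminfo", "/proc/diskstats"}) {
        out += "== ";
        out += path;
        out += " ==\n";
        out += slurp(path);
    }
    return out;
}
```

Calling resource_snapshot() from the failsafe path and logging the
result would at least show whether load, free memory, or disk activity
looked unusual near the time of the delay, though the snapshot is taken
after the fact rather than during the stall itself.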
The only things in the Rabbit logs around then are accepting/closing
AMQP connection notices (8 are opened and 6 are closed in approximately
the right time range).
I'm baffled: the amqp_basic_publish call doesn't hang, there's no
network to induce latency of any sort, the consumer is blocked in
amqp_simple_wait_frame, and the server's clearly talking to the
consumer!
My understanding is that when dealing solely with non-persistent
messages, Rabbit won't hit the disk until it runs out of RAM. Is that
accurate? Any ideas as to what else might cause this sort of delay, or
suggestions on how to get further useful information?
Thanks,
-Brennan