[rabbitmq-discuss] Erlang has closed
tsuraan at gmail.com
Thu Sep 17 16:43:53 BST 2009
> I've seen this symptom before, but the circumstances were a bit
> contrived - the way the output from uptime was being processed on OSX
> - see http://erlang.org/pipermail/erlang-questions/2008-May/034889.html.
> However, I'm not convinced that this has anything to do with what you
> are seeing.
> Can you provide us with some more details of the environment? Is it
> possible to isolate the code that can reproduce this?
It's RabbitMQ 1.6.0 on Linux, 2.6.27 kernel. The machine has a degraded
(rebuilding) RAID5 array, so its disk IO tops out at around 6 MB/s. At
the time of the errors, the system load was between 8 and 15, on an
8-core machine with 4 GB of RAM. There's a chance that it was also
swapping, but I wasn't watching that. Its swap isn't empty, so it
certainly has swapped at some point.
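
To pin down the swapping question on a future run, load average and swap
usage could be logged next to the broker. The following is only a minimal
sketch, assuming a Linux host that exposes /proc/meminfo with SwapTotal
and SwapFree fields; the function names (parse_meminfo, swap_used_kb,
sample) are made up for illustration and are not part of RabbitMQ or
Erlang.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently occupied, in kB (SwapTotal minus SwapFree)."""
    return info["SwapTotal"] - info["SwapFree"]


def sample(path="/proc/meminfo"):
    """Take one reading of the 1-minute load average and swap usage."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    load1, _load5, _load15 = os.getloadavg()
    return {"load1": load1, "swap_used_kb": swap_used_kb(info)}
```

Calling sample() every few seconds and appending the result to a log
would show whether swap usage was growing when the errors appeared. Swap
occupancy alone does not prove active swapping; the pswpin and pswpout
counters in /proc/vmstat would be the thing to watch for that.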
The system was processing a new chunk of data, which took around 8
hours to complete. From the MQ point of view, there were probably a
few million messages enqueued (well, at least dispatched to consumers;
I realize that rabbit tries really hard not to actually "enqueue"
messages), but no more than 100,000 messages at any one time. I can
try simulating the message flows on a system with a really high load
to see if I can make the problem occur reliably, but we have rabbit on
a few dozen machines, and it doesn't look like it happens often.
I can give more details if you can think of anything more specific
that would be useful. I can't share our source code, but I'll try to
come up with a repeatable test case.