<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-2022-JP">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Emile,<br>
<br>
I'm one of Anthony's colleagues here at LShift, and I assisted in the
initial investigation.<br>
<br>
<blockquote>
<pre>Thanks for this very detailed diagnostic information. Could you perhaps
supply the output of "rabbitmqctl report" as well? Feel free to compress
and send to me directly if it is too large.
</pre>
</blockquote>
The size of the report isn't the main issue in this case: sadly, it
looks like even getting a queue listing wasn't going to terminate any
time soon (I gave up after ~10 minutes). I've attached the output from
a broken rabbit (which only contains "<tt>rabbitmqctl status</tt>",
since the full report never completed) and a full report from the same
rabbit after increasing its file descriptor limit. <br>
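In case it's useful, here's a rough sketch of how one can watch the
broker's descriptor consumption while it starts up; it's Linux-specific,
and the <tt>pgrep</tt> pattern assumes a single beam process on the box,
so adjust as needed:<br>
<pre>
# Count open file descriptors held by the Erlang VM running rabbit
# (assumes exactly one beam/beam.smp process on this host).
RABBIT_PID=$(pgrep -f beam)
ls /proc/$RABBIT_PID/fd | wc -l

# Current per-process limit, for comparison
ulimit -n
</pre>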
<blockquote>
<pre>
I have not been able to reproduce the problem - does this happen each
time you start the cluster? does stopping and starting the slave nodes
make any difference? Have you discovered a sequence of actions that is
likely to reproduce the problem?
</pre>
</blockquote>
In the example I'm seeing (on the internal build slaves from which the
earlier etop results were taken), we have about 1400 queues left over
from a set of rather unhygienic integration tests, so I'd infer that
there's some kind of pathological interaction between resource
management in the mnesia queue configuration database and the <tt>file_handle_cache</tt>
module. It does indeed happen every time we restart the instance (on
our CI machines these are single, un-clustered instances). <br>
<br>
On this hunch, I've bumped the file descriptor limit on one machine
from 1024 to 8192, and the resource usage does indeed drop dramatically
once the instance has started up. So I think that gives us a workaround
for one case, if not an explanation of the underlying behaviour.<br>
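<br>
For anyone wanting to reproduce the workaround, the bump itself is
nothing more exotic than raising <tt>ulimit -n</tt> in the shell (or
init script) that launches the broker; exactly where that belongs
depends on your packaging, so treat the following as a sketch:<br>
<pre>
# Raise the per-process fd limit in the same shell / init script
# that starts the broker, then launch it.
ulimit -n 8192
rabbitmq-server -detached

# Sanity check from another shell once it's up
rabbitmqctl status
</pre>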
</body>
</html>