<div dir="ltr">You are using Erlang R16B01. Upgrade to R16B02 at once! R16B01 has a bug that prevents the async workers from being used correctly (too many processes are hashed to the wrong async worker, more or less). This severely hurts disk I/O on a busy machine.<div>
<br></div><div>There are other problems with R16B01. It should be avoided if possible.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 11, 2013 at 1:29 PM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">OK, so your screenshot shows 750 queues and 753 connections. Was this from the same time as you had ~10k file descriptors in use? That sounds wrong.<br>
<br>
I think your publishing connections are going into flow control because there's a squeeze on file descriptors, which is causing the queues to have to share a small number of file descriptors between them - thus slowing them down.<br>
<br>
If you do have far more file descriptors in use than queues + connections, do you have any exotic plugins in use? What does "lsof -lnp &lt;pid of server process&gt;" say?<br>
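<br>
One rough way to make that comparison is to tally the lsof output by its TYPE column, so that sockets and regular files can be set against the 750 queues + 753 connections seen in the screenshot. The Python sketch below assumes lsof's default column layout, with TYPE as the fifth field; the function name and the sample rows are made up for illustration and are not real output from this server.<br>
<pre>
```python
from collections import Counter


def count_fd_types(lsof_output):
    """Count open files per TYPE column in lsof text output.

    Assumes the default layout: COMMAND PID USER FD TYPE ...
    """
    lines = lsof_output.strip().splitlines()
    counts = Counter()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            counts[fields[4]] += 1
    return counts


# Made-up sample rows; real input would be the saved lsof output.
SAMPLE = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
beam.smp 1234  105  cwd    DIR    8,1     4096    2 /
beam.smp 1234  105   10u  IPv4  12345      0t0  TCP 10.95.212.13:5672->10.95.212.11:33751 (ESTABLISHED)
beam.smp 1234  105   11u  IPv4  12346      0t0  TCP 10.95.212.13:5672->10.95.212.11:33752 (ESTABLISHED)
beam.smp 1234  105   12u   REG    8,1    16384 9999 /tmp/example.dat
"""

counts = count_fd_types(SAMPLE)
for fd_type, n in counts.most_common():
    print(fd_type, n)
```
</pre>
With the real output in place of the sample, a total far above roughly 1,500 would suggest that descriptors are being held by something other than queues and connections.<br>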
<br>
Cheers, Simon<div class="HOEnZb"><div class="h5"><br>
<br>
On 11/10/2013 3:22AM, Choo wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Simon,<br>
<br>
Memory is plentiful, but I found that file descriptors hit the default limit,<br>
so I bumped the limit up to 5,120, and finally to 10,240 on each node. It<br>
turned out that the file descriptors also reached that limit (around 10,086),<br>
and things started to go downhill.<br>
<br>
&lt;<a href="http://rabbitmq.1065348.n5.nabble.com/file/n30402/ScreenShot.jpg" target="_blank">http://rabbitmq.1065348.n5.nabble.com/file/n30402/ScreenShot.jpg</a>&gt;<br>
<br>
I started the processes in reverse order, starting the subscriber side first<br>
(1:42), then the bigger publishers later (1:45). The number of published<br>
messages bounced up and down, then just after 1:48, most of the publishers<br>
were blocked.<br>
<br>
There are now more than 350 blocked connections like the ones below (and file<br>
descriptors are running at 7,558 + 4,647 on the 2 nodes):<br>
10.95.212.11:33751 -> 10.95.212.13:5672 blocked 1261.558817 flow<br>
10.95.212.11:33752 -> 10.95.212.13:5672 blocked 1326.324919 flow<br>
10.95.212.11:33753 -> 10.95.212.13:5672 blocked 1326.45322 flow<br>
10.95.212.11:33754 -> 10.95.212.13:5672 blocked 1278.581221 flow<br>
10.95.212.11:33755 -> 10.95.212.13:5672 blocked 1312.584426 flow<br>
10.95.212.11:33756 -> 10.95.212.13:5672 blocked 1279.623625 flow<br>
10.95.212.11:33757 -> 10.95.212.13:5672 blocked 1294.492535 flow<br>
10.95.212.11:33758 -> 10.95.212.13:5672 blocked 1276.134377 flow<br>
10.95.212.11:33759 -> 10.95.212.13:5672 blocked 1292.862081 flow<br>
10.95.212.11:33760 -> 10.95.212.13:5672 blocked 1290.695249 flow<br>
10.95.212.11:33761 -> 10.95.212.13:5672 blocked 1255.599642 flow<br>
10.95.212.11:33762 -> 10.95.212.13:5672 blocked 1284.984752 flow<br>
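<br>
A listing like the one above can be summarised with a short script. The Python sketch below assumes each line has the shape "source -> destination state age reason" and that the number is the time spent blocked, in seconds; that reading is an interpretation, not something the listing itself states. The helper name parse_blocked is hypothetical.<br>
<pre>
```python
def parse_blocked(lines):
    """Return (connection_name, age, reason) tuples for blocked connections.

    Assumed line shape: SRC -> DST STATE AGE REASON (six fields).
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 6 and fields[3] == "blocked":
            name = " ".join(fields[:3])
            rows.append((name, float(fields[4]), fields[5]))
    return rows


# Three lines copied from the listing above.
SAMPLE = [
    "10.95.212.11:33751 -> 10.95.212.13:5672 blocked 1261.558817 flow",
    "10.95.212.11:33752 -> 10.95.212.13:5672 blocked 1326.324919 flow",
    "10.95.212.11:33753 -> 10.95.212.13:5672 blocked 1326.45322 flow",
]

rows = parse_blocked(SAMPLE)
longest = max(rows, key=lambda r: r[1])
over_600 = [r for r in rows if r[1] > 600]
print(len(rows), "blocked;", len(over_600), "over 600; longest:", longest[0])
```
</pre>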
<br>
Please kindly suggest.<br>
<br>
Thank you and Best Regards,<br>
Choo<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://rabbitmq.1065348.n5.nabble.com/Connection-blocked-by-flow-for-more-than-600-seconds-tp30349p30402.html" target="_blank">http://rabbitmq.1065348.n5.nabble.com/Connection-blocked-by-flow-for-more-than-600-seconds-tp30349p30402.html</a><br>
Sent from the RabbitMQ mailing list archive at Nabble.com.<br>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
<br>
</blockquote>
<br></div></div><span class="HOEnZb"><font color="#888888">
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>J.
</div>