Hi, thanks for your response.<br><br>According to lsof, the file descriptor returning EAGAIN (fd 5) is the read end of a pipe whose write end (fd 6) is held by the same process; both ends share inode 10971:<br><br><span style="font-family:courier new,monospace">beam    1754 rabbitmq    5r     FIFO    0,8      0t0  10971 pipe</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">beam    1754 rabbitmq    6w     FIFO    0,8      0t0  10971 pipe</span><br><br>Here is the etop output. This system also runs CouchDB, which may show up in the etop report:<br>
<br><span style="font-family:courier new,monospace">========================================================================================</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"> etop@xxxxxxxxx01                                                          12:05:21</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"> Load:  cpu         0               Memory:  total        6616    binary         20</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">        procs      34                        processes    1192    code         3495</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">        runq        0                        atom          448    ets           249</span><br style="font-family:courier new,monospace"><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">----------------------------------------------------------------------------------------</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.3.0&gt;        erl_prim_loader          &#39;-&#39;    8012   88288       0 erl_prim_loader:loop</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.30.0&gt;       user                     &#39;-&#39;    7339   34312       0 group:server_loop/3 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.40.0&gt;       etop_txt:init/1          &#39;-&#39;    2725   26272       0 etop:update/1       </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.25.0&gt;       code_server              &#39;-&#39;    1558  101024       0 code_server:loop/1  </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.29.0&gt;       user_drv                 &#39;-&#39;      92   13640       0 user_drv:server_loop</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.0.0&gt;        init                     &#39;-&#39;       0   18456       0 init:boot_loop/2    </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.2.0&gt;        etop_server              &#39;-&#39;       0   88288       0 etop:data_handler/2 </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.5.0&gt;        error_logger             &#39;-&#39;       0    5704       0 gen_event:fetch_msg/</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.6.0&gt;        application_controll     &#39;-&#39;       0  230080       0 gen_server:loop/6   </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.8.0&gt;        proc_lib:init_p/5        &#39;-&#39;       0    6856       0 application_master:m</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.9.0&gt;        application_master:s     &#39;-&#39;       0    2584       0 application_master:l</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.10.0&gt;       kernel_sup               &#39;-&#39;       0    7256       0 gen_server:loop/6   </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.11.0&gt;       rex                      &#39;-&#39;       0    2648       0 gen_server:loop/6   </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">&lt;0.12.0&gt;       global_name_server       &#39;-&#39;       0    2728       0 gen_server:loop/6   </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">&lt;0.13.0&gt;       erlang:apply/2           &#39;-&#39;       0    2544       0 global:loop_the_lock</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">========================================================================================</span><br>
<br>This tight loop was 100% repeatable on the system above. I installed the same RPMs on another RHEL 6.2 system and the problem did not occur. I then deleted the contents of /var/lib/rabbitmq and re-created my single vhost + user. The tight epoll_wait loop is now gone. Before my purge, the rabbitmq process had over 500 open files similar to the following:<br>
<br>/var/lib/rabbitmq/mnesia/rabbit@<span style="font-family:courier new,monospace">xxxxxxxxx01</span>/queues/5DQOSIVFS7TNVQV0KQGVZYBSL/journal.jif<br><br>The queues directory has not yet been re-created by rabbitmq. This rabbitmq-server is part of a Chef installation. I will post back if the problem recurs. After the purge, the etop output looks like this:<br>
<br>========================================================================================<br> etop@hgspadmna01                                                          17:29:01<br> Load:  cpu         0               Memory:  total        6564    binary         20<br>
        procs      34                        processes    1141    code         3495<br>        runq        0                        atom          448    ets           249<br><br>Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function<br>
----------------------------------------------------------------------------------------<br>&lt;0.30.0&gt;       user                     &#39;-&#39;    5798   16656       0 group:server_loop/3 <br>&lt;0.40.0&gt;       etop_txt:init/1          &#39;-&#39;    2255   29288       0 etop:update/1       <br>
&lt;0.29.0&gt;       user_drv                 &#39;-&#39;      72   13640       0 user_drv:server_loop<br>&lt;0.0.0&gt;        init                     &#39;-&#39;       0   26352       0 init:boot_loop/2    <br>&lt;0.2.0&gt;        etop_server              &#39;-&#39;       0   88288       0 etop:data_handler/2 <br>
&lt;0.3.0&gt;        erl_prim_loader          &#39;-&#39;       0   21392       0 erl_prim_loader:loop<br>&lt;0.5.0&gt;        error_logger             &#39;-&#39;       0    5704       0 gen_event:fetch_msg/<br>&lt;0.6.0&gt;        application_controll     &#39;-&#39;       0  230080       0 gen_server:loop/6   <br>
&lt;0.8.0&gt;        proc_lib:init_p/5        &#39;-&#39;       0    6856       0 application_master:m<br>&lt;0.9.0&gt;        application_master:s     &#39;-&#39;       0    2584       0 application_master:l<br>========================================================================================<br>
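<br>For anyone trying to interpret the strace sequence quoted below, here is a minimal Python sketch of what a non-blocking read on a pipe looks like once the pipe has been drained. It is only an illustration: the function name drain_pipe is hypothetical, the two-byte payload simply mirrors what strace printed, and I am assuming (not asserting) that beam opens its pipe in non-blocking mode. It shows that a data read followed by EAGAIN is what a single drain looks like; it does not explain why the sequence repeats endlessly.<br>

```python
import errno
import os


def drain_pipe(read_fd, chunk_size=32):
    """Read from a non-blocking fd until it is empty.

    Returns (bytes_drained, saw_eagain). The name and the return
    shape are illustrative assumptions, not anything taken from beam.
    """
    drained = b""
    while True:
        try:
            data = os.read(read_fd, chunk_size)
        except BlockingIOError as exc:
            # An empty non-blocking pipe fails with EAGAIN, which is
            # the second read() in the strace output.
            return drained, exc.errno == errno.EAGAIN
        if not data:
            # Zero bytes means EOF: every write end has been closed.
            return drained, False
        drained += data


# A pipe from the process to itself, like fds 5 and 6 in the lsof output.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

# Two bytes, matching the first read() that strace reported.
os.write(write_fd, b"00")

drained, saw_eagain = drain_pipe(read_fd)
print(drained, saw_eagain)

os.close(read_fd)
os.close(write_fd)
```

<br>Run on Linux, this prints b'00' True: one read returns the two bytes and the next one fails with EAGAIN, the same pair of calls seen in the strace output.<br>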
<br>Thanks!<br><br>Con<br><br><div class="gmail_quote">On Mon, Jun 18, 2012 at 4:00 AM, Emile Joubert <span dir="ltr">&lt;<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<div class="im"><br>
On 14/06/12 22:10, Con Zyor wrote:<br>
&gt; Running strace on the process reveals that beam is in a tight loop, with<br>
&gt; the following sequence repeated endlessly:<br>
&gt;<br>
&gt; read(5, &quot;00&quot;, 32)                       = 2<br>
&gt; read(5, 0x7fffc2213e40, 32)             = -1 EAGAIN (Resource<br>
&gt; temporarily unavailable)<br>
<br>
</div>This is definitely not expected behaviour. Can you determine what that<br>
file descriptor corresponds to by looking earlier in the output?<br>
What does &quot;etop&quot; output?<br>
How repeatable is the problem?<br>
Have you observed this on any other servers, or is it only one?<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
-Emile<br>
<br>
<br>
</font></span></blockquote></div><br>