Hi, thanks for your response.<br><br>According to lsof, the file descriptor returning EAGAIN (fd 5) is the read end of a pipe whose write end (fd 6) is also held by the same beam process:<br><br><span style="font-family:courier new,monospace">beam 1754 rabbitmq 5r FIFO 0,8 0t0 10971 pipe</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">beam 1754 rabbitmq 6w FIFO 0,8 0t0 10971 pipe</span><br><br>Here is the etop output. This system also runs CouchDB, which may be showing up in the etop report:<br>
<br><span style="font-family:courier new,monospace">========================================================================================</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"> etop@xxxxxxxxx01 12:05:21</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"> Load: cpu 0 Memory: total 6616 binary 20</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"> procs 34 processes 1192 code 3495</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"> runq 0 atom 448 ets 249</span><br style="font-family:courier new,monospace"><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace">Pid Name or Initial Func Time Reds Memory MsgQ Current Function</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">----------------------------------------------------------------------------------------</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.3.0> erl_prim_loader '-' 8012 88288 0 erl_prim_loader:loop</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.30.0> user '-' 7339 34312 0 group:server_loop/3 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.40.0> etop_txt:init/1 '-' 2725 26272 0 etop:update/1 </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.25.0> code_server '-' 1558 101024 0 code_server:loop/1 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.29.0> user_drv '-' 92 13640 0 user_drv:server_loop</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.0.0> init '-' 0 18456 0 init:boot_loop/2 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.2.0> etop_server '-' 0 88288 0 etop:data_handler/2 </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.5.0> error_logger '-' 0 5704 0 gen_event:fetch_msg/</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.6.0> application_controll '-' 0 230080 0 gen_server:loop/6 </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.8.0> proc_lib:init_p/5 '-' 0 6856 0 application_master:m</span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.9.0> application_master:s '-' 0 2584 0 application_master:l</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.10.0> kernel_sup '-' 0 7256 0 gen_server:loop/6 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.11.0> rex '-' 0 2648 0 gen_server:loop/6 </span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace"><0.12.0> global_name_server '-' 0 2728 0 gen_server:loop/6 </span><br style="font-family:courier new,monospace">
<span style="font-family:courier new,monospace"><0.13.0> erlang:apply/2 '-' 0 2544 0 global:loop_the_lock</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">========================================================================================</span><br>
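A pipe whose read and write ends are both held by the same process is commonly used as a "self-pipe" wakeup: one thread writes a byte so that an event loop blocked in epoll_wait returns, and the loop then drains the non-blocking read end until read() fails with EAGAIN. I am assuming, not confirming, that this is what beam uses fds 5 and 6 for. The Python sketch that follows is a generic illustration of that pattern, not Erlang runtime code, and the function names are hypothetical.

```python
import os

# Hypothetical illustration of the generic self-pipe wakeup pattern.
# This is NOT the Erlang runtime's implementation.

def make_wakeup_pipe():
    """Create a pipe whose ends never block the caller."""
    r, w = os.pipe()
    os.set_blocking(r, False)  # the event loop must not block while draining
    os.set_blocking(w, False)  # a waker must not block if the pipe is full
    return r, w

def wake(w):
    """Write one byte so the read end becomes readable (wakes epoll/select)."""
    try:
        os.write(w, b"0")
    except BlockingIOError:
        pass  # pipe full: a wakeup is already pending, nothing more to do

def drain(r, chunk=32):
    """Read until the pipe is empty (EAGAIN); return the number of bytes read."""
    total = 0
    while True:
        try:
            data = os.read(r, chunk)
        except BlockingIOError:  # EAGAIN: nothing left to read
            return total
        if not data:  # EOF: every write end has been closed
            return total
        total += len(data)
```

For example, calling wake(w) twice and then drain(r) reads b"00" in a single read of up to 32 bytes, then gets EAGAIN and returns 2. That has the same shape as the read(5, "00", 32) = 2 / EAGAIN pair in the strace output quoted at the end of this message. In a healthy loop this drain should be rare; seeing it back to back suggests something was waking the loop continuously.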
<br>The tight loop was 100% reproducible on the system above. I installed the same RPMs on another RHEL 6.2 system and the problem did not occur there. I then deleted the contents of /var/lib/rabbitmq and re-created my single vhost and user, and the tight epoll_wait loop is now gone. Before the purge, the RabbitMQ process had over 500 open files similar to the following:<br>
<br><span style="font-family:courier new,monospace">/var/lib/rabbitmq/mnesia/rabbit@xxxxxxxxx01/queues/5DQOSIVFS7TNVQV0KQGVZYBSL/journal.jif</span><br><br>RabbitMQ has not yet re-created the queues directory. This rabbitmq-server is part of a Chef installation. I will post back if the problem recurs. After the purge, the etop output looks like this:<br>
<br><span style="font-family:courier new,monospace">========================================================================================<br> etop@hgspadmna01 17:29:01<br> Load: cpu 0 Memory: total 6564 binary 20<br>
procs 34 processes 1141 code 3495<br> runq 0 atom 448 ets 249<br><br>Pid Name or Initial Func Time Reds Memory MsgQ Current Function<br>
----------------------------------------------------------------------------------------<br><0.30.0> user '-' 5798 16656 0 group:server_loop/3 <br><0.40.0> etop_txt:init/1 '-' 2255 29288 0 etop:update/1 <br>
<0.29.0> user_drv '-' 72 13640 0 user_drv:server_loop<br><0.0.0> init '-' 0 26352 0 init:boot_loop/2 <br><0.2.0> etop_server '-' 0 88288 0 etop:data_handler/2 <br>
<0.3.0> erl_prim_loader '-' 0 21392 0 erl_prim_loader:loop<br><0.5.0> error_logger '-' 0 5704 0 gen_event:fetch_msg/<br><0.6.0> application_controll '-' 0 230080 0 gen_server:loop/6 <br>
<0.8.0> proc_lib:init_p/5 '-' 0 6856 0 application_master:m<br><0.9.0> application_master:s '-' 0 2584 0 application_master:l<br>========================================================================================</span><br>
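The lsof checks above (the pipe shared by fds 5 and 6, and the 500+ open journal.jif files) can also be approximated on Linux by reading the /proc/&lt;pid&gt;/fd symlinks directly. This is a minimal Python sketch, assuming a Linux /proc filesystem and permission to read the target process's fd directory (the rabbitmq user or root). The helper names are hypothetical.

```python
import os
from collections import defaultdict

# Hypothetical helpers; Linux-only because they rely on /proc/<pid>/fd.

def fd_targets(pid):
    """Map each open fd of process `pid` to what its /proc symlink points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # the fd was closed between listdir() and readlink()
    return targets

def shared_pipes(targets):
    """Return pipes (e.g. 'pipe:[10971]') that appear on more than one fd."""
    groups = defaultdict(list)
    for fd, target in targets.items():
        if target.startswith("pipe:["):
            groups[target].append(fd)
    return {t: sorted(fds) for t, fds in groups.items() if len(fds) > 1}

def count_suffix(targets, suffix):
    """Count open files whose path ends with `suffix`, e.g. 'journal.jif'."""
    return sum(1 for t in targets.values() if t.endswith(suffix))
```

Run against the beam PID (1754 in the lsof output above, though it will change after a restart), shared_pipes(fd_targets(pid)) should list the pipe held on fds 5 and 6 if it is still open. count_suffix(..., "journal.jif") gives a rough count of open queue journal files. The 10971 in lsof's NODE column is the pipe's inode, which is the number shown inside pipe:[...].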
<br>Thanks!<br><br>Con<br><br><div class="gmail_quote">On Mon, Jun 18, 2012 at 4:00 AM, Emile Joubert <span dir="ltr"><<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<div class="im"><br>
On 14/06/12 22:10, Con Zyor wrote:<br>
> Running strace on the process reveals that beam is in a tight loop, with<br>
> the following sequence repeated endlessly:<br>
><br>
> read(5, "00", 32) = 2<br>
> read(5, 0x7fffc2213e40, 32) = -1 EAGAIN (Resource<br>
> temporarily unavailable)<br>
<br>
</div>This is definitely not expected behaviour. Can you determine what that<br>
file descriptor corresponds to by looking earlier in the output?<br>
What does "etop" output?<br>
How repeatable is the problem?<br>
Have you observed this on any other servers, or is it only one?<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
-Emile<br>
<br>
<br>
</font></span></blockquote></div><br>