[rabbitmq-discuss] rabbitmq 2.8.2 beam process consumes 100% of one cpu in tight epoll_wait loop
Con Zyor
conzyor34 at gmail.com
Mon Jun 18 18:29:46 BST 2012
Hi, thanks for your response.
The file descriptor with the EAGAIN looks like the read end of a pipe to
itself, according to lsof:
beam 1754 rabbitmq 5r FIFO 0,8 0t0 10971 pipe
beam 1754 rabbitmq 6w FIFO 0,8 0t0 10971 pipe
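The read/EAGAIN pattern in the strace output is exactly what a non-blocking read on a drained pipe produces. A minimal Python sketch of that behaviour (the b"00" token mirrors the two bytes seen in the trace; this is an illustration of the syscall pattern, not of BEAM internals):

```python
import errno
import os

def drain_then_read_empty_pipe():
    """Write a token to a pipe, drain it, then read the now-empty pipe again."""
    r, w = os.pipe()
    os.set_blocking(r, False)          # non-blocking read end, like fd 5 above
    try:
        os.write(w, b"00")             # pending wakeup token, as in the trace
        first = os.read(r, 32)         # drains the token: 2 bytes
        try:
            os.read(r, 32)             # pipe now empty
            err = None
        except BlockingIOError as e:   # read() = -1, errno = EAGAIN
            err = e.errno
        return first, err
    finally:
        os.close(r)
        os.close(w)

print(drain_then_read_empty_pipe())
```

Seeing that pair of reads once per wakeup is normal for a self-pipe; the problem in the trace is that the sequence repeats endlessly with nothing ever blocking in between.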
Here is the etop output. This system also runs couchdb, which may be showing
up in the etop report:
========================================================================================
 etop@xxxxxxxxx01                                                           12:05:21
 Load:  cpu         0               Memory:  total        6616    binary         20
        procs      34                        processes    1192    code         3495
        runq        0                        atom          448    ets           249

Pid            Name or Initial Func    Time    Reds  Memory MsgQ Current Function
----------------------------------------------------------------------------------------
<0.3.0>        erl_prim_loader          '-'    8012   88288    0 erl_prim_loader:loop
<0.30.0>       user                     '-'    7339   34312    0 group:server_loop/3
<0.40.0>       etop_txt:init/1          '-'    2725   26272    0 etop:update/1
<0.25.0>       code_server              '-'    1558  101024    0 code_server:loop/1
<0.29.0>       user_drv                 '-'      92   13640    0 user_drv:server_loop
<0.0.0>        init                     '-'       0   18456    0 init:boot_loop/2
<0.2.0>        etop_server              '-'       0   88288    0 etop:data_handler/2
<0.5.0>        error_logger             '-'       0    5704    0 gen_event:fetch_msg/
<0.6.0>        application_controll     '-'       0  230080    0 gen_server:loop/6
<0.8.0>        proc_lib:init_p/5        '-'       0    6856    0 application_master:m
<0.9.0>        application_master:s     '-'       0    2584    0 application_master:l
<0.10.0>       kernel_sup               '-'       0    7256    0 gen_server:loop/6
<0.11.0>       rex                      '-'       0    2648    0 gen_server:loop/6
<0.12.0>       global_name_server       '-'       0    2728    0 gen_server:loop/6
<0.13.0>       erlang:apply/2           '-'       0    2544    0 global:loop_the_lock
========================================================================================
This tight-loop condition was 100% repeatable on the system above. I
installed the same rpms onto another rhel6.2 system and the problem did not
occur. I then deleted the contents of /var/lib/rabbitmq and re-created my
single vhost + user. The tight epoll_wait loop is now gone. Before my
purge, the rabbitmq process had over 500 open files similar to the
following:

/var/lib/rabbitmq/mnesia/rabbit@xxxxxxxxx01/queues/5DQOSIVFS7TNVQV0KQGVZYBSL/journal.jif

The queues directory has not yet been re-created by rabbitmq. This
rabbitmq-server is part of a Chef installation. I will post back if the
problem recurs. After the purge, the etop output looks like this:
========================================================================================
 etop@hgspadmna01                                                           17:29:01
 Load:  cpu         0               Memory:  total        6564    binary         20
        procs      34                        processes    1141    code         3495
        runq        0                        atom          448    ets           249

Pid            Name or Initial Func    Time    Reds  Memory MsgQ Current Function
----------------------------------------------------------------------------------------
<0.30.0>       user                     '-'    5798   16656    0 group:server_loop/3
<0.40.0>       etop_txt:init/1          '-'    2255   29288    0 etop:update/1
<0.29.0>       user_drv                 '-'      72   13640    0 user_drv:server_loop
<0.0.0>        init                     '-'       0   26352    0 init:boot_loop/2
<0.2.0>        etop_server              '-'       0   88288    0 etop:data_handler/2
<0.3.0>        erl_prim_loader          '-'       0   21392    0 erl_prim_loader:loop
<0.5.0>        error_logger             '-'       0    5704    0 gen_event:fetch_msg/
<0.6.0>        application_controll     '-'       0  230080    0 gen_server:loop/6
<0.8.0>        proc_lib:init_p/5        '-'       0    6856    0 application_master:m
<0.9.0>        application_master:s     '-'       0    2584    0 application_master:l
========================================================================================
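For anyone wanting to check how many per-queue journals exist on disk before trying a purge like the one above, here is a small sketch. The queues/*/journal.jif layout is taken from the lsof path earlier in this message; the directory argument would be your node's mnesia database dir:

```python
from pathlib import Path

def count_queue_journals(mnesia_node_dir):
    """Count per-queue journal.jif files under <node dir>/queues/*/."""
    return sum(1 for _ in Path(mnesia_node_dir).glob("queues/*/journal.jif"))
```

For example, count_queue_journals("/var/lib/rabbitmq/mnesia/rabbit@xxxxxxxxx01") would have reported the 500+ queue directories I saw before the purge.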
Thanks!
Con
On Mon, Jun 18, 2012 at 4:00 AM, Emile Joubert <emile at rabbitmq.com> wrote:
> Hi,
>
> On 14/06/12 22:10, Con Zyor wrote:
> > Running strace on the process reveals that beam is in a tight loop, with
> > the following sequence repeated endlessly:
> >
> > read(5, "00", 32) = 2
> > read(5, 0x7fffc2213e40, 32) = -1 EAGAIN (Resource
> > temporarily unavailable)
>
> This is definitely not expected behaviour. Can you determine what that
> file descriptor corresponds to by looking earlier in the output?
> What does "etop" output?
> How repeatable is the problem?
> Have you observed this on any other servers, or is it only one?
>
>
> -Emile
>
>
>