[rabbitmq-discuss] rabbitmq 2.8.2 beam process consumes 100% of one cpu in tight epoll_wait loop

Con Zyor conzyor34 at gmail.com
Mon Jun 18 18:29:46 BST 2012


Hi, thanks for your response.

The file descriptor with the EAGAIN looks like the read end of a pipe to
itself, according to lsof:

beam    1754 rabbitmq    5r     FIFO    0,8      0t0  10971 pipe
beam    1754 rabbitmq    6w     FIFO    0,8      0t0  10971 pipe
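As a side note, here is a minimal sketch (plain Python, not RabbitMQ or beam
code) of why the strace loop spins: a non-blocking read on the empty read
end of a pipe fails immediately with EAGAIN, so a loop that keeps retrying
that read never blocks and eats a full core:

```python
import errno
import os

# Create a pipe analogous to beam's internal fd pair (5r/6w above).
r, w = os.pipe()
os.set_blocking(r, False)  # non-blocking read end, as beam uses

# Draining bytes that were actually written succeeds...
os.write(w, b"00")
assert os.read(r, 32) == b"00"   # matches strace: read(5, "00", 32) = 2

# ...but a read on the now-empty pipe fails with EAGAIN instead of
# blocking -- the errno seen repeating endlessly in the strace output.
try:
    os.read(r, 32)
except OSError as e:
    assert e.errno == errno.EAGAIN
    print("EAGAIN")
```

A correctly behaving process would go back into epoll_wait after the EAGAIN
and sleep until the write end is written to again, rather than retrying the
read in a tight loop.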

Here is etop output. This system also runs couchdb, which may be showing up
in the etop report:

========================================================================================
 etop@xxxxxxxxx01                                                           12:05:21
 Load:  cpu         0               Memory:  total        6616    binary         20
        procs      34                        processes    1192    code         3495
        runq        0                        atom          448    ets           249

Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<0.3.0>        erl_prim_loader          '-'    8012   88288       0 erl_prim_loader:loop
<0.30.0>       user                     '-'    7339   34312       0 group:server_loop/3
<0.40.0>       etop_txt:init/1          '-'    2725   26272       0 etop:update/1
<0.25.0>       code_server              '-'    1558  101024       0 code_server:loop/1
<0.29.0>       user_drv                 '-'      92   13640       0 user_drv:server_loop
<0.0.0>        init                     '-'       0   18456       0 init:boot_loop/2
<0.2.0>        etop_server              '-'       0   88288       0 etop:data_handler/2
<0.5.0>        error_logger             '-'       0    5704       0 gen_event:fetch_msg/
<0.6.0>        application_controll     '-'       0  230080       0 gen_server:loop/6
<0.8.0>        proc_lib:init_p/5        '-'       0    6856       0 application_master:m
<0.9.0>        application_master:s     '-'       0    2584       0 application_master:l
<0.10.0>       kernel_sup               '-'       0    7256       0 gen_server:loop/6
<0.11.0>       rex                      '-'       0    2648       0 gen_server:loop/6
<0.12.0>       global_name_server       '-'       0    2728       0 gen_server:loop/6
<0.13.0>       erlang:apply/2           '-'       0    2544       0 global:loop_the_lock
========================================================================================

This tight-loop condition was 100% repeatable on the system above. I
installed the same rpms onto another RHEL 6.2 system and the problem did not
occur. I then deleted the contents of /var/lib/rabbitmq and re-created my
single vhost + user. The tight epoll_wait loop is now gone. Before the
purge, the rabbitmq process had over 500 open files similar to the
following:

/var/lib/rabbitmq/mnesia/rabbit@xxxxxxxxx01/queues/5DQOSIVFS7TNVQV0KQGVZYBSL/journal.jif

The queues directory has not yet been re-created by rabbitmq. This
rabbitmq-server is part of a Chef installation. I will post back if the
problem recurs. After the purge, the etop output looks like this:

========================================================================================
 etop@hgspadmna01                                                           17:29:01
 Load:  cpu         0               Memory:  total        6564    binary         20
        procs      34                        processes    1141    code         3495
        runq        0                        atom          448    ets           249

Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<0.30.0>       user                     '-'    5798   16656       0 group:server_loop/3
<0.40.0>       etop_txt:init/1          '-'    2255   29288       0 etop:update/1
<0.29.0>       user_drv                 '-'      72   13640       0 user_drv:server_loop
<0.0.0>        init                     '-'       0   26352       0 init:boot_loop/2
<0.2.0>        etop_server              '-'       0   88288       0 etop:data_handler/2
<0.3.0>        erl_prim_loader          '-'       0   21392       0 erl_prim_loader:loop
<0.5.0>        error_logger             '-'       0    5704       0 gen_event:fetch_msg/
<0.6.0>        application_controll     '-'       0  230080       0 gen_server:loop/6
<0.8.0>        proc_lib:init_p/5        '-'       0    6856       0 application_master:m
<0.9.0>        application_master:s     '-'       0    2584       0 application_master:l
========================================================================================
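For anyone hitting the same thing: the purge I described amounted to roughly
the following (RHEL 6 service syntax; the vhost/user names are placeholders,
and this wipes all broker state, so only do it if Chef or similar can
recreate everything):

```shell
# WARNING: destroys all queues, exchanges, users and vhosts on this node.
service rabbitmq-server stop
rm -rf /var/lib/rabbitmq/mnesia/*
service rabbitmq-server start

# Re-create the single vhost + user (placeholder names):
rabbitmqctl add_vhost /myvhost
rabbitmqctl add_user myuser mypassword
rabbitmqctl set_permissions -p /myvhost myuser ".*" ".*" ".*"
```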

Thanks!

Con

On Mon, Jun 18, 2012 at 4:00 AM, Emile Joubert <emile at rabbitmq.com> wrote:

> Hi,
>
> On 14/06/12 22:10, Con Zyor wrote:
> > Running strace on the process reveals that beam is in a tight loop, with
> > the following sequence repeated endlessly:
> >
> > read(5, "00", 32)                       = 2
> > read(5, 0x7fffc2213e40, 32)             = -1 EAGAIN (Resource
> > temporarily unavailable)
>
> This is definitely not expected behaviour. Can you determine what that
> file descriptor corresponds to by looking earlier in the output?
> What does "etop" output?
> How repeatable is the problem?
> Have you observed this on any other servers, or is it only one?
>
>
> -Emile
>
>
>

