[rabbitmq-discuss] Rabbitmq crashing every 90 minutes

Dan_b daniel.bason at telogis.com
Wed Jan 22 21:13:31 GMT 2014


Hi,

For the last 6 hours one of our RabbitMQ instances has been crashing every
90 minutes, with RAM usage spiking after each crash.  Restarting RabbitMQ
fixes the issue, but 90 minutes later it crashes again.  I have tried
resetting the instance completely (renaming the /var/lib/rabbitmq/mnesia
database and letting it be recreated) and am waiting to see whether this has
resolved the issue.  It would be nice if we could work out what caused this,
though.

RabbitMQ status is as follows:
 {running_applications,[{rabbit,"RabbitMQ","3.1.5"},
                        {ssl,"Erlang/OTP SSL application","4.1.6"},
                        {public_key,"Public key infrastructure","0.13"},
                        {crypto,"CRYPTO version 2","2.0.4"},
                        {asn1,"The Erlang ASN1 compiler version 1.6.18",
                              "1.6.18"},
                        {mnesia,"MNESIA  CXC 138 12","4.5"},
                        {os_mon,"CPO  CXC 138 46","2.2.7"},
                        {xmerl,"XML parser","1.2.10"},
                        {sasl,"SASL  CXC 138 11","2.1.10"},
                        {stdlib,"ERTS  CXC 138 10","1.17.5"},
                        {kernel,"ERTS  CXC 138 10","2.14.5"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,57151960},
          {connection_procs,4273224},
          {queue_procs,5370024},
          {plugins,0},
          {other_proc,9627768},
          {mnesia,182848},
          {mgmt_db,0},
          {msg_index,836424},
          {other_ets,1219792},
          {binary,17570544},
          {code,14611134},
          {atom,1362545},
          {other_system,2097657}]},
 {vm_memory_high_watermark,0.8},
 {vm_memory_limit,6871947673},
 {disk_free_limit,1000000000},
 {disk_free,10916909056},
 {file_descriptors,[{total_limit,924},
                    {total_used,66},
                    {sockets_limit,829},
                    {sockets_used,46}]},
 {processes,[{limit,1048576},{used,654}]},
 {run_queue,0},
 {uptime,2820}]
...done.
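
To put the figures above in proportion, here is a minimal Python sketch that computes how close this snapshot is to each limit.  The numbers are copied from the status output.  The dictionary keys and the percent helper are only illustrative names, and the "implied RAM" line assumes that vm_memory_limit is vm_memory_high_watermark multiplied by the installed RAM.

```python
# Figures copied from the status output above; key names are illustrative only.
status = {
    "memory_total": 57151960,
    "vm_memory_limit": 6871947673,
    "vm_memory_high_watermark": 0.8,
    "fd_used": 66,
    "fd_limit": 924,
    "sockets_used": 46,
    "sockets_limit": 829,
}


def percent(used, limit):
    """Return used/limit as a percentage, rounded to two decimal places."""
    return round(100.0 * used / limit, 2)


memory_pct = percent(status["memory_total"], status["vm_memory_limit"])
fd_pct = percent(status["fd_used"], status["fd_limit"])
socket_pct = percent(status["sockets_used"], status["sockets_limit"])

# Assumption: limit = watermark * installed RAM, so RAM = limit / watermark.
implied_ram_gib = (
    status["vm_memory_limit"] / status["vm_memory_high_watermark"] / 2**30
)

print("memory used vs limit: %s%%" % memory_pct)
print("file descriptors used: %s%%" % fd_pct)
print("sockets used: %s%%" % socket_pct)
print("implied RAM (GiB): %s" % round(implied_ram_gib, 2))
```

In this snapshot memory, file descriptors and sockets are all well below their limits.  The uptime value suggests it was taken some time after a restart, not at the moment of a crash.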

Output from the sasl log is:
=CRASH REPORT==== 22-Jan-2014::19:37:28 ===
  crasher:
    initial call: tcp_acceptor:init/1
    pid: <0.232.0>
    registered_name: []
    exception exit: {accept_failed,enfile}
      in function  gen_server:terminate/6
    ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.230.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 24
    reductions: 1442919
  neighbours:

=SUPERVISOR REPORT==== 22-Jan-2014::19:37:28 ===
     Supervisor: {local,'tcp_acceptor_sup_:::5672'}
     Context:    child_terminated
     Reason:     {accept_failed,enfile}
     Offender:   [{pid,<0.232.0>},
                  {name,tcp_acceptor},
                  {mfargs,
                      {tcp_acceptor,start_link,
                          [{rabbit_networking,start_client,[]},
                           #Port<0.4998>]}},
                  {restart_type,transient},
                  {shutdown,brutal_kill},
                  {child_type,worker}]


=CRASH REPORT==== 22-Jan-2014::19:37:30 ===
  crasher:
    initial call: tcp_acceptor:init/1
    pid: <0.10514.3>
    registered_name: []
    exception exit: {accept_failed,enfile}
      in function  gen_server:terminate/6
    ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.230.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 233
    stack_size: 24
    reductions: 148
  neighbours:

=SUPERVISOR REPORT==== 22-Jan-2014::19:37:30 ===
     Supervisor: {local,'tcp_acceptor_sup_:::5672'}
     Context:    child_terminated
     Reason:     {accept_failed,enfile}
     Offender:   [{pid,<0.10514.3>},
                  {name,tcp_acceptor},
                  {mfargs,
                      {tcp_acceptor,start_link,
                          [{rabbit_networking,start_client,[]},
                           #Port<0.4998>]}},
                  {restart_type,transient},
                  {shutdown,brutal_kill},
                  {child_type,worker}]


=CRASH REPORT==== 22-Jan-2014::19:37:31 ===
  crasher:
    initial call: tcp_acceptor:init/1
    pid: <0.10522.3>
    registered_name: []
    exception exit: {accept_failed,enfile}
      in function  gen_server:terminate/6
    ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.230.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 233
    stack_size: 24
    reductions: 148
  neighbours:

=SUPERVISOR REPORT==== 22-Jan-2014::19:37:31 ===
     Supervisor: {local,'tcp_acceptor_sup_:::5672'}
     Context:    child_terminated
     Reason:     {accept_failed,enfile}
     Offender:   [{pid,<0.10522.3>},
                  {name,tcp_acceptor},
                  {mfargs,
                      {tcp_acceptor,start_link,
                          [{rabbit_networking,start_client,[]},
                           #Port<0.4998>]}},
                  {restart_type,transient},
                  {shutdown,brutal_kill},
                  {child_type,worker}]


=CRASH REPORT==== 22-Jan-2014::19:37:32 ===
  crasher:
    initial call: tcp_acceptor:init/1
    pid: <0.10523.3>
    registered_name: []
    exception exit: {accept_failed,enfile}
      in function  gen_server:terminate/6
    ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.230.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 233
                 
=CRASH REPORT==== 22-Jan-2014::19:37:37 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.156.0>
    registered_name: rabbit_disk_monitor
    exception exit: {system_limit,
                        [{erlang,open_port,
                             [{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
                         {os,start_port_srv_handle,1},
                         {os,start_port_srv_loop,0}]}
      in function  gen_server:terminate/6
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.79.0>,<0.155.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 964775
  neighbours:

=SUPERVISOR REPORT==== 22-Jan-2014::19:37:37 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    child_terminated
     Reason:     {system_limit,
                     [{erlang,open_port,
                          [{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
                      {os,start_port_srv_handle,1},
                      {os,start_port_srv_loop,0}]}
     Offender:   [{pid,<0.156.0>},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[1000000000]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=CRASH REPORT==== 22-Jan-2014::19:37:37 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.10529.3>
    registered_name: []
    exception exit: {{badmatch,{error,system_limit}},
                     [{vm_memory_monitor,read_proc_file,1},
                      {vm_memory_monitor,get_total_memory,1},
                      {rabbit_disk_monitor,init,1},
                      {gen_server,init_it,6},
                      {proc_lib,init_p_do_apply,3}]}
      in function  gen_server:init_it/6
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.98.0>]
    messages: []
    links: [<0.155.0>,<0.79.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 482
  neighbours:

And so on.
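
The enfile reason in the reports above corresponds to the POSIX ENFILE error ("too many open files in system"), which refers to the system-wide file table and not the per-process limit (that would be emfile).  As a rough sketch of how this could be checked on a Linux host, the Python below reads /proc/sys/fs/file-nr and counts the open descriptors of a single process.  The function names are hypothetical, and it assumes /proc is mounted and readable by the user running it.

```python
import os


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def system_file_handles(path="/proc/sys/fs/file-nr"):
    """Read the system-wide file handle counters (Linux only)."""
    with open(path) as f:
        return parse_file_nr(f.read())


def open_fd_count(pid):
    """Count the open descriptors of one process via /proc/<pid>/fd.

    Reading another user's process usually requires root.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


# Only touch /proc if it is actually available on this machine.
if os.path.exists("/proc/sys/fs/file-nr"):
    handles = system_file_handles()
    print("system-wide handles: %(allocated)d allocated, max %(max)d" % handles)
    print("descriptors open in this process: %d" % open_fd_count(os.getpid()))
```

Running this periodically between restarts, with the broker's pid passed to open_fd_count, would show whether the allocated count climbs towards the maximum over the 90 minutes and whether the broker or some other process is holding the handles.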

I can compress and send the actual logs to someone if that would help.  We
also have the Erlang crash dump from the issue.



--
View this message in context: http://rabbitmq.1065348.n5.nabble.com/Rabbitmq-crashing-every-90-minutes-tp32798.html
Sent from the RabbitMQ mailing list archive at Nabble.com.

