[rabbitmq-discuss] Rabbitmq crashing every 90 minutes
Dan_b
daniel.bason at telogis.com
Wed Jan 22 21:13:31 GMT 2014
Hi,
For the last 6 hours one of our RabbitMQ instances has been crashing every
90 minutes, with RAM usage spiking after each crash. Restarting RabbitMQ
fixes the issue, but 90 minutes later it crashes again. I
have tried resetting the RabbitMQ instance completely (renaming the
/var/lib/rabbitmq/mnesia database directory and letting it be recreated) and am
waiting to see if this has resolved the issue. It would be nice if we could
work out what caused this, though.
RabbitMQ status is as follows:
{running_applications,[{rabbit,"RabbitMQ","3.1.5"},
{ssl,"Erlang/OTP SSL application","4.1.6"},
{public_key,"Public key infrastructure","0.13"},
{crypto,"CRYPTO version 2","2.0.4"},
{asn1,"The Erlang ASN1 compiler version 1.6.18",
"1.6.18"},
{mnesia,"MNESIA CXC 138 12","4.5"},
{os_mon,"CPO CXC 138 46","2.2.7"},
{xmerl,"XML parser","1.2.10"},
{sasl,"SASL CXC 138 11","2.1.10"},
{stdlib,"ERTS CXC 138 10","1.17.5"},
{kernel,"ERTS CXC 138 10","2.14.5"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [kernel-poll:true]\n"},
{memory,[{total,57151960},
{connection_procs,4273224},
{queue_procs,5370024},
{plugins,0},
{other_proc,9627768},
{mnesia,182848},
{mgmt_db,0},
{msg_index,836424},
{other_ets,1219792},
{binary,17570544},
{code,14611134},
{atom,1362545},
{other_system,2097657}]},
{vm_memory_high_watermark,0.8},
{vm_memory_limit,6871947673},
{disk_free_limit,1000000000},
{disk_free,10916909056},
{file_descriptors,[{total_limit,924},
{total_used,66},
{sockets_limit,829},
{sockets_used,46}]},
{processes,[{limit,1048576},{used,654}]},
{run_queue,0},
{uptime,2820}]
...done.
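The figures in the status output above can be sanity-checked with a little arithmetic. The Python sketch below does not parse `rabbitmqctl status`; it simply copies the values by hand into a hypothetical `STATUS` dict and computes how much headroom each resource had, plus the total RAM implied by `vm_memory_limit / vm_memory_high_watermark`. The names `STATUS` and `headroom` are illustrative only.

```python
# Values copied by hand from the status output above.
# STATUS and headroom() are hypothetical names used for illustration only.
STATUS = {
    "memory_total": 57151960,          # bytes, {memory,[{total,...}]}
    "vm_memory_high_watermark": 0.8,
    "vm_memory_limit": 6871947673,     # bytes
    "disk_free_limit": 1000000000,     # bytes
    "disk_free": 10916909056,          # bytes
    "fd_total_limit": 924,
    "fd_total_used": 66,
    "sockets_limit": 829,
    "sockets_used": 46,
}


def headroom(s):
    """Return usage ratios and the RAM size implied by the memory watermark."""
    return {
        # fraction of the memory alarm threshold currently in use
        "memory_used_of_limit": s["memory_total"] / s["vm_memory_limit"],
        # vm_memory_limit = watermark * total RAM, so divide to recover RAM (GiB)
        "implied_total_ram_gib": s["vm_memory_limit"]
        / s["vm_memory_high_watermark"]
        / 2**30,
        # how many times larger free disk space is than the disk alarm threshold
        "disk_free_over_limit": s["disk_free"] / s["disk_free_limit"],
        "fd_used_of_limit": s["fd_total_used"] / s["fd_total_limit"],
        "sockets_used_of_limit": s["sockets_used"] / s["sockets_limit"],
    }


if __name__ == "__main__":
    for name, value in headroom(STATUS).items():
        print(f"{name}: {value:.3f}")
```

With these numbers, memory is well under 1% of the alarm threshold, 66 of 924 file descriptors (about 7%) are in use, and the 0.8 watermark implies roughly 8 GiB of RAM. The status was captured at an uptime of 2820 seconds (47 minutes), so at that point in the cycle the node's own counters show no obvious pressure.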
Output from the SASL log:
=CRASH REPORT==== 22-Jan-2014::19:37:28 ===
crasher:
initial call: tcp_acceptor:init/1
pid: <0.232.0>
registered_name: []
exception exit: {accept_failed,enfile}
in function gen_server:terminate/6
ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
messages: []
links: [<0.230.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 24
reductions: 1442919
neighbours:
=SUPERVISOR REPORT==== 22-Jan-2014::19:37:28 ===
Supervisor: {local,'tcp_acceptor_sup_:::5672'}
Context: child_terminated
Reason: {accept_failed,enfile}
Offender: [{pid,<0.232.0>},
{name,tcp_acceptor},
{mfargs,
{tcp_acceptor,start_link,
[{rabbit_networking,start_client,[]},
#Port<0.4998>]}},
{restart_type,transient},
{shutdown,brutal_kill},
{child_type,worker}]
=CRASH REPORT==== 22-Jan-2014::19:37:30 ===
crasher:
initial call: tcp_acceptor:init/1
pid: <0.10514.3>
registered_name: []
exception exit: {accept_failed,enfile}
in function gen_server:terminate/6
ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
messages: []
links: [<0.230.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 233
stack_size: 24
reductions: 148
neighbours:
=SUPERVISOR REPORT==== 22-Jan-2014::19:37:30 ===
Supervisor: {local,'tcp_acceptor_sup_:::5672'}
Context: child_terminated
Reason: {accept_failed,enfile}
Offender: [{pid,<0.10514.3>},
{name,tcp_acceptor},
{mfargs,
{tcp_acceptor,start_link,
[{rabbit_networking,start_client,[]},
#Port<0.4998>]}},
{restart_type,transient},
{shutdown,brutal_kill},
{child_type,worker}]
=CRASH REPORT==== 22-Jan-2014::19:37:31 ===
crasher:
initial call: tcp_acceptor:init/1
pid: <0.10522.3>
registered_name: []
exception exit: {accept_failed,enfile}
in function gen_server:terminate/6
ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
messages: []
links: [<0.230.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 233
stack_size: 24
reductions: 148
neighbours:
=SUPERVISOR REPORT==== 22-Jan-2014::19:37:31 ===
Supervisor: {local,'tcp_acceptor_sup_:::5672'}
Context: child_terminated
Reason: {accept_failed,enfile}
Offender: [{pid,<0.10522.3>},
{name,tcp_acceptor},
{mfargs,
{tcp_acceptor,start_link,
[{rabbit_networking,start_client,[]},
#Port<0.4998>]}},
{restart_type,transient},
{shutdown,brutal_kill},
{child_type,worker}]
=CRASH REPORT==== 22-Jan-2014::19:37:32 ===
crasher:
initial call: tcp_acceptor:init/1
pid: <0.10523.3>
registered_name: []
exception exit: {accept_failed,enfile}
in function gen_server:terminate/6
ancestors: ['tcp_acceptor_sup_:::5672',<0.229.0>,rabbit_sup,<0.98.0>]
messages: []
links: [<0.230.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 233
=CRASH REPORT==== 22-Jan-2014::19:37:37 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.156.0>
registered_name: rabbit_disk_monitor
exception exit: {system_limit,
[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
{os,start_port_srv_handle,1},
{os,start_port_srv_loop,0}]}
in function gen_server:terminate/6
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.98.0>]
messages: []
links: [<0.79.0>,<0.155.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1597
stack_size: 24
reductions: 964775
neighbours:
=SUPERVISOR REPORT==== 22-Jan-2014::19:37:37 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: child_terminated
Reason: {system_limit,
[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
{os,start_port_srv_handle,1},
{os,start_port_srv_loop,0}]}
Offender: [{pid,<0.156.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[1000000000]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 22-Jan-2014::19:37:37 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.10529.3>
registered_name: []
exception exit: {{badmatch,{error,system_limit}},
[{vm_memory_monitor,read_proc_file,1},
{vm_memory_monitor,get_total_memory,1},
{rabbit_disk_monitor,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:init_it/6
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.98.0>]
messages: []
links: [<0.155.0>,<0.79.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 24
reductions: 482
neighbours:
And so on.
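Because the same crash/supervisor report pair repeats many times, it may help to tally the reports by type and reason, and note when they start and stop, before sending the full logs. The following Python sketch is one possible approach, assuming the SASL log layout shown above: a `=CRASH REPORT==== <timestamp> ===` or `=SUPERVISOR REPORT==== <timestamp> ===` header, followed by an `exception exit:` or `Reason:` line. `summarise_sasl_log` is a hypothetical helper, not part of RabbitMQ or Erlang/OTP, and it keeps only the first line of a multi-line reason.

```python
import re
import sys
from collections import Counter

# Matches headers such as "=CRASH REPORT==== 22-Jan-2014::19:37:28 ==="
HEADER_RE = re.compile(r"^=(CRASH|SUPERVISOR) REPORT==== (\S+) ===\s*$")
# Matches the first line of the exit reason in crash and supervisor reports
REASON_RE = re.compile(r"^\s*(?:exception exit|Reason):\s*(.+?)\s*$")


def summarise_sasl_log(text):
    """Count (report type, first reason line) pairs and collect header timestamps."""
    counts = Counter()
    timestamps = []
    current = None  # type of the report whose reason we are still looking for
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            timestamps.append(header.group(2))
            continue
        reason = REASON_RE.match(line)
        if reason and current is not None:
            # strip a trailing comma left by multi-line Erlang terms
            counts[(current, reason.group(1).rstrip(","))] += 1
            current = None  # only the first reason line per report is counted
    return counts, timestamps


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python summarise_sasl.py <path to the SASL log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        counts, stamps = summarise_sasl_log(fh.read())
    for (kind, reason), n in counts.most_common():
        print(f"{n:6d}  {kind:<10} {reason}")
    if stamps:
        print("first report:", stamps[0], " last report:", stamps[-1])
```

Run against the log excerpt above, this would group the repeated `{accept_failed,enfile}` reports together and list the `rabbit_disk_monitor` `system_limit` failures separately, which makes it easier to see which failure appears first in each 90-minute cycle.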
I can compress and send the actual logs to someone if that would help. We
also have the Erlang crash dump from the incident.
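One detail in the reports above may narrow things down: `enfile` is the Erlang atom for the POSIX `ENFILE` error, which on Linux means the system-wide open-file table is full. This differs from `EMFILE`, the per-process limit, which is what RabbitMQ's `file_descriptors` counters relate to. If that reading is right, the exhaustion may not show up in RabbitMQ's own counters at all. A minimal sketch, assuming a Linux host and its `/proc/sys/fs/file-nr` interface (three fields: allocated handles, unused allocated handles, maximum), that could be run periodically during the 90-minute cycle; `parse_file_nr` and `read_file_nr` are hypothetical helper names.

```python
import errno
import os


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused allocated handles, maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    in_use = allocated - unused
    return {
        "allocated": allocated,
        "unused": unused,
        "max": maximum,
        "in_use_ratio": in_use / maximum,
    }


def read_file_nr(path="/proc/sys/fs/file-nr"):
    # Linux-only: this file does not exist on other platforms
    with open(path) as fh:
        return parse_file_nr(fh.read())


if __name__ == "__main__":
    # ENFILE is the system-wide limit, EMFILE the per-process one
    print("ENFILE:", os.strerror(errno.ENFILE))
    print("EMFILE:", os.strerror(errno.EMFILE))
    if os.path.exists("/proc/sys/fs/file-nr"):
        print(read_file_nr())
```

If the first field climbs towards the third shortly before each crash, the limit being hit is the kernel-wide maximum (`fs.file-max`) rather than RabbitMQ's own `total_limit` of 924.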
--
View this message in context: http://rabbitmq.1065348.n5.nabble.com/Rabbitmq-crashing-every-90-minutes-tp32798.html
Sent from the RabbitMQ mailing list archive at Nabble.com.