We are running RabbitMQ 2.4.1 in production and recently had a failure whose root cause we cannot determine. We also tried restarting the broker, but the restart hung and never returned, so we rebooted the machine to restore the broker.<div><br></div><div>We have only the RabbitMQ and SASL logs at this point, but the error messages don't mean much to us.</div><div><br></div><div>rabbitmq log snippet:</div><div><br></div><div><div>=INFO REPORT==== 11-Apr-2012::05:04:08 ===</div><div>starting TCP connection <0.28490.65> from 172.17.208.67:1522</div><div><br></div><div>=INFO REPORT==== 11-Apr-2012::05:04:08 ===</div><div>closing TCP connection <0.9195.65> from 10.70.20.75:62045</div><div><br></div><div>=INFO REPORT==== 11-Apr-2012::05:04:31 ===</div><div>closing TCP connection <0.10243.65> from 10.70.40.77:53173</div><div><br></div><div>=ERROR REPORT==== 11-Apr-2012::05:04:31 ===</div><div>** Generic server msg_store_transient terminating</div><div>** Last message in was {'$gen_cast',</div><div> {client_dying,</div><div> <<74,18,61,37,8,55,8,91,210,27,70,185,112,89,</div><div> 171,154>>}}</div><div>** When Server state == {msstate,</div><div> "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient",</div><div> rabbit_msg_store_ets_index,</div><div> {state,417861,</div><div> "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient"},</div><div> 0,#Ref<0.0.0.875>,</div><div> {dict,0,16,16,8,80,48,</div><div> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},</div><div> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},</div><div> [],undefined,0,12073198,[],<0.233.0>,421958,413764,</div><div> 426055,</div><div> {set,0,16,16,8,80,48,</div><div> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},</div><div> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},</div></div><div>...skipping...</div><div><div> {dict,0,16,16,8,80,48,</div><div> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},</div><div> 
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],</div><div> []}}}}</div><div>** Reason for termination == </div><div>** {{badmatch,false},</div><div> [{rabbit_msg_store_ets_index,insert,2},</div><div> {rabbit_msg_store,write_message,3},</div><div> {rabbit_msg_store,handle_cast,2},</div><div> {gen_server2,handle_msg,2},</div><div> {proc_lib,wake_up,3}]}</div></div><div>...skipping...</div><div><div>=INFO REPORT==== 11-Apr-2012::05:04:43 ===</div><div>closing TCP connection <0.5032.4496> from 172.16.216.217:60234</div><div><br></div><div>=INFO REPORT==== 11-Apr-2012::05:04:43 ===</div><div>closing TCP connection <0.8419.6115> from 10.65.10.72:54580</div><div><br></div><div>=ERROR REPORT==== 11-Apr-2012::05:04:43 ===</div><div>** Generic server <0.31907.9> terminating</div><div>** Last message in was {'EXIT',<0.241.0>,shutdown}</div><div>** When Server state == {q,</div><div> {amqqueue,</div><div> {resource,<<"/alarming">>,queue,<<"alarming.9">>},</div><div> false,false,none,[],<0.31907.9>},</div><div> none,true,rabbit_variable_queue,</div><div> {vqstate,</div><div> {[],[]},</div><div> {0,{[],[]}},</div><div> {delta,undefined,0,undefined},</div></div><div>...skipping...</div><div><div> {state,fine,undefined},</div><div> {dict,0,16,16,8,80,48,</div><div> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},</div><div> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},</div><div> undefined,undefined}</div><div>** Reason for termination == </div><div>** {noproc,</div><div> {gen_server2,call,</div><div> [msg_store_transient,</div><div> {client_terminate,</div><div> <<17,102,9,148,6,184,165,141,162,246,194,57,36,62,208,135>>},</div><div> infinity]}}</div><div>** In 'terminate' callback with reason ==</div><div>** shutdown</div><div><br></div><div>=ERROR REPORT==== 11-Apr-2012::05:04:43 ===</div><div>** gen_event handler rabbit_error_logger crashed.</div><div>** Was installed in error_logger</div><div>** Last event was: {error,<0.146.0>,</div><div> {<0.9700.6>,</div><div> 
"** Generic server ~p terminating~n** Last message in was ~p~n** When Server state == ~p~n** Reason for termination == ~n** ~p~n** In 'terminate' callback with reason ==~n** ~p~n",</div><div> [<0.9700.6>,</div><div> {'EXIT',<0.241.0>,shutdown},</div><div> {q,</div><div> {amqqueue,</div><div> {resource,<<"/rssm">>,queue,</div><div> <<"cse.rssm.logManager.sqlserver">>},</div><div> false,false,none,[],<0.9700.6>},</div><div> none,true,rabbit_variable_queue,</div><div> {vqstate,</div><div> {[],[]},</div><div> {0,{[],[]}},</div><div> {delta,undefined,0,undefined},</div><div> {0,{[],[]}},</div></div><div>...skipping...</div><div><div> {noproc,</div><div> {gen_server2,call,</div><div> [msg_store_transient,</div><div> {client_terminate,</div><div> <<143,174,238,76,144,209,125,211,110,123,56,1,237,</div><div> 217,136,2>>},</div><div> infinity]}},</div><div> shutdown]}}</div><div>** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}</div><div>** Reason == {badarg,[{ets,lookup,[rabbit_registry,{exchange,topic}]},</div><div> {rabbit_registry,lookup_module,2},</div><div> {rabbit_exchange,type_to_module,1},</div><div> {rabbit_exchange,route,2},</div><div> {rabbit_exchange,publish,2},</div><div> {rabbit_basic,publish,1},</div><div> {rabbit_error_logger,publish1,4},</div><div> {rabbit_error_logger,handle_event,2}]}</div><div><br></div><div>=INFO REPORT==== 11-Apr-2012::05:04:43 ===</div><div> application: rabbit</div><div> exited: shutdown</div><div> type: permanent</div></div><div><br></div><div><br></div><div>sasl log snippet:</div><div><div>=SUPERVISOR REPORT==== 11-Apr-2012::00:15:30 ===</div><div> Supervisor: {<0.5419.34>,rabbit_channel_sup_sup}</div><div> Context: shutdown_error</div><div> Reason: shutdown</div><div> Offender: [{pid,<0.5731.34>},</div><div> {name,channel_sup},</div><div> {mfa,{rabbit_channel_sup,start_link,[]}},</div><div> {restart_type,temporary},</div><div> {shutdown,infinity},</div><div> 
{child_type,supervisor}]</div><div><br></div><div><br></div><div>=CRASH REPORT==== 11-Apr-2012::05:04:32 ===</div><div> crasher:</div><div> initial call: gen:init_it/7</div><div> pid: <0.232.0></div><div> registered_name: msg_store_transient</div><div> exception exit: {{badmatch,false},</div><div> [{rabbit_msg_store_ets_index,insert,2},</div><div> {rabbit_msg_store,write_message,3},</div><div> {rabbit_msg_store,handle_cast,2},</div><div> {gen_server2,handle_msg,2},</div><div> {proc_lib,wake_up,3}]}</div><div> in function gen_server2:terminate/3</div><div> ancestors: [rabbit_sup,<0.147.0>]</div><div> messages: [{'EXIT',<0.233.0>,normal}]</div><div> links: [<0.148.0>]</div><div> dictionary: [{fhc_age_tree,{0,nil}}]</div><div> trap_exit: true</div><div> status: running</div><div> heap_size: 10946</div><div> stack_size: 24</div><div> reductions: 98380626</div><div> neighbours:</div></div><div><div>=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===</div><div> Supervisor: {local,rabbit_sup}</div><div> Context: child_terminated</div><div> Reason: {{badmatch,false},</div><div> [{rabbit_msg_store_ets_index,insert,2},</div><div> {rabbit_msg_store,write_message,3},</div><div> {rabbit_msg_store,handle_cast,2},</div><div> {gen_server2,handle_msg,2},</div><div> {proc_lib,wake_up,3}]}</div><div> Offender: [{pid,<0.232.0>},</div><div> {name,msg_store_transient},</div><div> {mfargs,</div><div> {rabbit_msg_store,start_link,</div><div> [msg_store_transient,</div><div> "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",</div><div> undefined,</div><div> {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},</div><div> {restart_type,transient},</div><div> {shutdown,4294967295},</div><div> {child_type,worker}]</div><div><br></div><div><br></div><div>=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===</div><div> Supervisor: {local,rabbit_sup}</div><div> Context: shutdown</div><div> Reason: reached_max_restart_intensity</div><div> Offender: [{pid,<0.232.0>},</div><div> 
{name,msg_store_transient},</div><div> {mfargs,</div><div> {rabbit_msg_store,start_link,</div><div> [msg_store_transient,</div><div> "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",</div><div> undefined,</div><div> {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},</div><div> {restart_type,transient},</div><div> {shutdown,4294967295},</div><div> {child_type,worker}]</div></div><div>...skipping...</div><div><div>=CRASH REPORT==== 11-Apr-2012::05:04:43 ===</div><div> crasher:</div><div> initial call: gen:init_it/6</div><div> pid: <0.31907.9></div><div> registered_name: []</div><div> exception exit: {noproc,</div><div> {gen_server2,call,</div><div> [msg_store_transient,</div><div> {client_terminate,</div><div> <<213,104,174,241,176,121,164,159,98,43,221,</div><div> 160,120,109,6,107>>},</div><div> infinity]}}</div><div> in function gen_server2:terminate/3</div><div> ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.147.0>]</div><div> messages: []</div><div> links: []</div><div> dictionary: [{guid,{{9,<0.31907.9>},0}}]</div><div> trap_exit: true</div><div> status: running</div><div> heap_size: 987</div><div> stack_size: 24</div><div> reductions: 443158598</div><div> neighbours:</div><div><br></div><div>=SUPERVISOR REPORT==== 11-Apr-2012::05:04:43 ===</div><div> Supervisor: {local,rabbit_amqqueue_sup}</div><div> Context: shutdown_error</div><div> Reason: {noproc,</div><div> {gen_server2,call,</div><div> [msg_store_transient,</div><div> {client_terminate,</div><div> <<213,104,174,241,176,121,164,159,98,43,221,160,</div><div> 120,109,6,107>>},</div><div> infinity]}}</div><div> Offender: [{pid,<0.31907.9>},</div><div> {name,rabbit_amqqueue},</div><div> {mfa,{rabbit_amqqueue_process,start_link,[]}},</div><div> {restart_type,temporary},</div><div> {shutdown,4294967295},</div><div> {child_type,worker}]</div><div><br></div></div><div>Any help determining the cause would be appreciated.</div><div><br></div><div>Mark.</div><div><br></div>