[rabbitmq-discuss] 2.4.1 broker failure/crash

Mark Geib mark.geib.44 at gmail.com
Wed Apr 18 21:12:22 BST 2012


We are running RabbitMQ 2.4.1 in production and recently had a failure 
whose root cause we cannot determine. We also tried to restart the broker, 
but the restart hung and never returned. We had to reboot the machine to 
restore the broker.

We have only the RabbitMQ and SASL logs at this point, and the error 
messages don't mean much to us.

rabbitmq log snippet:

=INFO REPORT==== 11-Apr-2012::05:04:08 ===
starting TCP connection <0.28490.65> from 172.17.208.67:1522

=INFO REPORT==== 11-Apr-2012::05:04:08 ===
closing TCP connection <0.9195.65> from 10.70.20.75:62045

=INFO REPORT==== 11-Apr-2012::05:04:31 ===
closing TCP connection <0.10243.65> from 10.70.40.77:53173

=ERROR REPORT==== 11-Apr-2012::05:04:31 ===
** Generic server msg_store_transient terminating
** Last message in was {'$gen_cast',
                           {client_dying,
                               <<74,18,61,37,8,55,8,91,210,27,70,185,112,89,
                                 171,154>>}}
** When Server state == {msstate,
                         "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient",
                         rabbit_msg_store_ets_index,
                         {state,417861,
                          "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient"},
                         0,#Ref<0.0.0.875>,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         [],undefined,0,12073198,[],<0.233.0>,421958,413764,
                         426055,
                         {set,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
...skipping...
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                            []}}}}
** Reason for termination == 
** {{badmatch,false},
    [{rabbit_msg_store_ets_index,insert,2},
     {rabbit_msg_store,write_message,3},
     {rabbit_msg_store,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}
...skipping...
=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.5032.4496> from 172.16.216.217:60234

=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.8419.6115> from 10.65.10.72:54580

=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** Generic server <0.31907.9> terminating
** Last message in was {'EXIT',<0.241.0>,shutdown}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/alarming">>,queue,<<"alarming.9">>},
                          false,false,none,[],<0.31907.9>},
                         none,true,rabbit_variable_queue,
                         {vqstate,
                          {[],[]},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
...skipping...
                         {state,fine,undefined},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         undefined,undefined}
** Reason for termination == 
** {noproc,
       {gen_server2,call,
           [msg_store_transient,
            {client_terminate,
                <<17,102,9,148,6,184,165,141,162,246,194,57,36,62,208,135>>},
            infinity]}}
** In 'terminate' callback with reason ==
** shutdown

=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.146.0>,
                    {<0.9700.6>,
                     "** Generic server ~p terminating~n** Last message in was ~p~n** When Server state == ~p~n** Reason for termination == ~n** ~p~n** In 'terminate' callback with reason ==~n** ~p~n",
                     [<0.9700.6>,
                      {'EXIT',<0.241.0>,shutdown},
                      {q,
                       {amqqueue,
                        {resource,<<"/rssm">>,queue,
                         <<"cse.rssm.logManager.sqlserver">>},
                        false,false,none,[],<0.9700.6>},
                       none,true,rabbit_variable_queue,
                       {vqstate,
                        {[],[]},
                        {0,{[],[]}},
                        {delta,undefined,0,undefined},
                        {0,{[],[]}},
...skipping...
                      {noproc,
                       {gen_server2,call,
                        [msg_store_transient,
                         {client_terminate,
                          <<143,174,238,76,144,209,125,211,110,123,56,1,237,
                            217,136,2>>},
                         infinity]}},
                      shutdown]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {badarg,[{ets,lookup,[rabbit_registry,{exchange,topic}]},
                      {rabbit_registry,lookup_module,2},
                      {rabbit_exchange,type_to_module,1},
                      {rabbit_exchange,route,2},
                      {rabbit_exchange,publish,2},
                      {rabbit_basic,publish,1},
                      {rabbit_error_logger,publish1,4},
                      {rabbit_error_logger,handle_event,2}]}

=INFO REPORT==== 11-Apr-2012::05:04:43 ===
    application: rabbit
    exited: shutdown
    type: permanent


sasl log snippet:
=SUPERVISOR REPORT==== 11-Apr-2012::00:15:30 ===
     Supervisor: {<0.5419.34>,rabbit_channel_sup_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{pid,<0.5731.34>},
                  {name,channel_sup},
                  {mfa,{rabbit_channel_sup,start_link,[]}},
                  {restart_type,temporary},
                  {shutdown,infinity},
                  {child_type,supervisor}]


=CRASH REPORT==== 11-Apr-2012::05:04:32 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.232.0>
    registered_name: msg_store_transient
    exception exit: {{badmatch,false},
                     [{rabbit_msg_store_ets_index,insert,2},
                      {rabbit_msg_store,write_message,3},
                      {rabbit_msg_store,handle_cast,2},
                      {gen_server2,handle_msg,2},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_sup,<0.147.0>]
    messages: [{'EXIT',<0.233.0>,normal}]
    links: [<0.148.0>]
    dictionary: [{fhc_age_tree,{0,nil}}]
    trap_exit: true
    status: running
    heap_size: 10946
    stack_size: 24
    reductions: 98380626
  neighbours:
=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
     Supervisor: {local,rabbit_sup}
     Context:    child_terminated
     Reason:     {{badmatch,false},
                  [{rabbit_msg_store_ets_index,insert,2},
                   {rabbit_msg_store,write_message,3},
                   {rabbit_msg_store,handle_cast,2},
                   {gen_server2,handle_msg,2},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.232.0>},
                  {name,msg_store_transient},
                  {mfargs,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
                           undefined,
                           {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
     Supervisor: {local,rabbit_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.232.0>},
                  {name,msg_store_transient},
                  {mfargs,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
                           undefined,
                           {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]
...skipping...
=CRASH REPORT==== 11-Apr-2012::05:04:43 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.31907.9>
    registered_name: []
    exception exit: {noproc,
                        {gen_server2,call,
                            [msg_store_transient,
                             {client_terminate,
                                 <<213,104,174,241,176,121,164,159,98,43,221,
                                   160,120,109,6,107>>},
                             infinity]}}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.147.0>]
    messages: []
    links: []
    dictionary: [{guid,{{9,<0.31907.9>},0}}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 443158598
  neighbours:

=SUPERVISOR REPORT==== 11-Apr-2012::05:04:43 ===
     Supervisor: {local,rabbit_amqqueue_sup}
     Context:    shutdown_error
     Reason:     {noproc,
                     {gen_server2,call,
                         [msg_store_transient,
                          {client_terminate,
                              <<213,104,174,241,176,121,164,159,98,43,221,160,
                                120,109,6,107>>},
                          infinity]}}
     Offender:   [{pid,<0.31907.9>},
                  {name,rabbit_amqqueue},
                  {mfa,{rabbit_amqqueue_process,start_link,[]}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]

Any help determining the cause would be appreciated.

Mark.
