[rabbitmq-discuss] Durable Queues Crash

Jonathan Cua jcua at eventbrite.com
Wed Nov 6 01:14:58 GMT 2013


Just to clarify: these crashes happen over and over, but we can't figure out what causes them or when the next crash will happen; all we are doing is running Celery as the consumer, as usual. In the past we were using RabbitMQ 2.8.7 (Ubuntu 10.04) and we did not see this issue at all.

Actually, looking at the sasl.log again, I see this prior to the first crash.

=CRASH REPORT==== 5-Nov-2013::12:31:47 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.13824.12>
    registered_name: []
    exception exit: {{badmatch,{error,not_found}},
                     [{rabbit_mirror_queue_master,stop_all_slaves,2},
                      {rabbit_mirror_queue_master,delete_and_terminate,2},
                      {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
                      {rabbit_amqqueue_process,terminate_shutdown,2},
                      {gen_server2,terminate,3},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.4763.12>]
    messages: []
    links: [<0.4951.12>,<0.13825.12>,#Port<0.149843>]
    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {delegate,delegate_1},
                  {{ch,<22460.3466.13>},
                   {cr,<22460.3466.13>,#Ref<0.0.8.129430>,
                       {[13247,13244],[13229,13232]},
                       1,
...

    trap_exit: true
    status: running
    heap_size: 317811
    stack_size: 24
    reductions: 12808667
  neighbours:
    neighbour: [{pid,<0.13826.12>},
                  {registered_name,[]},
                  {initial_call,{gen,init_it,
                                     ['Argument__1','Argument__2',
                                      'Argument__3','Argument__4',
                                      'Argument__5','Argument__6']}},
                  {current_function,{gen_server2,process_next_msg,1}},
                  {ancestors,[<0.13825.12>,<0.13824.12>,rabbit_amqqueue_sup,
                              rabbit_sup,<0.4763.12>]},
                  {messages,[]},
                  {links,[<0.13825.12>]},
                  {dictionary,[{random_seed,{1383,15776,4002}}]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,610},
                  {stack_size,7},
                  {reductions,10939697}]
    neighbour: [{pid,<0.13825.12>},
                  {registered_name,[]},
                  {initial_call,{gen,init_it,
                                     ['Argument__1','Argument__2',
                                      'Argument__3','Argument__4',
                                      'Argument__5','Argument__6']}},
                  {current_function,{gen_server2,process_next_msg,1}},
                  {ancestors,[<0.13824.12>,rabbit_amqqueue_sup,rabbit_sup,
                              <0.4763.12>]},
                  {messages,[]},
                  {links,[<0.13824.12>,<0.13826.12>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,610},
                  {stack_size,7},
                  {reductions,1868}]

=SUPERVISOR REPORT==== 5-Nov-2013::12:31:47 ===
     Supervisor: {local,rabbit_amqqueue_sup}
     Context:    child_terminated
     Reason:     {{badmatch,{error,not_found}},
                  [{rabbit_mirror_queue_master,stop_all_slaves,2},
                   {rabbit_mirror_queue_master,delete_and_terminate,2},
                   {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
                   {rabbit_amqqueue_process,terminate_shutdown,2},
                   {gen_server2,terminate,3},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.13824.12>},
                  {name,rabbit_amqqueue},
                  {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]

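(For readers less familiar with Erlang crash dumps: the `{badmatch,{error,not_found}}` exit reason means `rabbit_mirror_queue_master:stop_all_slaves/2` pattern-matched on an expected success value but actually received `{error,not_found}` — presumably a queue record that was no longer in the database. A minimal, hypothetical sketch of that pattern, not RabbitMQ's actual code:)

```erlang
%% Hypothetical illustration of how a badmatch crash arises.
%% lookup/2 returns {ok, Value} or {error, not_found}; the caller
%% binds against {ok, Value} only, so the error tuple crashes the
%% process with {badmatch, {error, not_found}}.
lookup(Key, Map) ->
    case maps:find(Key, Map) of
        {ok, Value} -> {ok, Value};
        error       -> {error, not_found}
    end.

use(Key, Map) ->
    {ok, Value} = lookup(Key, Map),  %% badmatch here if Key is absent
    Value.
```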

And sometimes we would get this other variation of the crash:

=CRASH REPORT==== 5-Nov-2013::14:47:59 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.2803.0>
    registered_name: []
    exception exit: {function_clause,
                        [{rabbit_mirror_queue_slave,forget_sender,
                             [down_from_gm,down_from_gm]},
                         {rabbit_mirror_queue_slave,maybe_forget_sender,3},
                         {rabbit_mirror_queue_slave,process_instruction,2},
                         {rabbit_mirror_queue_slave,handle_cast,2},
                         {gen_server2,handle_msg,2},
                         {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.99.0>]
    messages: [{'EXIT',<0.2804.0>,normal},
                  {'DOWN',
                      {delegate_9,<22460.25536.1>},
                      process,<22460.25536.1>,noproc},
                  {'$gen_cast',
                      {run_backing_queue,rabbit_variable_queue,
                          #Fun<rabbit_variable_queue.26.70600163>}},
                  {'$gen_cast',
                      {run_backing_queue,rabbit_variable_queue,
                          #Fun<rabbit_variable_queue.27.5764429>}}]
    links: [<0.297.0>]
    dictionary: [{credit_blocked,[]},
                  {{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {delegate,delegate_9},
                  {fhc_age_tree,{0,nil}},
                  {{credit_from,<0.289.0>},1902},
                  {guid,{{1685981799,1034401634,624789925,1597815233},1}}]
    trap_exit: true
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 121868
  neighbours:

=SUPERVISOR REPORT==== 5-Nov-2013::14:47:59 ===
     Supervisor: {local,rabbit_mirror_queue_slave_sup}
     Context:    child_terminated
     Reason:     {function_clause,
                     [{rabbit_mirror_queue_slave,forget_sender,
                          [down_from_gm,down_from_gm]},
                      {rabbit_mirror_queue_slave,maybe_forget_sender,3},
                      {rabbit_mirror_queue_slave,process_instruction,2},
                      {rabbit_mirror_queue_slave,handle_cast,2},
                      {gen_server2,handle_msg,2},
                      {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.2803.0>},
                  {name,rabbit_mirror_queue_slave},
                  {mfargs,{rabbit_mirror_queue_slave,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]
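(This second trace is a different failure mode: `function_clause` means `rabbit_mirror_queue_slave:forget_sender/2` was called with the argument combination `down_from_gm, down_from_gm`, and no clause of the function matches those arguments. A hypothetical sketch of how that happens — the clause heads below are illustrative only, not RabbitMQ's real ones:)

```erlang
%% Hypothetical illustration of a function_clause crash: the clauses
%% only cover some argument combinations, so an uncovered call exits
%% with {function_clause, [...]}, as in the crash report above.
forget_sender(running, _)                -> false;
forget_sender(_, rabbit_down)            -> true;
forget_sender(down_from_gm, down_from_ch) -> true.
%% forget_sender(down_from_gm, down_from_gm) matches no clause,
%% so the calling gen_server2 process terminates.
```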


Thanks,
Jonathan


On Nov 5, 2013, at 3:47 PM, Jonathan Cua wrote:

> No, the nodes did not get restarted.
> 
> There is nothing in the sasl logs prior to the ones that I posted.
> 
> Jonathan
> 
> On Nov 5, 2013, at 3:33 PM, Matthias Radestock wrote:
> 
>> Jonathan,
>> 
>> On 05/11/13 21:24, Jonathan Cua wrote:
>>> From time to time (around a couple of hours after the cluster has been
>>> set up), I would get this error in the sasl.log. [...]
>> 
>> Did either of the two nodes get restarted during those two hours?
>> 
>> Also, do you see any errors in the sasl logs prior to the ones you posted?
>> 
>> Matthias.
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> 
