<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Just to clarify, these crashes <span class="Apple-style-span" style="color: rgb(20, 20, 20); font-family: 'Lucida Grande'; "><span class="Apple-style-span" style="line-height: 18px; ">happen over and over, but we can't figure out what causes them or when the next crash will happen. All we are doing is running Celery as the consumer, as usual. In the past we were using RabbitMQ 2.8.7 (Ubuntu 10.04) and did not see this issue at all.</span></span><div><br></div><div>Actually, looking at the sasl.log again, I see this prior to the first crash:<div><br></div><div><div>=CRASH REPORT==== 5-Nov-2013::12:31:47 ===</div><div> crasher:</div><div> initial call: gen:init_it/6</div><div> pid: <0.13824.12></div><div> registered_name: []</div><div> exception exit: {{badmatch,{error,not_found}},</div><div> [{rabbit_mirror_queue_master,stop_all_slaves,2},</div><div> {rabbit_mirror_queue_master,delete_and_terminate,2},</div><div> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},</div><div> {rabbit_amqqueue_process,terminate_shutdown,2},</div><div> {gen_server2,terminate,3},</div><div> {proc_lib,wake_up,3}]}</div><div> in function gen_server2:terminate/3</div><div> ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.4763.12>]</div><div> messages: []</div><div> links: [<0.4951.12>,<0.13825.12>,#Port<0.149843>]</div><div> dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},</div><div> {delegate,delegate_1},</div><div> {{ch,<22460.3466.13>},</div><div> {cr,<22460.3466.13>,#Ref<0.0.8.129430>,</div><div> {[13247,13244],[13229,13232]},</div><div> 1,</div><div>...</div><div><br></div><div><div>trap_exit: true</div><div> status: running</div><div> heap_size: 317811</div><div> stack_size: 24</div><div> reductions: 12808667</div><div> neighbours:</div><div> neighbour: [{pid,<0.13826.12>},</div><div> {registered_name,[]},</div><div> 
{initial_call,{gen,init_it,</div><div> ['Argument__1','Argument__2',</div><div> 'Argument__3','Argument__4',</div><div> 'Argument__5','Argument__6']}},</div><div> {current_function,{gen_server2,process_next_msg,1}},</div><div> {ancestors,[<0.13825.12>,<0.13824.12>,rabbit_amqqueue_sup,</div><div> rabbit_sup,<0.4763.12>]},</div><div> {messages,[]},</div><div> {links,[<0.13825.12>]},</div><div> {dictionary,[{random_seed,{1383,15776,4002}}]},</div><div> {trap_exit,false},</div><div> {status,waiting},</div><div> {heap_size,610},</div><div> {stack_size,7},</div><div> {reductions,10939697}]</div><div> neighbour: [{pid,<0.13825.12>},</div><div> {registered_name,[]},</div><div> {initial_call,{gen,init_it,</div><div> ['Argument__1','Argument__2',</div><div> 'Argument__3','Argument__4',</div><div> 'Argument__5','Argument__6']}},</div><div> {current_function,{gen_server2,process_next_msg,1}},</div><div> {ancestors,[<0.13824.12>,rabbit_amqqueue_sup,rabbit_sup,</div><div> <0.4763.12>]},</div><div> {messages,[]},</div><div> {links,[<0.13824.12>,<0.13826.12>]},</div><div> {dictionary,[]},</div><div> {trap_exit,false},</div><div> {status,waiting},</div><div> {heap_size,610},</div><div> {stack_size,7},</div><div> {reductions,1868}]</div><div><br></div><div>=SUPERVISOR REPORT==== 5-Nov-2013::12:31:47 ===</div><div> Supervisor: {local,rabbit_amqqueue_sup}</div><div> Context: child_terminated</div><div> Reason: {{badmatch,{error,not_found}},</div><div> [{rabbit_mirror_queue_master,stop_all_slaves,2},</div><div> {rabbit_mirror_queue_master,delete_and_terminate,2},</div><div> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},</div><div> {rabbit_amqqueue_process,terminate_shutdown,2},</div><div> {gen_server2,terminate,3},</div><div> {proc_lib,wake_up,3}]}</div><div> Offender: [{pid,<0.13824.12>},</div><div> {name,rabbit_amqqueue},</div><div> {mfargs,{rabbit_amqqueue_process,start_link,undefined}},</div><div> {restart_type,temporary},</div><div> {shutdown,4294967295},</div><div> 
{child_type,worker}]</div></div><div><br></div><div><br></div><div>And sometimes we get this other variation of the crash:</div><div><br></div><div><div>=CRASH REPORT==== 5-Nov-2013::14:47:59 ===</div><div> crasher:</div><div> initial call: gen:init_it/6</div><div> pid: <0.2803.0></div><div> registered_name: []</div><div> exception exit: {function_clause,</div><div> [{rabbit_mirror_queue_slave,forget_sender,</div><div> [down_from_gm,down_from_gm]},</div><div> {rabbit_mirror_queue_slave,maybe_forget_sender,3},</div><div> {rabbit_mirror_queue_slave,process_instruction,2},</div><div> {rabbit_mirror_queue_slave,handle_cast,2},</div><div> {gen_server2,handle_msg,2},</div><div> {proc_lib,wake_up,3}]}</div><div> in function gen_server2:terminate/3</div><div> ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.99.0>]</div><div> messages: [{'EXIT',<0.2804.0>,normal},</div><div> {'DOWN',</div><div> {delegate_9,<22460.25536.1>},</div><div> process,<22460.25536.1>,noproc},</div><div> {'$gen_cast',</div><div> {run_backing_queue,rabbit_variable_queue,</div><div> #Fun<rabbit_variable_queue.26.70600163>}},</div><div> {'$gen_cast',</div><div> {run_backing_queue,rabbit_variable_queue,</div><div> #Fun<rabbit_variable_queue.27.5764429>}}]</div><div> links: [<0.297.0>]</div><div> dictionary: [{credit_blocked,[]},</div><div> {{xtype_to_module,direct},rabbit_exchange_type_direct},</div><div> {delegate,delegate_9},</div><div> {fhc_age_tree,{0,nil}},</div><div> {{credit_from,<0.289.0>},1902},</div><div> {guid,{{1685981799,1034401634,624789925,1597815233},1}}]</div><div> trap_exit: true</div><div> status: running</div><div> heap_size: 6765</div><div> stack_size: 24</div><div> reductions: 121868</div><div> neighbours:</div><div><br></div><div>=SUPERVISOR REPORT==== 5-Nov-2013::14:47:59 ===</div><div> Supervisor: {local,</div><div> rabbit_mirror_queue_slave_sup}</div><div> Context: child_terminated</div><div> Reason: {function_clause,</div><div> 
[{rabbit_mirror_queue_slave,forget_sender,</div><div> [down_from_gm,down_from_gm]},</div><div> {rabbit_mirror_queue_slave,maybe_forget_sender,3},</div><div> {rabbit_mirror_queue_slave,process_instruction,2},</div><div> {rabbit_mirror_queue_slave,handle_cast,2},</div><div> {gen_server2,handle_msg,2},</div><div> {proc_lib,wake_up,3}]}</div><div> Offender: [{pid,<0.2803.0>},</div><div> {name,rabbit_mirror_queue_slave},</div><div> {mfargs,{rabbit_mirror_queue_slave,start_link,undefined}},</div><div> {restart_type,temporary},</div><div> {shutdown,4294967295},</div><div> {child_type,worker}]</div></div><div><br></div><div><br></div><div>Thanks,</div><div>Jonathan</div><div><br></div><div><br><div><div>On Nov 5, 2013, at 3:47 PM, Jonathan Cua wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>No, the nodes did not get restarted.<br><br>There is nothing in the sasl logs prior to the ones that I posted.<br><br>Jonathan<br><br>On Nov 5, 2013, at 3:33 PM, Matthias Radestock wrote:<br><br><blockquote type="cite">Jonathan,<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On 05/11/13 21:24, Jonathan Cua wrote:<br></blockquote><blockquote type="cite"><blockquote type="cite">From time to time (around a couple of hours after the cluster has been<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">set up), I would get this error in the sasl.log. 
[...]<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Did either of the two nodes get restarted during those two hours?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Also, do you see any errors in the sasl logs prior to the ones you posted?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Matthias.<br></blockquote><blockquote type="cite">_______________________________________________<br></blockquote><blockquote type="cite">rabbitmq-discuss mailing list<br></blockquote><blockquote type="cite"><a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a><br></blockquote><blockquote type="cite"><a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br></blockquote><br></div></blockquote></div><br></div></div></div></body></html>