[rabbitmq-discuss] Durable Queues Crash
Jonathan Cua
jcua at eventbrite.com
Wed Nov 6 01:14:58 GMT 2013
Just to clarify, these crashes happen over and over, but we can't figure out what causes them or when the next crash will happen; all we are doing is running Celery as the consumer, as usual. In the past we were running RabbitMQ 2.8.7 (Ubuntu 10.04) and we did not see this issue at all.
Actually, looking at the sasl.log again, I see this prior to the first crash:
=CRASH REPORT==== 5-Nov-2013::12:31:47 ===
crasher:
initial call: gen:init_it/6
pid: <0.13824.12>
registered_name: []
exception exit: {{badmatch,{error,not_found}},
[{rabbit_mirror_queue_master,stop_all_slaves,2},
{rabbit_mirror_queue_master,delete_and_terminate,2},
{rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
{rabbit_amqqueue_process,terminate_shutdown,2},
{gen_server2,terminate,3},
{proc_lib,wake_up,3}]}
in function gen_server2:terminate/3
ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.4763.12>]
messages: []
links: [<0.4951.12>,<0.13825.12>,#Port<0.149843>]
dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
{delegate,delegate_1},
{{ch,<22460.3466.13>},
{cr,<22460.3466.13>,#Ref<0.0.8.129430>,
{[13247,13244],[13229,13232]},
1,
...
trap_exit: true
status: running
heap_size: 317811
stack_size: 24
reductions: 12808667
neighbours:
neighbour: [{pid,<0.13826.12>},
{registered_name,[]},
{initial_call,{gen,init_it,
['Argument__1','Argument__2',
'Argument__3','Argument__4',
'Argument__5','Argument__6']}},
{current_function,{gen_server2,process_next_msg,1}},
{ancestors,[<0.13825.12>,<0.13824.12>,rabbit_amqqueue_sup,
rabbit_sup,<0.4763.12>]},
{messages,[]},
{links,[<0.13825.12>]},
{dictionary,[{random_seed,{1383,15776,4002}}]},
{trap_exit,false},
{status,waiting},
{heap_size,610},
{stack_size,7},
{reductions,10939697}]
neighbour: [{pid,<0.13825.12>},
{registered_name,[]},
{initial_call,{gen,init_it,
['Argument__1','Argument__2',
'Argument__3','Argument__4',
'Argument__5','Argument__6']}},
{current_function,{gen_server2,process_next_msg,1}},
{ancestors,[<0.13824.12>,rabbit_amqqueue_sup,rabbit_sup,
<0.4763.12>]},
{messages,[]},
{links,[<0.13824.12>,<0.13826.12>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,610},
{stack_size,7},
{reductions,1868}]
=SUPERVISOR REPORT==== 5-Nov-2013::12:31:47 ===
Supervisor: {local,rabbit_amqqueue_sup}
Context: child_terminated
Reason: {{badmatch,{error,not_found}},
[{rabbit_mirror_queue_master,stop_all_slaves,2},
{rabbit_mirror_queue_master,delete_and_terminate,2},
{rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
{rabbit_amqqueue_process,terminate_shutdown,2},
{gen_server2,terminate,3},
{proc_lib,wake_up,3}]}
Offender: [{pid,<0.13824.12>},
{name,rabbit_amqqueue},
{mfargs,{rabbit_amqqueue_process,start_link,undefined}},
{restart_type,temporary},
{shutdown,4294967295},
{child_type,worker}]
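For anyone reading along: the exit reason `{badmatch,{error,not_found}}` means some code in `rabbit_mirror_queue_master:stop_all_slaves/2` pattern-matched on a result that unexpectedly came back as `{error,not_found}` (e.g. the queue record was already gone). A minimal, hypothetical sketch of that failure mode (module and function names are mine, not RabbitMQ's actual code):

```erlang
-module(badmatch_sketch).
-export([run/0]).

%% Hypothetical lookup: returns {ok, Q} when the record exists,
%% {error, not_found} when it has already been deleted.
lookup(present) -> {ok, a_queue};
lookup(_)       -> {error, not_found}.

run() ->
    try
        %% Asserting {ok, _Q} on a lookup that can also return
        %% {error, not_found} raises error:{badmatch,{error,not_found}} --
        %% the same exception class as in the crash report above.
        {ok, _Q} = lookup(missing),
        no_crash
    catch
        error:{badmatch, {error, not_found}} -> crashed
    end.
```

In the real crash the exception is not caught, so the queue process exits and the supervisor logs the `child_terminated` report that follows.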
And sometimes we also get this other variation of the crash:
=CRASH REPORT==== 5-Nov-2013::14:47:59 ===
crasher:
initial call: gen:init_it/6
pid: <0.2803.0>
registered_name: []
exception exit: {function_clause,
[{rabbit_mirror_queue_slave,forget_sender,
[down_from_gm,down_from_gm]},
{rabbit_mirror_queue_slave,maybe_forget_sender,3},
{rabbit_mirror_queue_slave,process_instruction,2},
{rabbit_mirror_queue_slave,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
in function gen_server2:terminate/3
ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.99.0>]
messages: [{'EXIT',<0.2804.0>,normal},
{'DOWN',
{delegate_9,<22460.25536.1>},
process,<22460.25536.1>,noproc},
{'$gen_cast',
{run_backing_queue,rabbit_variable_queue,
#Fun<rabbit_variable_queue.26.70600163>}},
{'$gen_cast',
{run_backing_queue,rabbit_variable_queue,
#Fun<rabbit_variable_queue.27.5764429>}}]
links: [<0.297.0>]
dictionary: [{credit_blocked,[]},
{{xtype_to_module,direct},rabbit_exchange_type_direct},
{delegate,delegate_9},
{fhc_age_tree,{0,nil}},
{{credit_from,<0.289.0>},1902},
{guid,{{1685981799,1034401634,624789925,1597815233},1}}]
trap_exit: true
status: running
heap_size: 6765
stack_size: 24
reductions: 121868
neighbours:
=SUPERVISOR REPORT==== 5-Nov-2013::14:47:59 ===
Supervisor: {local,
rabbit_mirror_queue_slave_sup}
Context: child_terminated
Reason: {function_clause,
[{rabbit_mirror_queue_slave,forget_sender,
[down_from_gm,down_from_gm]},
{rabbit_mirror_queue_slave,maybe_forget_sender,3},
{rabbit_mirror_queue_slave,process_instruction,2},
{rabbit_mirror_queue_slave,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
Offender: [{pid,<0.2803.0>},
{name,rabbit_mirror_queue_slave},
{mfargs,{rabbit_mirror_queue_slave,start_link,undefined}},
{restart_type,temporary},
{shutdown,4294967295},
{child_type,worker}]
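This second exit reason, `function_clause` with arguments `[down_from_gm,down_from_gm]`, means `rabbit_mirror_queue_slave:forget_sender/2` was called with an argument combination that none of its clauses handle. A hypothetical sketch of how that class of error arises (the clauses below are illustrative, not the real module's):

```erlang
-module(function_clause_sketch).
-export([run/0]).

%% Hypothetical clauses: no head matches when BOTH arguments
%% are down_from_gm, mirroring the crash report above.
forget_sender(down_from_ch, _)            -> forgotten;
forget_sender(down_from_gm, down_from_ch) -> forgotten.

run() ->
    try
        %% No clause matches these arguments, so Erlang raises
        %% error:function_clause, as seen in the slave crash.
        forget_sender(down_from_gm, down_from_gm)
    catch
        error:function_clause -> no_matching_clause
    end.
```

Again, in the broker the exception is uncaught, so the mirror (slave) process dies and `rabbit_mirror_queue_slave_sup` logs the termination.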
Thanks,
Jonathan
On Nov 5, 2013, at 3:47 PM, Jonathan Cua wrote:
> No, the nodes did not get restarted.
>
> There is nothing in the sasl logs prior to the ones that I posted.
>
> Jonathan
>
> On Nov 5, 2013, at 3:33 PM, Matthias Radestock wrote:
>
>> Jonathan,
>>
>> On 05/11/13 21:24, Jonathan Cua wrote:
>>> From time to time (around a couple of hours after the cluster has been
>>> set up), I would get this error in the sasl.log. [...]
>>
>> Did either of the two nodes get restarted during those two hours?
>>
>> Also, do you see any errors in the sasl logs prior to the ones you posted?
>>
>> Matthias.
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>