[rabbitmq-discuss] Queue failure, potential loss of data.
Simon MacMullen
simon at rabbitmq.com
Fri Feb 14 16:32:47 GMT 2014
Were there more than two nodes in the cluster? That looks like another
case of a mirror being taken out by collateral damage from the original
slave.
Cheers, Simon
On 14/02/2014 4:24PM, Jason McIntosh wrote:
> BTW, here are the sasl logs from another node in the cluster:
>
> =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
> crasher:
> initial call: gen:init_it/6
> pid: <0.987.0>
> registered_name: []
> exception exit: {{badmatch,{error,not_found}},
> [{rabbit_amqqueue_process,i,2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,emit_stats,2,[]},
> {rabbit_event,if_enabled,3,[]},
> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
> {rabbit_amqqueue_process,terminate_shutdown,2,[]},
> {gen_server2,terminate,3,[]}]}
> in function gen_server2:terminate/3
> ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.782.0>]
> messages: [{'$gen_cast',
> {run_backing_queue,rabbit_variable_queue,
> #Fun<rabbit_variable_queue.26.70600163>}},
> {'EXIT',<0.988.0>,normal}]
> links: [<0.954.0>]
> dictionary: [{{credit_from,<0.944.0>},1671},
> {{credit_to,<0.24877.6355>},2},
> {credit_blocked,[]},
> {delegate,delegate_0},
> {fhc_age_tree,{0,nil}},
> {guid,{{2283490857,778293189,3964001052,3912480778},1}}]
> trap_exit: true
> status: running
> heap_size: 6772
> stack_size: 27
> reductions: 28827118159
> neighbours:
>
> =SUPERVISOR REPORT==== 13-Feb-2014::05:14:36 ===
> Supervisor: {local,
> rabbit_mirror_queue_slave_sup}
> Context: child_terminated
> Reason: {{badmatch,{error,not_found}},
> [{rabbit_amqqueue_process,i,2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
> {rabbit_amqqueue_process,emit_stats,2,[]},
> {rabbit_event,if_enabled,3,[]},
> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
> {rabbit_amqqueue_process,terminate_shutdown,2,[]},
> {gen_server2,terminate,3,[]}]}
> Offender: [{pid,<0.987.0>},
> {name,rabbit_mirror_queue_slave},
> {mfargs,{rabbit_mirror_queue_slave,start_link,undefined}},
> {restart_type,temporary},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
>
>
>
> On Fri, Feb 14, 2014 at 9:43 AM, Jason McIntosh <mcintoshj at gmail.com> wrote:
>
>
> RabbitMQ 3.2.0
> Erlang R16B02-1
>
> We have a queue that has basically stopped doing anything useful.
> Here are the results. What's bad about this is that messages appear
> to have continued publishing without hitting the dead letter
> exchange; they just disappeared. In this architecture, we've got a
> fanout exchange that publishes to two queues. One of the queues is
> still working fine; the second queue is the one that dropped off.
> Publishing hasn't failed, though, so I'm worried we've lost data for
> the last day. Any input would be welcome on this. Here's the
> second queue's information from the management GUI:
> cluster at rabbitmqm10p DLX DLK D Args Active ? ? ? 0.00/s
>
>
> When I try and select the queue, I just get an error message:
> TypeError: Cannot read property 'ram_msg_count' of undefined
>
> Any help or advice here? Is there some way I can change this queue so
> that publishes fail rather than silently losing messages? I thought
> publisher confirms (I need to verify they're on) would have taken care
> of this situation: that each message would have to be consumed or
> persisted to disk on all queues, or else the publish would be rejected.
> Jason
>
>
>
> =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
> crasher:
> initial call: gen:init_it/6
> pid: <0.367.0>
> registered_name: []
> exception exit: {{badmatch,{error,not_found}},
> [{rabbit_mirror_queue_master,stop_all_slaves,2,[]},
> {rabbit_mirror_queue_master,delete_and_terminate,2,[]},
> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
> {rabbit_amqqueue_process,terminate_shutdown,2,[]},
> {gen_server2,terminate,3,[]},
> {proc_lib,wake_up,3,
> [{file,"proc_lib.erl"},{line,249}]}]}
> in function gen_server2:terminate/3
> ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.154.0>]
> messages: []
> links: [<0.250.0>,#Port<0.17147>]
> dictionary: [{{ch,<17654.9226.6150>},
> {cr,<17654.9226.6150>,#Ref<0.0.18055.20563>,
> {[],[26925191]},
> 1,
> {queue,
> [{<17654.9226.6150>,
> {consumer,<<"amq.ctag-LPmzPvp2doZ9pYs-cEEcFg">>,
> true,[]}}],
> [],1},
> {qstate,<17654.21979.6150>,suspended,{0,nil}},
> 4}},
> {credit_blocked,[]},
> {{ch,<17659.4312.6334>},
> {cr,<17659.4312.6334>,#Ref<0.0.18273.227308>,
> {[],[26925208]},
> 1,
> {queue,
> [{<17659.4312.6334>,
> {consumer,<<"amq.ctag--3Kwc_Q-QS9kcpZ9U--8-Q">>,
> true,[]}}],
> [],1},
> {qstate,<17659.2894.6334>,suspended,{0,nil}},
> 19}},
> {{ch,<17659.3911.6334>},
> {cr,<17659.3911.6334>,#Ref<0.0.18273.227286>,
> {[26925232,26925226],[26925214]},
> 1,
> {queue,[],[],0},
> {qstate,<17659.2051.6334>,active,{0,nil}},
> 22}},
> {{#Ref<0.0.0.36427>,fhc_handle},
> {handle,
> {file_descriptor,prim_file,{#Port<0.17147>,132}},
> 118224,false,5136,infinity,
> [[<<192,0,0,0,1,154,216,155>>],
> [<<192,0,0,0,1,154,216,151>>],
> [<<192,0,0,0,1,154,216,150>>],
> [<<192,0,0,0,1,154,216,149>>],
> [<<192,0,0,0,1,154,216,148>>],
> [<<192,0,0,0,1,154,216,147>>],
> [<<192,0,0,0,1,154,216,146>>],
> [<<192,0,0,0,1,154,216,144>>],
> [<<192,0,0,0,1,154,216,142>>],
> [<<192,0,0,0,1,154,216,143>>],
> [<<192,0,0,0,1,154,216,141>>],
> [<<192,0,0,0,1,154,216,140>>],
> .,...
>
>
>
> =SUPERVISOR REPORT==== 13-Feb-2014::05:14:36 ===
> Supervisor: {local,rabbit_amqqueue_sup}
> Context: child_terminated
> Reason: {{badmatch,{error,not_found}},
> [{rabbit_mirror_queue_master,stop_all_slaves,2,[]},
> {rabbit_mirror_queue_master,delete_and_terminate,2,[]},
> {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
> {rabbit_amqqueue_process,terminate_shutdown,2,[]},
> {gen_server2,terminate,3,[]},
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
> Offender: [{pid,<0.367.0>},
> {name,rabbit_amqqueue},
> {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
> {restart_type,temporary},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 13-Feb-2014::10:59:28 ===
> Supervisor: {<0.19778.5266>,
> amqp_channel_sup_sup}
> Context: shutdown_error
> Reason: shutdown
> Offender: [{nb_children,1},
> {name,channel_sup},
> {mfargs,
> {amqp_channel_sup,start_link,[direct,<0.20460.5266>]}},
> {restart_type,temporary},
> {shutdown,brutal_kill},
> {child_type,supervisor}]
>
>
> =SUPERVISOR REPORT==== 13-Feb-2014::11:02:34 ===
> Supervisor: {<0.852.5267>,amqp_channel_sup_sup}
> Context: shutdown_error
> Reason: shutdown
> Offender: [{nb_children,1},
> {name,channel_sup},
> {mfargs,
> {amqp_channel_sup,start_link,[direct,<0.2623.5267>]}},
> {restart_type,temporary},
> {shutdown,brutal_kill},
> {child_type,supervisor}]
>
>
> =SUPERVISOR REPORT==== 13-Feb-2014::11:03:24 ===
> Supervisor: {<0.4628.5267>,amqp_channel_sup_sup}
> Context: shutdown_error
> Reason: shutdown
> Offender: [{nb_children,1},
> {name,channel_sup},
> {mfargs,
> {amqp_channel_sup,start_link,[direct,<0.5878.5267>]}},
> {restart_type,temporary},
> {shutdown,brutal_kill},
> {child_type,supervisor}]
>
>
> =CRASH REPORT==== 13-Feb-2014::11:12:31 ===
> crasher:
> initial call: gen:init_it/6
> pid: <0.4699.5268>
> registered_name: []
> exception exit: {{badmatch,true},
> [{rabbit_queue_index,init,2,[]},
> {rabbit_variable_queue,init,5,[]},
> {rabbit_mirror_queue_master,init,3,[]},
> {rabbit_amqqueue_process,declare,3,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,init_p_do_apply,3,
> [{file,"proc_lib.erl"},{line,239}]}]}
> in function gen_server2:terminate/3
> ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.154.0>]
> messages: []
> links: [<0.250.0>]
> dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct}]
> trap_exit: true
> status: running
> heap_size: 1598
> stack_size: 27
> reductions: 1156
> neighbours:
>
> =SUPERVISOR REPORT==== 13-Feb-2014::11:12:31 ===
> Supervisor: {local,rabbit_amqqueue_sup}
> Context: child_terminated
> Reason: {{badmatch,true},
> [{rabbit_queue_index,init,2,[]},
> {rabbit_variable_queue,init,5,[]},
> {rabbit_mirror_queue_master,init,3,[]},
> {rabbit_amqqueue_process,declare,3,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,init_p_do_apply,3,
> [{file,"proc_lib.erl"},{line,239}]}]}
> Offender: [{pid,<0.4699.5268>},
> {name,rabbit_amqqueue},
> {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
> {restart_type,temporary},
> {shutdown,4294967295},
> {child_type,worker}]
>
> =SUPERVISOR REPORT==== 13-Feb-2014::11:35:08 ===
> Supervisor: {<0.6708.5271>,amqp_channel_sup_sup}
> Context: shutdown_error
> Reason: shutdown
> Offender: [{nb_children,1},
> {name,channel_sup},
> {mfargs,
> {amqp_channel_sup,start_link,[direct,<0.7855.5271>]}},
> {restart_type,temporary},
> {shutdown,brutal_kill},
> {child_type,supervisor}]
>
>
> --
> Jason McIntosh
> https://github.com/jasonmcintosh/
> 573-424-7612
>
>
>
>
> --
> Jason McIntosh
> https://github.com/jasonmcintosh/
> 573-424-7612
>
>
>
--
Simon MacMullen
RabbitMQ, Pivotal
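
When comparing SASL logs from several nodes, as in the thread above, it can help to group report headers by timestamp so that crashes logged at the same moment on different nodes stand out. The following is only a minimal sketch under stated assumptions: it assumes headers of the form "=CRASH REPORT==== 13-Feb-2014::05:14:36 ===" as seen in the quoted logs, it tolerates "> " email quote prefixes, and the names group_reports and sample are hypothetical helpers, not part of RabbitMQ or Erlang.

```python
import re
from collections import defaultdict

# Header lines look like: =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
HEADER = re.compile(r"^=(?P<kind>[A-Z ]+) REPORT==== (?P<ts>\S+) ===\s*$")
# Optional email quote prefix ("> " or ">") left over from a mailing list.
QUOTE = re.compile(r"^(?:>\s?)+")


def group_reports(text):
    """Map each timestamp to the list of report kinds logged at that time."""
    by_time = defaultdict(list)
    for line in text.splitlines():
        match = HEADER.match(QUOTE.sub("", line).strip())
        if match:
            by_time[match.group("ts")].append(match.group("kind"))
    return dict(by_time)


# Illustrative excerpt only; feed the real log files in practice.
sample = """\
> =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
> crasher:
> =SUPERVISOR REPORT==== 13-Feb-2014::05:14:36 ===
> Supervisor: {local,rabbit_amqqueue_sup}
> =SUPERVISOR REPORT==== 13-Feb-2014::10:59:28 ===
"""

for ts, kinds in group_reports(sample).items():
    print(ts, kinds)
```

Running the same function over each node's log and comparing the timestamps is one way to check whether a mirror and its master went down together; it does not interpret the stack traces themselves.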