[rabbitmq-discuss] Queue failure, potential loss of data.

Simon MacMullen simon at rabbitmq.com
Fri Feb 14 16:32:47 GMT 2014


Were there more than two nodes in the cluster? That looks like another 
case of a mirror being taken out by collateral damage from the original 
slave.

Cheers, Simon

On 14/02/2014 4:24PM, Jason McIntosh wrote:
> BTW, here are the SASL logs from another node in the cluster:
>
> =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
>    crasher:
>      initial call: gen:init_it/6
>      pid: <0.987.0>
>      registered_name: []
>      exception exit: {{badmatch,{error,not_found}},
>                       [{rabbit_amqqueue_process,i,2,[]},
>                        {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>                        {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>                        {rabbit_amqqueue_process,emit_stats,2,[]},
>                        {rabbit_event,if_enabled,3,[]},
>                        {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
>                        {rabbit_amqqueue_process,terminate_shutdown,2,[]},
>                        {gen_server2,terminate,3,[]}]}
>        in function  gen_server2:terminate/3
>      ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.782.0>]
>      messages: [{'$gen_cast',
>                        {run_backing_queue,rabbit_variable_queue,
>                            #Fun<rabbit_variable_queue.26.70600163>}},
>                    {'EXIT',<0.988.0>,normal}]
>      links: [<0.954.0>]
>      dictionary: [{{credit_from,<0.944.0>},1671},
>                    {{credit_to,<0.24877.6355>},2},
>                    {credit_blocked,[]},
>                    {delegate,delegate_0},
>                    {fhc_age_tree,{0,nil}},
>                    {guid,{{2283490857,778293189,3964001052,3912480778},1}}]
>      trap_exit: true
>      status: running
>      heap_size: 6772
>      stack_size: 27
>      reductions: 28827118159
>    neighbours:
>
> =SUPERVISOR REPORT==== 13-Feb-2014::05:14:36 ===
>       Supervisor: {local,rabbit_mirror_queue_slave_sup}
>       Context:    child_terminated
>       Reason:     {{badmatch,{error,not_found}},
>                    [{rabbit_amqqueue_process,i,2,[]},
>                     {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>                     {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2,[]},
>                     {rabbit_amqqueue_process,emit_stats,2,[]},
>                     {rabbit_event,if_enabled,3,[]},
>                     {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
>                     {rabbit_amqqueue_process,terminate_shutdown,2,[]},
>                     {gen_server2,terminate,3,[]}]}
>       Offender:   [{pid,<0.987.0>},
>                    {name,rabbit_mirror_queue_slave},
>                    {mfargs,{rabbit_mirror_queue_slave,start_link,undefined}},
>                    {restart_type,temporary},
>                    {shutdown,4294967295},
>                    {child_type,worker}]
>
>
>
>
>
> On Fri, Feb 14, 2014 at 9:43 AM, Jason McIntosh <mcintoshj at gmail.com> wrote:
>
>
>     RabbitMQ 3.2.0
>     Erlang R16B02-1
>
>     We have a queue that basically stopped doing anything intelligent.
>     Here are the results. What's bad about this is that messages
>     appear to have continued to be published without reaching the dead
>     letter exchange; they just disappeared. In this architecture, we've
>     got a fanout exchange that publishes to two queues. One of the
>     queues is still working fine, but the second queue has dropped off.
>     Publishing hasn't failed, though, so I'm worried we've lost data for
>     the last day. Any input would be welcome. Here is the second
>     queue's information from the management GUI:
>     cluster at rabbitmqm10p		DLX DLK D Args 		Active	?	?	?	0.00/s
>
>
>     When I try to select the queue, I just get an error message:
>     TypeError: Cannot read property 'ram_msg_count' of undefined
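The `?` cells in the management row and the `ram_msg_count` error both suggest the UI received a queue record without its statistics. A hedged sketch of a monitoring check for that symptom, in Python: it scans the JSON returned by the management plugin's `GET /api/queues` endpoint and flags entries whose message count is missing or null. That `messages` is the field that goes missing when a queue process dies is an assumption inferred from the UI output in this thread, not confirmed behaviour; `queues_missing_stats` is a hypothetical helper, and fetching the JSON (URL, credentials) is left out.

```python
import json


def queues_missing_stats(queues_json, required=("messages",)):
    """Return (vhost, name) pairs for queues whose statistics are absent.

    queues_json is assumed to be the JSON text returned by the management
    plugin's GET /api/queues endpoint. A queue entry counts as "missing
    stats" when any key in `required` is absent or null; treating
    "messages" as the tell-tale key is an assumption.
    """
    missing = []
    for queue in json.loads(queues_json):
        if any(queue.get(key) is None for key in required):
            missing.append((queue.get("vhost"), queue.get("name")))
    return missing


if __name__ == "__main__":
    # Hypothetical sample: one healthy queue, one whose counts are gone.
    sample = json.dumps([
        {"vhost": "/", "name": "healthy", "messages": 0},
        {"vhost": "/", "name": "stuck"},
    ])
    print(queues_missing_stats(sample))  # [('/', 'stuck')]
```

Note that a queue with zero messages is not flagged: the check looks for a missing value, not an empty queue.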
>
>     Any help/advice here? Is there some way I can configure this queue
>     so that publishes fail instead of messages being silently lost? I
>     thought publisher confirms (I need to verify they're on) would have
>     covered this situation: a message would have had to be consumed or
>     persisted to disk by every queue, or the publish would have been
>     rejected.
>     Jason
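Whether confirms would have caught this depends on the publisher tracking every outstanding publish until the broker settles it. A minimal Python sketch of that bookkeeping, assuming a channel already in confirm mode (`confirm.select`): the broker numbers publishes on the channel from 1 and answers each with `basic.ack` or `basic.nack`, and a settlement with `multiple` set covers everything up to the given delivery tag. `ConfirmTracker` and its methods are hypothetical names, not any client library's API; hooking them to a real client's ack/nack callbacks is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmTracker:
    """Hypothetical client-side bookkeeping for AMQP 0-9-1 publisher confirms.

    Once a channel is in confirm mode, the broker numbers publishes on that
    channel 1, 2, 3, ... and settles each with basic.ack or basic.nack.
    A settlement with multiple=True covers every outstanding number up to
    and including the given delivery tag.
    """

    next_seq: int = 1
    outstanding: dict = field(default_factory=dict)  # seq number -> body
    confirmed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def published(self, body):
        """Record a publish; return the sequence number the broker will use."""
        seq = self.next_seq
        self.outstanding[seq] = body
        self.next_seq += 1
        return seq

    def _settle(self, tag, multiple, sink):
        # Move the settled messages out of `outstanding` in publish order.
        if multiple:
            tags = sorted(t for t in self.outstanding if t <= tag)
        else:
            tags = [tag] if tag in self.outstanding else []
        for t in tags:
            sink.append(self.outstanding.pop(t))

    def on_ack(self, tag, multiple=False):
        """basic.ack: the broker has taken responsibility for the message(s)."""
        self._settle(tag, multiple, self.confirmed)

    def on_nack(self, tag, multiple=False):
        """basic.nack: the broker could not; republish or report a failure."""
        self._settle(tag, multiple, self.rejected)


if __name__ == "__main__":
    tracker = ConfirmTracker()
    for body in ["a", "b", "c", "d"]:
        tracker.published(body)
    tracker.on_ack(2, multiple=True)  # settles "a" and "b"
    tracker.on_nack(3)                # "c" was refused
    # "d" is still unconfirmed: the publisher should not treat it as safe yet.
    print(tracker.confirmed, tracker.rejected, list(tracker.outstanding.values()))
    # ['a', 'b'] ['c'] ['d']
```

A tracker like this only reports what the broker says. Per the RabbitMQ documentation on confirms, an unroutable message is still acked; publishing with the `mandatory` flag makes the broker send a `basic.return` before that ack, which is how a publisher can tell the two cases apart.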
>
>
>
>     =CRASH REPORT==== 13-Feb-2014::05:14:36 ===
>        crasher:
>          initial call: gen:init_it/6
>          pid: <0.367.0>
>          registered_name: []
>          exception exit: {{badmatch,{error,not_found}},
>                           [{rabbit_mirror_queue_master,stop_all_slaves,2,[]},
>                            {rabbit_mirror_queue_master,delete_and_terminate,2,[]},
>                            {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
>                            {rabbit_amqqueue_process,terminate_shutdown,2,[]},
>                            {gen_server2,terminate,3,[]},
>                            {proc_lib,wake_up,3,
>                                [{file,"proc_lib.erl"},{line,249}]}]}
>            in function  gen_server2:terminate/3
>          ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.154.0>]
>          messages: []
>          links: [<0.250.0>,#Port<0.17147>]
>          dictionary: [{{ch,<17654.9226.6150>},
>                         {cr,<17654.9226.6150>,#Ref<0.0.18055.20563>,
>                             {[],[26925191]},
>                             1,
>                             {queue,
>                                 [{<17654.9226.6150>,
>                                   {consumer,<<"amq.ctag-LPmzPvp2doZ9pYs-cEEcFg">>,
>                                       true,[]}}],
>                                 [],1},
>                             {qstate,<17654.21979.6150>,suspended,{0,nil}},
>                             4}},
>                        {credit_blocked,[]},
>                        {{ch,<17659.4312.6334>},
>                         {cr,<17659.4312.6334>,#Ref<0.0.18273.227308>,
>                             {[],[26925208]},
>                             1,
>                             {queue,
>                                 [{<17659.4312.6334>,
>                                   {consumer,<<"amq.ctag--3Kwc_Q-QS9kcpZ9U--8-Q">>,
>                                       true,[]}}],
>                                 [],1},
>                             {qstate,<17659.2894.6334>,suspended,{0,nil}},
>                             19}},
>                        {{ch,<17659.3911.6334>},
>                         {cr,<17659.3911.6334>,#Ref<0.0.18273.227286>,
>                             {[26925232,26925226],[26925214]},
>                             1,
>                             {queue,[],[],0},
>                             {qstate,<17659.2051.6334>,active,{0,nil}},
>                             22}},
>                        {{#Ref<0.0.0.36427>,fhc_handle},
>                         {handle,
>                             {file_descriptor,prim_file,{#Port<0.17147>,132}},
>                             118224,false,5136,infinity,
>                             [[<<192,0,0,0,1,154,216,155>>],
>                              [<<192,0,0,0,1,154,216,151>>],
>                              [<<192,0,0,0,1,154,216,150>>],
>                              [<<192,0,0,0,1,154,216,149>>],
>                              [<<192,0,0,0,1,154,216,148>>],
>                              [<<192,0,0,0,1,154,216,147>>],
>                              [<<192,0,0,0,1,154,216,146>>],
>                              [<<192,0,0,0,1,154,216,144>>],
>                              [<<192,0,0,0,1,154,216,142>>],
>                              [<<192,0,0,0,1,154,216,143>>],
>                              [<<192,0,0,0,1,154,216,141>>],
>                              [<<192,0,0,0,1,154,216,140>>],
>     .,...
>
>
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::05:14:36 ===
>           Supervisor: {local,rabbit_amqqueue_sup}
>           Context:    child_terminated
>           Reason:     {{badmatch,{error,not_found}},
>                        [{rabbit_mirror_queue_master,stop_all_slaves,2,[]},
>                         {rabbit_mirror_queue_master,delete_and_terminate,2,[]},
>                         {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
>                         {rabbit_amqqueue_process,terminate_shutdown,2,[]},
>                         {gen_server2,terminate,3,[]},
>                         {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>           Offender:   [{pid,<0.367.0>},
>                        {name,rabbit_amqqueue},
>                        {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
>                        {restart_type,temporary},
>                        {shutdown,4294967295},
>                        {child_type,worker}]
>
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::10:59:28 ===
>           Supervisor: {<0.19778.5266>,amqp_channel_sup_sup}
>           Context:    shutdown_error
>           Reason:     shutdown
>           Offender:   [{nb_children,1},
>                        {name,channel_sup},
>                        {mfargs,{amqp_channel_sup,start_link,[direct,<0.20460.5266>]}},
>                        {restart_type,temporary},
>                        {shutdown,brutal_kill},
>                        {child_type,supervisor}]
>
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::11:02:34 ===
>           Supervisor: {<0.852.5267>,amqp_channel_sup_sup}
>           Context:    shutdown_error
>           Reason:     shutdown
>           Offender:   [{nb_children,1},
>                        {name,channel_sup},
>                        {mfargs,{amqp_channel_sup,start_link,[direct,<0.2623.5267>]}},
>                        {restart_type,temporary},
>                        {shutdown,brutal_kill},
>                        {child_type,supervisor}]
>
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::11:03:24 ===
>           Supervisor: {<0.4628.5267>,amqp_channel_sup_sup}
>           Context:    shutdown_error
>           Reason:     shutdown
>           Offender:   [{nb_children,1},
>                        {name,channel_sup},
>                        {mfargs,{amqp_channel_sup,start_link,[direct,<0.5878.5267>]}},
>                        {restart_type,temporary},
>                        {shutdown,brutal_kill},
>                        {child_type,supervisor}]
>
>
>     =CRASH REPORT==== 13-Feb-2014::11:12:31 ===
>        crasher:
>          initial call: gen:init_it/6
>          pid: <0.4699.5268>
>          registered_name: []
>          exception exit: {{badmatch,true},
>                           [{rabbit_queue_index,init,2,[]},
>                            {rabbit_variable_queue,init,5,[]},
>                            {rabbit_mirror_queue_master,init,3,[]},
>                            {rabbit_amqqueue_process,declare,3,[]},
>                            {gen_server2,handle_msg,2,[]},
>                            {proc_lib,init_p_do_apply,3,
>                                      [{file,"proc_lib.erl"},{line,239}]}]}
>            in function  gen_server2:terminate/3
>          ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.154.0>]
>          messages: []
>          links: [<0.250.0>]
>          dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct}]
>          trap_exit: true
>          status: running
>          heap_size: 1598
>          stack_size: 27
>          reductions: 1156
>        neighbours:
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::11:12:31 ===
>           Supervisor: {local,rabbit_amqqueue_sup}
>           Context:    child_terminated
>           Reason:     {{badmatch,true},
>                        [{rabbit_queue_index,init,2,[]},
>                         {rabbit_variable_queue,init,5,[]},
>                         {rabbit_mirror_queue_master,init,3,[]},
>                         {rabbit_amqqueue_process,declare,3,[]},
>                         {gen_server2,handle_msg,2,[]},
>                         {proc_lib,init_p_do_apply,3,
>                                   [{file,"proc_lib.erl"},{line,239}]}]}
>           Offender:   [{pid,<0.4699.5268>},
>                        {name,rabbit_amqqueue},
>                        {mfargs,{rabbit_amqqueue_process,start_link,undefined}},
>                        {restart_type,temporary},
>                        {shutdown,4294967295},
>                        {child_type,worker}]
>
>     =SUPERVISOR REPORT==== 13-Feb-2014::11:35:08 ===
>           Supervisor: {<0.6708.5271>,amqp_channel_sup_sup}
>           Context:    shutdown_error
>           Reason:     shutdown
>           Offender:   [{nb_children,1},
>                        {name,channel_sup},
>                        {mfargs,{amqp_channel_sup,start_link,[direct,<0.7855.5271>]}},
>                        {restart_type,temporary},
>                        {shutdown,brutal_kill},
>                        {child_type,supervisor}]
>
>
>     --
>     Jason McIntosh
>     https://github.com/jasonmcintosh/
>     573-424-7612
>
>
>
>
> --
> Jason McIntosh
> https://github.com/jasonmcintosh/
> 573-424-7612
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>

-- 
Simon MacMullen
RabbitMQ, Pivotal

