[rabbitmq-discuss] Mirrored HA queues disappeared on one out of two nodes restarting

Mark Steele marks at nationalfibre.net
Wed Apr 30 16:14:02 BST 2014


Hi all,

I just restarted one node in a two-node cluster with mirrored queues, and one of the mirrored queues disappeared from the cluster completely.

Both nodes in the cluster were both ram and disc nodes.

This is extremely worrisome to say the least.

--

=INFO REPORT==== 28-Apr-2014::14:17:07 ===
Synchronising queue 'affiliate_clicks' in vhost '/': 5565 messages to synchronise

=INFO REPORT==== 28-Apr-2014::14:17:07 ===
Synchronising queue 'affiliate_clicks' in vhost '/': all slaves already synced

=INFO REPORT==== 29-Apr-2014::22:19:06 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Slave <rabbit@mq04.1.274.0> saw deaths of mirrors <rabbit@mq03.1.280.0>

=INFO REPORT==== 29-Apr-2014::22:19:06 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Promoting slave <rabbit@mq04.1.274.0> to master

=INFO REPORT==== 29-Apr-2014::22:19:33 ===
rabbit on node rabbit@mq03 up


=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': complete

=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': 4696 messages to synchronise

=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': all slaves already synced

<snip> lots of connection logs, then kaboom </snip>

=INFO REPORT==== 29-Apr-2014::22:23:48 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Master <rabbit@mq04.1.274.0> saw deaths of mirrors <rabbit@mq03.2.277.0>



=ERROR REPORT==== 29-Apr-2014::22:23:50 ===
** Generic server <0.274.0> terminating
** Last message in was emit_stats
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"affiliate_clicks">>},
                          true,false,none,[],<0.274.0>,[],[],
                          [{vhost,<<"/">>},
                           {name,<<"affiliate_queues">>},
                           {pattern,<<"^affiliate_.*$">>},
                           {definition,
                            [{<<"ha-mode">>,<<"all">>},
                             {<<"ha-sync-mode">>,<<"automatic">>}]},
                           {priority,0}],
                          [{<2827.281.0>,<2827.280.0>}]},
                         none,false,rabbit_mirror_queue_master,
                         {state,
                          {resource,<<"/">>,queue,<<"affiliate_clicks">>},
                          <0.275.0>,<0.19739.588>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {2660,
                            {[{msg_status,2363798,
                               <<117,194,172,33,185,58,225,43,141,116,31,73,
                                 152,23,146,23>>,
                               {basic_message,
                                {resource,<<"/">>,exchange,
                                 <<"affiliate_clicks">>},
                                [<<"#">>],
                                {content,60,
                                 {'P_basic',<<"application/json">>,undefined,
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined,undefined},
                                 <<128,0,16,97,112,112,108,105,99,97,116,105,
                                   111,110,47,106,115,111,110>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<"DATA SNIPPED OUT">>]},
                                <<205,79,109,87,12,83,109,226,230,122,218,63,
                                  27,68,138,67>>,
                                false},
                               false,false,false,false,
                               

<LOTS OF REPEATING LOG DATA>


                           2363799,
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "/var/lib/rabbitmq/mnesia/rabbit@mq04/queues/D8CDHLZOTXCZL6MJMMYRK9EAN",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,65536,
                            #Fun<rabbit_variable_queue.2.81334491>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<69,37,230,131,60,26,47,62,12,194,26,130,4,129,
                               159,57>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,356427,
                              "/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_persistent"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_persistent",
                             <0.265.0>,360524,352330,364621,368718},
                            {client_msstate,msg_store_transient,
                             <<140,110,236,52,188,182,217,136,180,245,92,51,
                               176,116,195,10>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,335942,
                              "/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_transient",
                             <0.260.0>,340039,331840,344136,348233}},
                           true,0,2660,0,infinity,2660,2660,0,0,0,
                           {rates,
                            {{1398,824624,347232},0},
                            {{1398,824624,347232},84},
                            0.0,17.611352475686193,
                            {1398,824629,389132}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1398,824624,347232},6706},
                            {{1398,824624,347232},0},
                            663.4928634941101,0.0,
                            {1398,824629,389132}}},
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {set,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {[],[]},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,#Ref<0.0.527.127396>},
                         {0,nil},
                         undefined,undefined,undefined,
                         {dict,1,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                            [[<0.18782.588>|#Ref<0.0.524.252716>]]}}},
                         undefined,undefined,undefined,running}
** Reason for termination == 
** {{badmatch,{error,not_found}},
    [{rabbit_mirror_queue_master,stop_all_slaves,2},
     {rabbit_mirror_queue_master,delete_and_terminate,2},
     {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
     {rabbit_amqqueue_process,terminate_shutdown,2},
     {gen_server2,terminate,3},
     {proc_lib,wake_up,3}]}
** In 'terminate' callback with reason ==
** {{badmatch,{error,not_found}},
    [{rabbit_amqqueue_process,i,2},
     {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2},
     {rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2},
     {rabbit_amqqueue_process,emit_stats,2},
     {rabbit_amqqueue_process,handle_info,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}




Here's the error in the SASL log:


=SUPERVISOR REPORT==== 29-Apr-2014::22:23:55 ===
     Supervisor: {local,
                                           rabbit_mirror_queue_slave_sup}
     Context:    child_terminated
     Reason:     {{badmatch,{error,not_found}},
                  [{rabbit_mirror_queue_master,stop_all_slaves,2},
                   {rabbit_mirror_queue_master,delete_and_terminate,2},
                   {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
                   {rabbit_amqqueue_process,terminate_shutdown,2},
                   {gen_server2,terminate,3},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.274.0>},
                  {name,rabbit_mirror_queue_slave},
                  {mfa,
                      {rabbit_mirror_queue_slave,start_link,
                          [{amqqueue,
                               {resource,<<"/">>,queue,<<"affiliate_clicks">>},
                               true,false,none,[],<2827.280.0>,[],[],
                               [{vhost,<<"/">>},
                                {name,<<"affiliate_queues">>},
                                {pattern,<<"^affiliate_.*$">>},
                                {definition,
                                    [{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>}]},
                                {priority,0}],
                               [{<2827.281.0>,<2827.280.0>}]}]}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]
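
For anyone following the timeline, the gap between mq03 rejoining and the master process crashing can be worked out from the report headers. What follows is only a rough sketch in Python, not anything shipped with RabbitMQ: the `parse_reports` helper and the `SAMPLE` text are illustrative names, the sample is trimmed from the excerpts above, and the header regex assumes the report format shown in this message.

```python
import re
from datetime import datetime

# Report header as it appears in the excerpts above, e.g.
# "=INFO REPORT==== 29-Apr-2014::22:19:33 ===".
# Assumption: every report in the log starts with a header of this shape.
HEADER = re.compile(
    r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)


def parse_reports(text):
    """Return one (level, timestamp, body) tuple per log report."""
    reports = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # %b expects English month abbreviations (C/English locale).
            ts = datetime.strptime(match.group(2), "%d-%b-%Y::%H:%M:%S")
            reports.append([match.group(1), ts, []])
        elif reports and line:
            # Non-empty lines belong to the most recent report header.
            reports[-1][2].append(line)
    return [(level, ts, " ".join(body)) for level, ts, body in reports]


# Trimmed from the log above (the list archive renders '@' as ' at ').
SAMPLE = """\
=INFO REPORT==== 29-Apr-2014::22:19:33 ===
rabbit on node rabbit@mq03 up

=INFO REPORT==== 29-Apr-2014::22:23:48 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Master <rabbit@mq04.1.274.0> saw deaths of mirrors <rabbit@mq03.2.277.0>

=ERROR REPORT==== 29-Apr-2014::22:23:50 ===
** Generic server <0.274.0> terminating
"""

reports = parse_reports(SAMPLE)
# Time the restarted node was reported as up again.
node_up = next(ts for level, ts, body in reports if body.endswith(" up"))
# Time of the first ERROR report after that.
first_error = next(ts for level, ts, body in reports if level == "ERROR")
gap_seconds = (first_error - node_up).total_seconds()
print(gap_seconds)
```

On the excerpt above this works out to a little over four minutes between the node coming back up and the queue process terminating.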


Is this a known issue? Do I need to update? Please advise.

Cheers,

Mark Steele, CISSP, CSM, GCIA, GPEN
Director of development
Instaclick Inc.
marks at nationalfibre.net
m: (416) 844-9221
