[rabbitmq-discuss] Mirrored HA queues disappeared on one out of two nodes restarting

Mark Steele marks at nationalfibre.net
Wed Apr 30 16:34:34 BST 2014


mq03 was the master for that queue, and it was cleanly restarted (service rabbitmq-server restart). So the chain of events was: mq03 restarted, mq04 became master for that queue, mq03 rejoined after the restart, and then the queue disappeared a few minutes later.

The SASL log on mq03:


=CRASH REPORT==== 29-Apr-2014::22:24:15 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.277.0>
    registered_name: []
    exception exit: {function_clause,
                        [{gb_trees,delete_1,[4227,nil]},
                         {gb_trees,delete,2},
                         {rabbit_variable_queue,remove_pending_ack,2},
                         {rabbit_variable_queue,'-ack/2-fun-0-',2},
                         {lists,foldl,3},
                         {rabbit_variable_queue,ack,2},
                         {rabbit_mirror_queue_slave,process_instruction,2},
                         {rabbit_mirror_queue_slave,handle_cast,2}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.168.0>]
    messages: [{'$gen_cast',
                   {gm,
                    {publish,<2828.25884.590>,
                     {message_properties,undefined,false},
                     {basic_message,
                      {resource,<<"/">>,exchange,<<"affiliate_clicks">>},
                      [<<"#">>],
                      {content,60,
                       {'P_basic',<<"application/json">>,undefined,undefined,
                        undefined,undefined,undefined,undefined,undefined,
                        undefined,undefined,undefined,undefined,undefined,
                        undefined},
                       <<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,
                         106,115,111,110>>,
                       rabbit_framing_amqp_0_9_1,
                       [<<"snipped out">>]},
                      <<117,194,172,33,185,58,225,43,141,116,31,73,152,23,146,
                        23>>,
                      false}}}},
                  {'$gen_cast',
                   {deliver,
                    {delivery,false,<2828.25884.590>,
                     {basic_message,
                      {resource,<<"/">>,exchange,<<"affiliate_clicks">>},
                      [<<"#">>],
                      {content,60,
                       {'P_basic',<<"application/json">>,undefined,undefined,
                        undefined,undefined,undefined,undefined,undefined,
                        undefined,undefined,undefined,undefined,undefined,
                        undefined},
                       <<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,
                         106,115,111,110>>,
                       rabbit_framing_amqp_0_9_1,
                       [<<"snipped out}">>]},
                      <<117,194,172,33,185,58,225,43,141,116,31,73,152,23,146,
                        23>>,
                      false},
                     undefined},
                    true,flow}},
                  {'$gen_cast',{gm,{sender_death,<2828.25884.590>}}},
                  {'$gen_cast',
                   {run_backing_queue,rabbit_mirror_queue_master,
                    #Fun<rabbit_mirror_queue_slave.8.82654898>}},
                  {'$gen_cast',
                   {run_backing_queue,rabbit_mirror_queue_master,
                    #Fun<rabbit_mirror_queue_slave.8.82654898>}},
                  {'EXIT',<0.278.0>,normal},
                  {'$gen_cast',
                   {run_backing_queue,rabbit_mirror_queue_master,
                    #Fun<rabbit_mirror_queue_slave.8.82654898>}},
                  {'$gen_cast',
                   {run_backing_queue,rabbit_mirror_queue_master,
                    #Fun<rabbit_mirror_queue_slave.8.82654898>}}]
    links: [<0.273.0>]
    dictionary: [{credit_blocked,[]},
                  {{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {{xtype_to_module,topic},rabbit_exchange_type_topic},
                  {guid,{{4125214297,1894440844,1353716068,2733218191},1}}]
    trap_exit: true
    status: running
    heap_size: 317811
    stack_size: 24
    reductions: 4292143
  neighbours:

=SUPERVISOR REPORT==== 29-Apr-2014::22:24:15 ===
     Supervisor: {local,
                                           rabbit_mirror_queue_slave_sup}
     Context:    child_terminated
     Reason:     {function_clause,
                     [{gb_trees,delete_1,[4227,nil]},
                      {gb_trees,delete,2},
                      {rabbit_variable_queue,remove_pending_ack,2},
                      {rabbit_variable_queue,'-ack/2-fun-0-',2},
                      {lists,foldl,3},
                      {rabbit_variable_queue,ack,2},
                      {rabbit_mirror_queue_slave,process_instruction,2},
                      {rabbit_mirror_queue_slave,handle_cast,2}]}
     Offender:   [{pid,<0.277.0>},
                  {name,rabbit_mirror_queue_slave},
                  {mfa,
                      {rabbit_mirror_queue_slave,start_link,
                          [{amqqueue,
                               {resource,<<"/">>,queue,<<"affiliate_clicks">>},
                               true,false,none,[],<2828.274.0>,[],[],
                               [{vhost,<<"/">>},
                                {name,<<"affiliate_queues">>},
                                {pattern,<<"^affiliate_.*$">>},
                                {definition,
                                    [{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>}]},
                                {priority,0}],
                               [{<2828.275.0>,<2828.274.0>}]}]}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]



Mark Steele, CISSP, CSM, GCIA, GPEN
Director of development
Instaclick Inc.
marks at nationalfibre.net
m: (416) 844-9221

On Apr 30, 2014, at 11:24 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 30/04/14 16:14, Mark Steele wrote:
>> Known issue? Need to update? Please advise.
> 
> There was a known issue, fixed in 3.2.1, where a crashing slave could cause the master (and other slaves of the same queue) to crash with stack traces like the ones you posted. So that's definitely a reason to upgrade.
> 
> Of course, that doesn't help us with why the first slave crashed. It's quite possibly another bug that has been fixed since, but just to be sure, could you look for and post any errors from the same time frame mentioning "affiliate_clicks" on mq03?
> 
> Cheers, Simon
> 
> -- 
> Simon MacMullen
> RabbitMQ, Pivotal
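
For pulling out the reports Simon asks for, here is a rough Python sketch. It assumes the log uses the "=KIND REPORT==== dd-Mon-yyyy::hh:mm:ss ===" header lines seen in the SASL log above; the function names, the 10-minute window and the log path in the usage comment are illustrative assumptions, not anything RabbitMQ ships.

```python
import re
from datetime import datetime, timedelta

# Header line that opens each report, e.g.
# "=CRASH REPORT==== 29-Apr-2014::22:24:15 ==="
HEADER = re.compile(
    r"^=([A-Z ]+) REPORT==== "
    r"(\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===\s*$"
)


def split_reports(lines):
    """Yield (kind, timestamp, text) for each report block in the log.

    `lines` must keep their trailing newlines (as when iterating a file).
    Month names are parsed with %b, so this assumes an English/C locale.
    """
    kind = stamp = None
    body = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            # A new header closes the previous block, if there was one.
            if kind is not None:
                yield kind, stamp, "".join(body)
            kind = m.group(1)
            stamp = datetime.strptime(m.group(2), "%d-%b-%Y::%H:%M:%S")
            body = [line]
        elif kind is not None:
            body.append(line)
    if kind is not None:
        yield kind, stamp, "".join(body)


def reports_mentioning(lines, needle, around, window=timedelta(minutes=10)):
    """Return the report blocks containing `needle` within `window` of `around`."""
    return [
        (kind, stamp, text)
        for kind, stamp, text in split_reports(lines)
        if needle in text and abs(stamp - around) <= window
    ]


# Usage (the path is a guess; adjust it to wherever mq03 writes its logs):
#
#   with open("/var/log/rabbitmq/rabbit@mq03.log") as f:
#       when = datetime(2014, 4, 29, 22, 24, 15)
#       for kind, stamp, text in reports_mentioning(f, "affiliate_clicks", when):
#           print(text)
```

A plain grep for "affiliate_clicks" would find the matching lines, but grouping by report header keeps each multi-line report intact, which is what is worth posting.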
