[rabbitmq-discuss] Mirrored HA queues disappeared on one out of two nodes restarting
Mark Steele
marks at nationalfibre.net
Wed Apr 30 16:34:34 BST 2014
mq03 was the master for that queue, and it was restarted cleanly (service rabbitmq-server restart). The chain of events was: mq03 restarted, mq04 became master for that queue, mq03 rejoined after the restart, and the queue disappeared a few minutes later.
The SASL log on mq03:
=CRASH REPORT==== 29-Apr-2014::22:24:15 ===
crasher:
initial call: gen:init_it/6
pid: <0.277.0>
registered_name: []
exception exit: {function_clause,
[{gb_trees,delete_1,[4227,nil]},
{gb_trees,delete,2},
{rabbit_variable_queue,remove_pending_ack,2},
{rabbit_variable_queue,'-ack/2-fun-0-',2},
{lists,foldl,3},
{rabbit_variable_queue,ack,2},
{rabbit_mirror_queue_slave,process_instruction,2},
{rabbit_mirror_queue_slave,handle_cast,2}]}
in function gen_server2:terminate/3
ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.168.0>]
messages: [{'$gen_cast',
{gm,
{publish,<2828.25884.590>,
{message_properties,undefined,false},
{basic_message,
{resource,<<"/">>,exchange,<<"affiliate_clicks">>},
[<<"#">>],
{content,60,
{'P_basic',<<"application/json">>,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined},
<<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,
106,115,111,110>>,
rabbit_framing_amqp_0_9_1,
[<<"snipped out">>]},
<<117,194,172,33,185,58,225,43,141,116,31,73,152,23,146,
23>>,
false}}}},
{'$gen_cast',
{deliver,
{delivery,false,<2828.25884.590>,
{basic_message,
{resource,<<"/">>,exchange,<<"affiliate_clicks">>},
[<<"#">>],
{content,60,
{'P_basic',<<"application/json">>,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined},
<<128,0,16,97,112,112,108,105,99,97,116,105,111,110,47,
106,115,111,110>>,
rabbit_framing_amqp_0_9_1,
[<<"snipped out">>]},
<<117,194,172,33,185,58,225,43,141,116,31,73,152,23,146,
23>>,
false},
undefined},
true,flow}},
{'$gen_cast',{gm,{sender_death,<2828.25884.590>}}},
{'$gen_cast',
{run_backing_queue,rabbit_mirror_queue_master,
#Fun<rabbit_mirror_queue_slave.8.82654898>}},
{'$gen_cast',
{run_backing_queue,rabbit_mirror_queue_master,
#Fun<rabbit_mirror_queue_slave.8.82654898>}},
{'EXIT',<0.278.0>,normal},
{'$gen_cast',
{run_backing_queue,rabbit_mirror_queue_master,
#Fun<rabbit_mirror_queue_slave.8.82654898>}},
{'$gen_cast',
{run_backing_queue,rabbit_mirror_queue_master,
#Fun<rabbit_mirror_queue_slave.8.82654898>}}]
links: [<0.273.0>]
dictionary: [{credit_blocked,[]},
{{xtype_to_module,direct},rabbit_exchange_type_direct},
{{xtype_to_module,topic},rabbit_exchange_type_topic},
{guid,{{4125214297,1894440844,1353716068,2733218191},1}}]
trap_exit: true
status: running
heap_size: 317811
stack_size: 24
reductions: 4292143
neighbours:
=SUPERVISOR REPORT==== 29-Apr-2014::22:24:15 ===
Supervisor: {local,
rabbit_mirror_queue_slave_sup}
Context: child_terminated
Reason: {function_clause,
[{gb_trees,delete_1,[4227,nil]},
{gb_trees,delete,2},
{rabbit_variable_queue,remove_pending_ack,2},
{rabbit_variable_queue,'-ack/2-fun-0-',2},
{lists,foldl,3},
{rabbit_variable_queue,ack,2},
{rabbit_mirror_queue_slave,process_instruction,2},
{rabbit_mirror_queue_slave,handle_cast,2}]}
Offender: [{pid,<0.277.0>},
{name,rabbit_mirror_queue_slave},
{mfa,
{rabbit_mirror_queue_slave,start_link,
[{amqqueue,
{resource,<<"/">>,queue,<<"affiliate_clicks">>},
true,false,none,[],<2828.274.0>,[],[],
[{vhost,<<"/">>},
{name,<<"affiliate_queues">>},
{pattern,<<"^affiliate_.*$">>},
{definition,
[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,0}],
[{<2828.275.0>,<2828.274.0>}]}]}},
{restart_type,temporary},
{shutdown,4294967295},
{child_type,worker}]
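A note on reading the trace above: the function_clause in gb_trees:delete_1 with arguments [4227,nil] means the key 4227 was not found in the tree. Erlang's gb_trees:delete/2 assumes the key is present and crashes otherwise, whereas gb_trees:delete_any/2 tolerates a missing key. Going by the call stack, the slave appears to have been told to ack an entry that was not in its pending-ack set. The Python below is only a rough analogy of that failure mode. It is not RabbitMQ code, and the function and variable names are hypothetical.

```python
def ack_strict(pending_acks, seq_ids):
    """Remove each acked id from the pending set.

    Raises KeyError if an id is not pending, loosely comparable to
    gb_trees:delete/2 crashing when the key is absent.
    """
    for seq_id in seq_ids:
        del pending_acks[seq_id]
    return pending_acks


def ack_tolerant(pending_acks, seq_ids):
    """Remove each acked id, silently skipping unknown ones.

    Loosely comparable to gb_trees:delete_any/2. Shown for contrast only;
    this is not a claim about how the broker was or should be fixed.
    """
    for seq_id in seq_ids:
        pending_acks.pop(seq_id, None)
    return pending_acks
```

With a pending set that holds only 4226, ack_strict fails on 4227 in the same way the slave process did, while ack_tolerant leaves the set unchanged.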
Mark Steele, CISSP, CSM, GCIA, GPEN
Director of development
Instaclick Inc.
marks at nationalfibre.net
m: (416) 844-9221
On Apr 30, 2014, at 11:24 AM, Simon MacMullen <simon at rabbitmq.com> wrote:
> On 30/04/14 16:14, Mark Steele wrote:
>> Known issue? Need to update? Please advise.
>
> There was a known issue, fixed in 3.2.1, where a crashing slave could cause the master (and other slaves of the same queue) to crash with stack traces like the ones you posted. So that's definitely a reason to upgrade.
>
> Of course, that doesn't help us with why the first slave crashed. It's quite possibly another bug that has been fixed since, but just to be sure, could you look for and post any errors from the same time frame mentioning "affiliate_clicks" on mq03?
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, Pivotal
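
For anyone following along, the search Simon asks for (errors from the same time frame mentioning "affiliate_clicks") could be scripted roughly as follows. This is a minimal sketch, assuming the log uses report headers of the form seen above (=CRASH REPORT==== 29-Apr-2014::22:24:15 ===). The function names and the 10-minute default window are made up for illustration, and the month abbreviation is parsed with the default (English) locale.

```python
import re
from datetime import datetime, timedelta

# Matches headers such as "=CRASH REPORT==== 29-Apr-2014::22:24:15 ===".
HEADER = re.compile(
    r"^=(?P<kind>[A-Z ]+)==== "
    r"(?P<ts>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===\s*$"
)


def split_reports(text):
    """Yield (kind, timestamp, body) for each report in the log text."""
    kind = ts = None
    body = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # A new header closes the previous report, if there was one.
            if kind is not None:
                yield kind, ts, "\n".join(body)
            kind = m.group("kind")
            ts = datetime.strptime(m.group("ts"), "%d-%b-%Y::%H:%M:%S")
            body = []
        elif kind is not None:
            body.append(line)
    if kind is not None:
        yield kind, ts, "\n".join(body)


def reports_mentioning(text, needle, around, window=timedelta(minutes=10)):
    """Return reports whose body contains needle within window of around."""
    return [
        (kind, ts, body)
        for kind, ts, body in split_reports(text)
        if needle in body and abs(ts - around) <= window
    ]
```

To use it, read the contents of the rabbit and sasl logs on mq03 (their location depends on the installation) into a string and call reports_mentioning(text, "affiliate_clicks", datetime(2014, 4, 29, 22, 24, 15)).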