[rabbitmq-discuss] Mirrored HA queues disappeared on one out of two nodes restarting
Mark Steele
marks at nationalfibre.net
Wed Apr 30 16:14:02 BST 2014
Hi all,
While restarting one node of a two-node cluster with mirrored queues, I just saw a mirrored queue disappear completely from the cluster.
Both nodes in the cluster were both ram and disc nodes.
This is extremely worrisome, to say the least.
--
=INFO REPORT==== 28-Apr-2014::14:17:07 ===
Synchronising queue 'affiliate_clicks' in vhost '/': 5565 messages to synchronise
=INFO REPORT==== 28-Apr-2014::14:17:07 ===
Synchronising queue 'affiliate_clicks' in vhost '/': all slaves already synced
=INFO REPORT==== 29-Apr-2014::22:19:06 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Slave <rabbit@mq04.1.274.0> saw deaths of mirrors <rabbit@mq03.1.280.0>
=INFO REPORT==== 29-Apr-2014::22:19:06 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Promoting slave <rabbit@mq04.1.274.0> to master
=INFO REPORT==== 29-Apr-2014::22:19:33 ===
rabbit on node rabbit@mq03 up
=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': complete
=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': 4696 messages to synchronise
=INFO REPORT==== 29-Apr-2014::22:19:33 ===
Synchronising queue 'affiliate_clicks' in vhost '/': all slaves already synced
<snip> lots of connection logs, then kaboom </snip>
=INFO REPORT==== 29-Apr-2014::22:23:48 ===
Mirrored-queue (queue 'affiliate_clicks' in vhost '/'): Master <rabbit@mq04.1.274.0> saw deaths of mirrors <rabbit@mq03.2.277.0>
=ERROR REPORT==== 29-Apr-2014::22:23:50 ===
** Generic server <0.274.0> terminating
** Last message in was emit_stats
** When Server state == {q,
{amqqueue,
{resource,<<"/">>,queue,<<"affiliate_clicks">>},
true,false,none,[],<0.274.0>,[],[],
[{vhost,<<"/">>},
{name,<<"affiliate_queues">>},
{pattern,<<"^affiliate_.*$">>},
{definition,
[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,0}],
[{<2827.281.0>,<2827.280.0>}]},
none,false,rabbit_mirror_queue_master,
{state,
{resource,<<"/">>,queue,<<"affiliate_clicks">>},
<0.275.0>,<0.19739.588>,rabbit_variable_queue,
{vqstate,
{0,{[],[]}},
{0,{[],[]}},
{delta,undefined,0,undefined},
{0,{[],[]}},
{2660,
{[{msg_status,2363798,
<<117,194,172,33,185,58,225,43,141,116,31,73,
152,23,146,23>>,
{basic_message,
{resource,<<"/">>,exchange,
<<"affiliate_clicks">>},
[<<"#">>],
{content,60,
{'P_basic',<<"application/json">>,undefined,
undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
<<128,0,16,97,112,112,108,105,99,97,116,105,
111,110,47,106,115,111,110>>,
rabbit_framing_amqp_0_9_1,
[<<"DATA SNIPPED OUT">>]},
<<205,79,109,87,12,83,109,226,230,122,218,63,
27,68,138,67>>,
false},
false,false,false,false,
<LOTS OF REPEATING LOG DATA>
2363799,
{0,nil},
{0,nil},
{qistate,
"/var/lib/rabbitmq/mnesia/rabbit@mq04/queues/D8CDHLZOTXCZL6MJMMYRK9EAN",
{{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
[]},
undefined,0,65536,
#Fun<rabbit_variable_queue.2.81334491>,
{0,nil}},
{{client_msstate,msg_store_persistent,
<<69,37,230,131,60,26,47,62,12,194,26,130,4,129,
159,57>>,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
{state,356427,
"/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_persistent"},
rabbit_msg_store_ets_index,
"/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_persistent",
<0.265.0>,360524,352330,364621,368718},
{client_msstate,msg_store_transient,
<<140,110,236,52,188,182,217,136,180,245,92,51,
176,116,195,10>>,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
{state,335942,
"/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_transient"},
rabbit_msg_store_ets_index,
"/var/lib/rabbitmq/mnesia/rabbit@mq04/msg_store_transient",
<0.260.0>,340039,331840,344136,348233}},
true,0,2660,0,infinity,2660,2660,0,0,0,
{rates,
{{1398,824624,347232},0},
{{1398,824624,347232},84},
0.0,17.611352475686193,
{1398,824629,389132}},
{0,nil},
{0,nil},
{0,nil},
{0,nil},
0,0,
{rates,
{{1398,824624,347232},6706},
{{1398,824624,347232},0},
663.4928634941101,0.0,
{1398,824629,389132}}},
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
[],
{set,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}}},
{[],[]},
undefined,undefined,undefined,undefined,
{state,fine,5000,#Ref<0.0.527.127396>},
{0,nil},
undefined,undefined,undefined,
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[[<0.18782.588>|#Ref<0.0.524.252716>]]}}},
undefined,undefined,undefined,running}
** Reason for termination ==
** {{badmatch,{error,not_found}},
[{rabbit_mirror_queue_master,stop_all_slaves,2},
{rabbit_mirror_queue_master,delete_and_terminate,2},
{rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
{rabbit_amqqueue_process,terminate_shutdown,2},
{gen_server2,terminate,3},
{proc_lib,wake_up,3}]}
** In 'terminate' callback with reason ==
** {{badmatch,{error,not_found}},
[{rabbit_amqqueue_process,i,2},
{rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2},
{rabbit_amqqueue_process,'-infos/2-lc$^0/1-0-',2},
{rabbit_amqqueue_process,emit_stats,2},
{rabbit_amqqueue_process,handle_info,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
Here's the error in the SASL log:
=SUPERVISOR REPORT==== 29-Apr-2014::22:23:55 ===
Supervisor: {local,
rabbit_mirror_queue_slave_sup}
Context: child_terminated
Reason: {{badmatch,{error,not_found}},
[{rabbit_mirror_queue_master,stop_all_slaves,2},
{rabbit_mirror_queue_master,delete_and_terminate,2},
{rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6},
{rabbit_amqqueue_process,terminate_shutdown,2},
{gen_server2,terminate,3},
{proc_lib,wake_up,3}]}
Offender: [{pid,<0.274.0>},
{name,rabbit_mirror_queue_slave},
{mfa,
{rabbit_mirror_queue_slave,start_link,
[{amqqueue,
{resource,<<"/">>,queue,<<"affiliate_clicks">>},
true,false,none,[],<2827.280.0>,[],[],
[{vhost,<<"/">>},
{name,<<"affiliate_queues">>},
{pattern,<<"^affiliate_.*$">>},
{definition,
[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,0}],
[{<2827.281.0>,<2827.280.0>}]}]}},
{restart_type,temporary},
{shutdown,4294967295},
{child_type,worker}]
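For anyone trying to line up the sequence of events in logs like the ones above, a small script may help. This is only a rough sketch and not part of RabbitMQ: the function name `queue_timeline` is made up for illustration, and the header pattern is an assumption based purely on the `=INFO REPORT==== 29-Apr-2014::22:19:06 ===` lines in this excerpt. It assumes English month abbreviations.

```python
import re
from datetime import datetime

# Report header as it appears in the excerpt above, e.g.
# "=INFO REPORT==== 29-Apr-2014::22:19:06 ===" (assumed format).
HEADER = re.compile(
    r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)


def queue_timeline(log_text, queue_name):
    """Return (timestamp, level, first_line) for each report mentioning queue_name.

    Hypothetical helper: a report is a header line plus every line up to
    the next header. A report is kept if any of its lines contains
    queue_name as a plain substring.
    """
    reports = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            if current is not None:
                reports.append(current)
            stamp = datetime.strptime(match.group(2), "%d-%b-%Y::%H:%M:%S")
            current = (stamp, match.group(1), [])
        elif current is not None and line:
            current[2].append(line)
    if current is not None:
        reports.append(current)

    timeline = []
    for stamp, level, body in reports:
        if any(queue_name in entry for entry in body):
            # Keep only the first body line as a short summary.
            timeline.append((stamp, level, body[0]))
    return timeline
```

Calling `queue_timeline(open(path).read(), "affiliate_clicks")` on a log file (where `path` is wherever your node writes its log) should give a short list of timestamped reports for that queue, which makes it easier to compare the promotion, sync and crash times across nodes.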
Known issue? Need to update? Please advise.
Cheers,
Mark Steele, CISSP, CSM, GCIA, GPEN
Director of development
Instaclick Inc.
marks at nationalfibre.net
m: (416) 844-9221