[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare snapshot
Jerry Kuch
jkuch at gopivotal.com
Thu Feb 20 16:56:25 GMT 2014
Hi, Michael:
This isn't terribly surprising. Snapshotting a VM is likely to leave it
less responsive than usual for some amount of time. Whether that stall ends
before another node in your cluster misses its heartbeats and declares the
node down is a coin flip, which would fit a split that shows up on only
some nights.
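For reference, the two settings you mention both live in rabbitmq.config.
The fragment below is only a sketch: the value 120 is an illustrative
assumption, not a recommendation (the Erlang default for net_ticktime is 60
seconds), and pause_minority is just one of the available partition
handling modes (ignore, pause_minority, autoheal). Which one suits your
cluster is not something these logs can settle.

```erlang
%% rabbitmq.config (sketch; the values are assumptions for illustration)
[
  %% Erlang distribution: how long a peer may stay silent before it is
  %% considered down. The default is 60 seconds; 120 is a hypothetical value.
  {kernel, [{net_ticktime, 120}]},

  %% RabbitMQ: what to do once a partition has been detected.
  %% The default is ignore.
  {rabbit, [{cluster_partition_handling, pause_minority}]}
].
```

Raising net_ticktime makes the cluster more tolerant of a short stall, at
the cost of taking longer to notice a node that really is gone. It should
be set to the same value on every node.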
Best regards,
Jerry
On Thu, Feb 20, 2014 at 3:12 AM, Michael Oullion <
michael.oullion at norbert-dentressangle.com> wrote:
> Hi all,
>
> We are seeing occasional network partitions (net splits) on our cluster
> and we don't know why. Before changing the net tick parameter or the
> partition handling behaviour, I want to understand why this is happening.
> Our environment is:
> RabbitMQ 3.2.1, Erlang R16B
> 3 RabbitMQ nodes in the same subnet
> RabbitMQ is installed on Windows 2008 R2 (VMware ESXi 5.1)
> We have 4 mirrored queues on this cluster.
> In production, the normal load is about 20 messages/second.
>
> The split always occurs at the end of the (NetBackup) snapshot of the VM.
> However, we take a snapshot every night, and the network split occurs only
> about once every 15 to 20 days.
>
> *Log from node rabbit at FRA-VSP-32545:*
> =INFO REPORT==== 19-Feb-2014::18:34:47 ===
> rabbit on node 'rabbit at FRA-VSP-32596' down
>
> =INFO REPORT==== 19-Feb-2014::18:34:49 ===
> Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave
> <'rabbit at FRA-VSP-32545'.2.269.0> saw deaths of mirrors
> <'rabbit at FRA-VSP-32596'.1.270.0>
>
>
> *Log from node rabbit at FRA-VSP-32596:*
>
> =INFO REPORT==== 19-Feb-2014::18:34:28 ===
> rabbit on node 'rabbit at FRA-VSP-32545' down
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.279.0> terminating
> ** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
> noconnection}
> ** When Server state == {state,
> {76,<0.279.0>},
> {{79,<5383.278.0>},#Ref<0.0.0.248452>},
> {{82,<5066.278.0>},#Ref<0.0.1.42330>},
> {resource,<<"/IEC">>,queue,<<"conso.queue">>},
> rabbit_mirror_queue_coordinator,
> {83,
> [{{76,<0.279.0>},
> {view_member,
> {76,<0.279.0>},
> [],
> {79,<5383.278.0>},
> {82,<5066.278.0>}}},
> {{79,<5383.278.0>},
> {view_member,
> {79,<5383.278.0>},
> [],
> {82,<5066.278.0>},
> {76,<0.279.0>}}},
> {{82,<5066.278.0>},
> {view_member,
> {82,<5066.278.0>},
> [],
> {76,<0.279.0>},
> {79,<5383.278.0>}}}]},
> 1457518,
>
> [{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
> {{79,<5383.278.0>},{member,{[],[]},1,1}},
> {{82,<5066.278.0>},{member,{[],[]},0,0}}],
> [<0.1272.0>],
> {[],[]},
> [],undefined,
> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
> ** Reason for termination ==
> ** {function_clause,
> [{orddict,fetch,
> [{76,<0.279.0>},
> [{{82,<5066.278.0>},
> {view_member,
> {82,<5066.278.0>},
> [{79,<5383.278.0>}],
> {82,<5066.278.0>},
> {82,<5066.278.0>}}}]],
> [{file,"orddict.erl"},{line,72}]},
> {gm,check_neighbours,1,[]},
> {gm,handle_info,2,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.283.0> terminating
> ** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
> noconnection}
> ** When Server state == {state,
> {67,<0.283.0>},
> {{70,<5383.282.0>},#Ref<0.0.0.248454>},
> {{73,<5066.282.0>},#Ref<0.0.1.42352>},
> {resource,<<"/IEC">>,queue,<<"event.queue">>},
> rabbit_mirror_queue_coordinator,
> {74,
> [{{67,<0.283.0>},
> {view_member,
> {67,<0.283.0>},
> [],
> {70,<5383.282.0>},
> {73,<5066.282.0>}}},
> {{70,<5383.282.0>},
> {view_member,
> {70,<5383.282.0>},
> [],
> {73,<5066.282.0>},
> {67,<0.283.0>}}},
> {{73,<5066.282.0>},
> {view_member,
> {73,<5066.282.0>},
> [],
> {67,<0.283.0>},
> {70,<5383.282.0>}}}]},
> 212075,
>
> [{{67,<0.283.0>},{member,{[],[]},212075,212075}},
> {{70,<5383.282.0>},{member,{[],[]},1,1}},
> {{73,<5066.282.0>},{member,{[],[]},0,0}}],
> [<0.1271.0>],
> {[],[]},
> [],undefined,
> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
> ** Reason for termination ==
> ** {function_clause,
> [{orddict,fetch,
> [{67,<0.283.0>},
> [{{73,<5066.282.0>},
> {view_member,
> {73,<5066.282.0>},
> [{70,<5383.282.0>}],
> {73,<5066.282.0>},
> {73,<5066.282.0>}}}]],
> [{file,"orddict.erl"},{line,72}]},
> {gm,check_neighbours,1,[]},
> {gm,handle_info,2,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.203.0> terminating
> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
> {vote_yes,{tid,10316,<0.203.0>}}}
> ** When Server state == 1
> ** Reason for termination ==
> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
> {vote_yes,{tid,10316,<0.203.0>}}}}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.275.0> terminating
> ** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
> noconnection}
> ** When Server state == {state,
> {70,<0.275.0>},
> {{76,<5066.274.0>},#Ref<0.0.1.42305>},
> {{73,<5383.274.0>},#Ref<0.0.1.38240>},
> {resource,<<"/IEC">>,queue,
> <<"activity.queue.dead">>},
> rabbit_mirror_queue_coordinator,
> {77,
> [{{70,<0.275.0>},
> {view_member,
> {70,<0.275.0>},
> [],
> {76,<5066.274.0>},
> {73,<5383.274.0>}}},
> {{73,<5383.274.0>},
> {view_member,
> {73,<5383.274.0>},
> [],
> {70,<0.275.0>},
> {76,<5066.274.0>}}},
> {{76,<5066.274.0>},
> {view_member,
> {76,<5066.274.0>},
> [],
> {73,<5383.274.0>},
> {70,<0.275.0>}}}]},
> 6,
> [{{70,<0.275.0>},{member,{[],[]},6,6}},
> {{73,<5383.274.0>},{member,{[],[]},1,1}},
> {{76,<5066.274.0>},{member,{[],[]},0,0}}],
> [<0.1273.0>],
> {[],[]},
> [],undefined,
> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
> ** Reason for termination ==
> ** {noproc,{gen_server2,call,
> [<0.203.0>,
> {submit,#Fun<rabbit_misc.6.116010224>},
> infinity]}}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.204.0> terminating
> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
> {vote_yes,{tid,10315,<0.204.0>}}}
> ** When Server state == 2
> ** Reason for termination ==
> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
> {vote_yes,{tid,10315,<0.204.0>}}}}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
> ** Generic server <0.1268.0> terminating
> ** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
> ** When Server state == {state,
> {amqqueue,
> {resource,<<"/IEC">>,queue,
> <<"gps.queue.dead">>},
> true,false,none,[],<0.266.0>,
> [<5066.265.0>],
> [<5066.265.0>],
> [{vhost,<<"/IEC">>},
> {name,<<"Queue HA">>},
> {pattern,<<".queue">>},
> {'apply-to',<<"queues">>},
> {definition,
> [{<<"ha-mode">>,<<"all">>},
>
> {<<"ha-sync-mode">>,<<"automatic">>}]},
> {priority,0}],
> [{<5066.266.0>,<5066.265.0>},
> {<5383.266.0>,<5383.265.0>}],
> []},
> <0.267.0>,
> {state,
> {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],
> [],[],[]},
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],
> [],[],[]}}},
> erlang},
> #Fun<rabbit_mirror_queue_master.5.69128381>,
> #Fun<rabbit_mirror_queue_master.6.50493311>}
> ** Reason for termination ==
> ** {{case_clause,{ok,<5066.265.0>,[]}},
> [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
> =ERROR REPORT==== 19-Feb-2014::18:34:31 ===
> ** Generic server <0.266.0> terminating
> ** Last message in was {'EXIT',<0.1268.0>,
> {{case_clause,{ok,<5066.265.0>,[]}},
>
> [{rabbit_mirror_queue_coordinator,handle_cast,2,
> []},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,
> [{file,"proc_lib.erl"},{line,249}]}]}}
> ** When Server state == {q,
> {amqqueue,
> {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
> true,false,none,[],<0.266.0>,
> [<5383.265.0>,<5066.265.0>],
> [<5066.265.0>,<5383.265.0>],
> [{vhost,<<"/IEC">>},
> {name,<<"Queue HA">>},
> {pattern,<<".queue">>},
> {'apply-to',<<"queues">>},
> {definition,
> [{<<"ha-mode">>,<<"all">>},
> {<<"ha-sync-mode">>,<<"automatic">>}]},
> {priority,0}],
> [{<5066.266.0>,<5066.265.0>},
> {<5383.266.0>,<5383.265.0>},
> {<0.267.0>,<0.266.0>}],
> []},
> none,false,rabbit_mirror_queue_master,
> {state,
> {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
> <0.267.0>,<0.1268.0>,rabbit_variable_queue,
> {vqstate,
> {0,{[],[]}},
> {0,{[],[]}},
> {delta,undefined,0,undefined},
> {0,{[],[]}},
> {0,{[],[]}},
> 0,
> {0,nil},
> {0,nil},
> {qistate,
> "d:/tools/RabbitMQ
> Server/data/db/rabbit at FRA-VSP-32596-mnesia
> /queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
> {{dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []},
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}},
> []},
> undefined,0,65536,
> #Fun<rabbit_variable_queue.2.81334491>,
> {0,nil}},
> {{client_msstate,msg_store_persistent,
>
> <<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
> 154,75>>,
> {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []},
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}},
> {state,340043,
> "d:/tools/RabbitMQ
> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent"},
> rabbit_msg_store_ets_index,
> "d:/tools/RabbitMQ
> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent",
> <0.255.0>,344140,335946,348237,352334},
> {client_msstate,msg_store_transient,
>
> <<148,176,200,245,252,25,203,27,190,186,25,104,
> 217,230,131,35>>,
> {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []},
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}},
> {state,319558,
> "d:/tools/RabbitMQ
> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient"},
> rabbit_msg_store_ets_index,
> "d:/tools/RabbitMQ
> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient",
> <0.250.0>,323655,315461,327752,331849}},
> true,0,0,0,infinity,0,0,0,0,0,
> {rates,
> {{1392,831016,530070},0},
> {{1392,831016,530070},0},
> 0.0,0.0,
> {1392,831128,748070}},
> {0,nil},
> {0,nil},
> {0,nil},
> {0,nil},
> 0,0,
> {rates,
> {{1392,831016,530070},0},
> {{1392,831016,530070},0},
> 0.0,0.0,
> {1392,831128,748070}}},
> {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}},
> [],
> {set,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}}},
> {queue,[],[],0},
> undefined,undefined,undefined,undefined,
> {state,fine,5000,undefined},
> {0,nil},
> undefined,undefined,undefined,
> {state,
> {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> []}}},
> delegate},
> undefined,undefined,undefined,4,running}
> ** Reason for termination ==
> ** {{case_clause,{ok,<5066.265.0>,[]}},
> [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
>
> Any ideas?
>
> Best regards,
>
> * ________________________________________________________________*
> *Michaël OULLION*
> *Architecte JAVA*
> *ND Informatique*
> Adresse (1208 route des Pierrelles B.P. 98 BEAUSEMBLANT - 26240
> Beausemblant - FRANCE)
> Tel. +33 (0)4 75 23 68 07
> Visit our web site at www.norbert-dentressangle.com
>