[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare snapshot
Michael Oullion
michael.oullion at norbert-dentressangle.com
Thu Feb 20 20:11:43 GMT 2014
Thanks Jerry for your quick answer.
What can we do in this situation?
Maybe we can raise the net tick time or use a specific behaviour to handle
network splits.
Or simply stop taking snapshots of the VM, since they are not necessary?
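
For reference, a minimal sketch of how the first two options might be expressed in rabbitmq.config. The values shown (a 120-second tick time and pause_minority) are illustrative assumptions, not tested recommendations for this cluster. net_ticktime should be set to the same value on every node, and cluster_partition_handling also accepts ignore (the default) and autoheal.

```erlang
%% rabbitmq.config (sketch only; both values are assumptions for illustration)
[
  %% Erlang distribution tick time in seconds (default 60). A larger value
  %% tolerates longer pauses, such as a VM snapshot, before a peer is
  %% declared down, at the cost of slower detection of real failures.
  {kernel, [{net_ticktime, 120}]},

  %% How RabbitMQ reacts once a partition has been detected:
  %% ignore (default), pause_minority, or autoheal.
  {rabbit, [{cluster_partition_handling, pause_minority}]}
].
```

The nodes would need a restart for these settings to take effect.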
Regards.
On 20 Feb 2014 17:57, "Jerry Kuch" <jkuch at gopivotal.com> wrote:
> Hi, Michael:
>
> This isn't terribly surprising. Snapshotting a VM is likely to render it
> less responsive than it normally would be for some amount of time. Whether
> that period ends before some other node in your cluster misses heartbeats
> and gives you grief about it, is a coin flip.
>
> Best regards,
> Jerry
>
>
>
> On Thu, Feb 20, 2014 at 3:12 AM, Michael Oullion <
> michael.oullion at norbert-dentressangle.com> wrote:
>
>> Hi all,
>>
>> We observe occasional network splits on our cluster and we don't know why.
>> Before changing the net tick parameter and the net split behavior, I
>> want to understand why it's happening.
>> Our environment is:
>> RabbitMQ 3.2.1, Erlang R16B
>> 3 RabbitMQ nodes in the same subnet
>> RabbitMQ is installed on Windows 2008 R2 (VMWare ESXi 5.1)
>> We have 4 mirrored queues on this cluster.
>> In production, the normal throughput is about 20 messages/second.
>>
>> We observe that the split always occurs at the end of the VM snapshot
>> (NetBackup).
>> However, we take a snapshot every night, and the network split occurs
>> only about once every 15 to 20 days.
>>
>> *Log from server rabbit at FRA-VSP-32545:*
>> =INFO REPORT==== 19-Feb-2014::18:34:47 ===
>> rabbit on node 'rabbit at FRA-VSP-32596' down
>>
>> =INFO REPORT==== 19-Feb-2014::18:34:49 ===
>> Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave
>> <'rabbit at FRA-VSP-32545'.2.269.0> saw deaths of mirrors
>> <'rabbit at FRA-VSP-32596'.1.270.0>
>>
>>
>> *Log from server rabbit at FRA-VSP-32596:*
>>
>> =INFO REPORT==== 19-Feb-2014::18:34:28 ===
>> rabbit on node 'rabbit at FRA-VSP-32545' down
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.279.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
>> noconnection}
>> ** When Server state == {state,
>> {76,<0.279.0>},
>> {{79,<5383.278.0>},#Ref<0.0.0.248452>},
>> {{82,<5066.278.0>},#Ref<0.0.1.42330>},
>> {resource,<<"/IEC">>,queue,<<"conso.queue">>},
>> rabbit_mirror_queue_coordinator,
>> {83,
>> [{{76,<0.279.0>},
>> {view_member,
>> {76,<0.279.0>},
>> [],
>> {79,<5383.278.0>},
>> {82,<5066.278.0>}}},
>> {{79,<5383.278.0>},
>> {view_member,
>> {79,<5383.278.0>},
>> [],
>> {82,<5066.278.0>},
>> {76,<0.279.0>}}},
>> {{82,<5066.278.0>},
>> {view_member,
>> {82,<5066.278.0>},
>> [],
>> {76,<0.279.0>},
>> {79,<5383.278.0>}}}]},
>> 1457518,
>>
>> [{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
>> {{79,<5383.278.0>},{member,{[],[]},1,1}},
>> {{82,<5066.278.0>},{member,{[],[]},0,0}}],
>> [<0.1272.0>],
>> {[],[]},
>> [],undefined,
>>
>> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {function_clause,
>> [{orddict,fetch,
>> [{76,<0.279.0>},
>> [{{82,<5066.278.0>},
>> {view_member,
>> {82,<5066.278.0>},
>> [{79,<5383.278.0>}],
>> {82,<5066.278.0>},
>> {82,<5066.278.0>}}}]],
>> [{file,"orddict.erl"},{line,72}]},
>> {gm,check_neighbours,1,[]},
>> {gm,handle_info,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.283.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
>> noconnection}
>> ** When Server state == {state,
>> {67,<0.283.0>},
>> {{70,<5383.282.0>},#Ref<0.0.0.248454>},
>> {{73,<5066.282.0>},#Ref<0.0.1.42352>},
>> {resource,<<"/IEC">>,queue,<<"event.queue">>},
>> rabbit_mirror_queue_coordinator,
>> {74,
>> [{{67,<0.283.0>},
>> {view_member,
>> {67,<0.283.0>},
>> [],
>> {70,<5383.282.0>},
>> {73,<5066.282.0>}}},
>> {{70,<5383.282.0>},
>> {view_member,
>> {70,<5383.282.0>},
>> [],
>> {73,<5066.282.0>},
>> {67,<0.283.0>}}},
>> {{73,<5066.282.0>},
>> {view_member,
>> {73,<5066.282.0>},
>> [],
>> {67,<0.283.0>},
>> {70,<5383.282.0>}}}]},
>> 212075,
>>
>> [{{67,<0.283.0>},{member,{[],[]},212075,212075}},
>> {{70,<5383.282.0>},{member,{[],[]},1,1}},
>> {{73,<5066.282.0>},{member,{[],[]},0,0}}],
>> [<0.1271.0>],
>> {[],[]},
>> [],undefined,
>>
>> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {function_clause,
>> [{orddict,fetch,
>> [{67,<0.283.0>},
>> [{{73,<5066.282.0>},
>> {view_member,
>> {73,<5066.282.0>},
>> [{70,<5383.282.0>}],
>> {73,<5066.282.0>},
>> {73,<5066.282.0>}}}]],
>> [{file,"orddict.erl"},{line,72}]},
>> {gm,check_neighbours,1,[]},
>> {gm,handle_info,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.203.0> terminating
>> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
>> {vote_yes,{tid,10316,<0.203.0>}}}
>> ** When Server state == 1
>> ** Reason for termination ==
>> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
>> {vote_yes,{tid,10316,<0.203.0>}}}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.275.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
>> noconnection}
>> ** When Server state == {state,
>> {70,<0.275.0>},
>> {{76,<5066.274.0>},#Ref<0.0.1.42305>},
>> {{73,<5383.274.0>},#Ref<0.0.1.38240>},
>> {resource,<<"/IEC">>,queue,
>> <<"activity.queue.dead">>},
>> rabbit_mirror_queue_coordinator,
>> {77,
>> [{{70,<0.275.0>},
>> {view_member,
>> {70,<0.275.0>},
>> [],
>> {76,<5066.274.0>},
>> {73,<5383.274.0>}}},
>> {{73,<5383.274.0>},
>> {view_member,
>> {73,<5383.274.0>},
>> [],
>> {70,<0.275.0>},
>> {76,<5066.274.0>}}},
>> {{76,<5066.274.0>},
>> {view_member,
>> {76,<5066.274.0>},
>> [],
>> {73,<5383.274.0>},
>> {70,<0.275.0>}}}]},
>> 6,
>> [{{70,<0.275.0>},{member,{[],[]},6,6}},
>> {{73,<5383.274.0>},{member,{[],[]},1,1}},
>> {{76,<5066.274.0>},{member,{[],[]},0,0}}],
>> [<0.1273.0>],
>> {[],[]},
>> [],undefined,
>>
>> #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {noproc,{gen_server2,call,
>> [<0.203.0>,
>> {submit,#Fun<rabbit_misc.6.116010224>},
>> infinity]}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.204.0> terminating
>> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
>> {vote_yes,{tid,10315,<0.204.0>}}}
>> ** When Server state == 2
>> ** Reason for termination ==
>> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
>> {vote_yes,{tid,10315,<0.204.0>}}}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.1268.0> terminating
>> ** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
>> ** When Server state == {state,
>> {amqqueue,
>> {resource,<<"/IEC">>,queue,
>> <<"gps.queue.dead">>},
>> true,false,none,[],<0.266.0>,
>> [<5066.265.0>],
>> [<5066.265.0>],
>> [{vhost,<<"/IEC">>},
>> {name,<<"Queue HA">>},
>> {pattern,<<".queue">>},
>> {'apply-to',<<"queues">>},
>> {definition,
>> [{<<"ha-mode">>,<<"all">>},
>>
>> {<<"ha-sync-mode">>,<<"automatic">>}]},
>> {priority,0}],
>> [{<5066.266.0>,<5066.265.0>},
>> {<5383.266.0>,<5383.265.0>}],
>> []},
>> <0.267.0>,
>> {state,
>> {dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],
>> [],[],[]},
>>
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],
>> [],[],[]}}},
>> erlang},
>> #Fun<rabbit_mirror_queue_master.5.69128381>,
>> #Fun<rabbit_mirror_queue_master.6.50493311>}
>> ** Reason for termination ==
>> ** {{case_clause,{ok,<5066.265.0>,[]}},
>> [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:31 ===
>> ** Generic server <0.266.0> terminating
>> ** Last message in was {'EXIT',<0.1268.0>,
>> {{case_clause,{ok,<5066.265.0>,[]}},
>>
>> [{rabbit_mirror_queue_coordinator,handle_cast,2,
>> []},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,
>> [{file,"proc_lib.erl"},{line,249}]}]}}
>> ** When Server state == {q,
>> {amqqueue,
>>
>> {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
>> true,false,none,[],<0.266.0>,
>> [<5383.265.0>,<5066.265.0>],
>> [<5066.265.0>,<5383.265.0>],
>> [{vhost,<<"/IEC">>},
>> {name,<<"Queue HA">>},
>> {pattern,<<".queue">>},
>> {'apply-to',<<"queues">>},
>> {definition,
>> [{<<"ha-mode">>,<<"all">>},
>> {<<"ha-sync-mode">>,<<"automatic">>}]},
>> {priority,0}],
>> [{<5066.266.0>,<5066.265.0>},
>> {<5383.266.0>,<5383.265.0>},
>> {<0.267.0>,<0.266.0>}],
>> []},
>> none,false,rabbit_mirror_queue_master,
>> {state,
>>
>> {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
>> <0.267.0>,<0.1268.0>,rabbit_variable_queue,
>> {vqstate,
>> {0,{[],[]}},
>> {0,{[],[]}},
>> {delta,undefined,0,undefined},
>> {0,{[],[]}},
>> {0,{[],[]}},
>> 0,
>> {0,nil},
>> {0,nil},
>> {qistate,
>> "d:/tools/RabbitMQ
>> Server/data/db/rabbit at FRA-VSP-32596-mnesia
>> /queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
>> {{dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []},
>>
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}},
>> []},
>> undefined,0,65536,
>> #Fun<rabbit_variable_queue.2.81334491>,
>> {0,nil}},
>> {{client_msstate,msg_store_persistent,
>>
>> <<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
>> 154,75>>,
>> {dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []},
>>
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}},
>> {state,340043,
>> "d:/tools/RabbitMQ
>> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent"},
>> rabbit_msg_store_ets_index,
>> "d:/tools/RabbitMQ
>> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent",
>> <0.255.0>,344140,335946,348237,352334},
>> {client_msstate,msg_store_transient,
>>
>> <<148,176,200,245,252,25,203,27,190,186,25,104,
>> 217,230,131,35>>,
>> {dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []},
>>
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}},
>> {state,319558,
>> "d:/tools/RabbitMQ
>> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient"},
>> rabbit_msg_store_ets_index,
>> "d:/tools/RabbitMQ
>> Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient",
>> <0.250.0>,323655,315461,327752,331849}},
>> true,0,0,0,infinity,0,0,0,0,0,
>> {rates,
>> {{1392,831016,530070},0},
>> {{1392,831016,530070},0},
>> 0.0,0.0,
>> {1392,831128,748070}},
>> {0,nil},
>> {0,nil},
>> {0,nil},
>> {0,nil},
>> 0,0,
>> {rates,
>> {{1392,831016,530070},0},
>> {{1392,831016,530070},0},
>> 0.0,0.0,
>> {1392,831128,748070}}},
>> {dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}},
>> [],
>> {set,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}}},
>> {queue,[],[],0},
>> undefined,undefined,undefined,undefined,
>> {state,fine,5000,undefined},
>> {0,nil},
>> undefined,undefined,undefined,
>> {state,
>> {dict,0,16,16,8,80,48,
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> []}}},
>> delegate},
>> undefined,undefined,undefined,4,running}
>> ** Reason for termination ==
>> ** {{case_clause,{ok,<5066.265.0>,[]}},
>> [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>>
>> Any ideas?
>>
>> Best regards,
>>
>> * ________________________________________________________________*
>> *Michaël OULLION*
>> *Java Architect*
>> *ND Informatique*
>> Address (1208 route des Pierrelles B.P. 98 BEAUSEMBLANT - 26240
>> Beausemblant - FRANCE)
>> Tel. +33 (0)4 75 23 68 07
>> Visit our web site at www.norbert-dentressangle.com
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>