[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare
Bill Chmura
bchmura at nurturhealth.com
Fri Feb 21 00:36:16 GMT 2014
Hi Michael,
We've been trying to track down a similar problem with our RabbitMQ cluster on VMware machines. We started a thread here a few months back and Simon gave us some things to try. We still haven't figured it out, but we strongly suspect something in the VMware layer.
Not sure if your problem is the same, but we tried increasing the net tick time, among other things, to no avail.
When we have our problem, the performance chart on the VMware console just goes blank for a bit. Nothing is recorded; there is just a gap in the timeline.
I was wondering if you see the same thing. We've been tweaking our VMware settings, and it seems to be tied to our web app deploys.
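One way to check whether the guest itself is being paused during such gaps is to run a small heartbeat inside the VM and look for intervals that took far longer than requested. The following is only a sketch under stated assumptions, not something used in this thread: the function names, the interval and the tolerance are hypothetical, and whether a hypervisor pause is visible from inside the guest depends on how the guest clock is resynchronised afterwards.

```python
import time


def find_stalls(samples, interval, tolerance):
    """Return (start, gap) pairs for consecutive samples that are further
    apart than interval + tolerance seconds."""
    stalls = []
    for previous, current in zip(samples, samples[1:]):
        gap = current - previous
        if gap > interval + tolerance:
            stalls.append((previous, gap))
    return stalls


def record_heartbeats(count, interval, clock=time.time):
    """Sleep `interval` seconds `count` times and return the clock readings.

    Assumption: the wall clock (time.time) is used because it is expected to
    be corrected after a pause; a different clock can be passed in.
    """
    samples = [clock()]
    for _ in range(count):
        time.sleep(interval)
        samples.append(clock())
    return samples
```

For example, `find_stalls(record_heartbeats(3600, 1.0), 1.0, 0.5)` would watch for an hour and list every point where more than 1.5 seconds passed between two beats. Those times can then be compared with the gaps in the VMware performance chart.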
------------------------------------------------------------------------------
Hi all,
We are seeing occasional network splits on our cluster and we don't know why.
Before changing the net tick parameter or the net split behavior, I
want to understand why it's happening.
Our environment is:
RabbitMQ 3.2.1, Erlang R16B
3 RabbitMQ nodes in the same subnet
RabbitMQ is installed on Windows 2008 R2 (VMware ESXi 5.1)
We have 4 mirrored queues on this cluster.
In production, the normal load is about 20 messages/second.
We observe that the split always occurs at the end of the snapshot (NetBackup)
on the VM.
However, we take a snapshot every night and the network split occurs only
about once every 15 to 20 days.
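For context on the net tick parameter: according to the Erlang kernel documentation, a node with `net_ticktime` set to T seconds (default 60) considers a silent peer down after somewhere between 0.75 * T and 1.25 * T seconds. The sketch below only does that arithmetic, so that the length of time a VM is unresponsive at the end of a snapshot can be compared with the window. The function names and the pause durations are hypothetical, and a real cluster may behave differently, for example because of TCP timeouts.

```python
def detection_window(net_ticktime):
    """Return (earliest, latest) seconds after which a silent peer is
    declared down, per the 0.75*T .. 1.25*T rule in the Erlang docs."""
    return (0.75 * net_ticktime, 1.25 * net_ticktime)


def classify_pause(pause_seconds, net_ticktime=60):
    """Say where a pause of `pause_seconds` falls relative to the window."""
    earliest, latest = detection_window(net_ticktime)
    if pause_seconds < earliest:
        return "below window"   # shorter than the earliest detection time
    if pause_seconds < latest:
        return "inside window"  # may or may not be detected as a node down
    return "beyond window"      # longer than the latest detection time
```

With the default of 60 seconds the window is 45 to 75 seconds, so a pause that is usually short but occasionally falls inside the window would be consistent with a split that only happens on some nights. That is one possible explanation, not a diagnosis.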
Log from server rabbit@FRA-VSP-32545:
=INFO REPORT==== 19-Feb-2014::18:34:47 ===
rabbit on node 'rabbit@FRA-VSP-32596' down
=INFO REPORT==== 19-Feb-2014::18:34:49 ===
Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave <'rabbit@FRA-VSP-32545'.2.269.0> saw deaths of mirrors <'rabbit@FRA-VSP-32596'.1.270.0>
Log from server rabbit@FRA-VSP-32596:
=INFO REPORT==== 19-Feb-2014::18:34:28 ===
rabbit on node 'rabbit@FRA-VSP-32545' down
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.279.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
noconnection}
** When Server state == {state,
{76,<0.279.0>},
{{79,<5383.278.0>},#Ref<0.0.0.248452>},
{{82,<5066.278.0>},#Ref<0.0.1.42330>},
{resource,<<"/IEC">>,queue,<<"conso.queue">>},
rabbit_mirror_queue_coordinator,
{83,
[{{76,<0.279.0>},
{view_member,
{76,<0.279.0>},
[],
{79,<5383.278.0>},
{82,<5066.278.0>}}},
{{79,<5383.278.0>},
{view_member,
{79,<5383.278.0>},
[],
{82,<5066.278.0>},
{76,<0.279.0>}}},
{{82,<5066.278.0>},
{view_member,
{82,<5066.278.0>},
[],
{76,<0.279.0>},
{79,<5383.278.0>}}}]},
1457518,
[{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
{{79,<5383.278.0>},{member,{[],[]},1,1}},
{{82,<5066.278.0>},{member,{[],[]},0,0}}],
[<0.1272.0>],
{[],[]},
[],undefined,
#Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
[{orddict,fetch,
[{76,<0.279.0>},
[{{82,<5066.278.0>},
{view_member,
{82,<5066.278.0>},
[{79,<5383.278.0>}],
{82,<5066.278.0>},
{82,<5066.278.0>}}}]],
[{file,"orddict.erl"},{line,72}]},
{gm,check_neighbours,1,[]},
{gm,handle_info,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.283.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
noconnection}
** When Server state == {state,
{67,<0.283.0>},
{{70,<5383.282.0>},#Ref<0.0.0.248454>},
{{73,<5066.282.0>},#Ref<0.0.1.42352>},
{resource,<<"/IEC">>,queue,<<"event.queue">>},
rabbit_mirror_queue_coordinator,
{74,
[{{67,<0.283.0>},
{view_member,
{67,<0.283.0>},
[],
{70,<5383.282.0>},
{73,<5066.282.0>}}},
{{70,<5383.282.0>},
{view_member,
{70,<5383.282.0>},
[],
{73,<5066.282.0>},
{67,<0.283.0>}}},
{{73,<5066.282.0>},
{view_member,
{73,<5066.282.0>},
[],
{67,<0.283.0>},
{70,<5383.282.0>}}}]},
212075,
[{{67,<0.283.0>},{member,{[],[]},212075,212075}},
{{70,<5383.282.0>},{member,{[],[]},1,1}},
{{73,<5066.282.0>},{member,{[],[]},0,0}}],
[<0.1271.0>],
{[],[]},
[],undefined,
#Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
[{orddict,fetch,
[{67,<0.283.0>},
[{{73,<5066.282.0>},
{view_member,
{73,<5066.282.0>},
[{70,<5383.282.0>}],
{73,<5066.282.0>},
{73,<5066.282.0>}}}]],
[{file,"orddict.erl"},{line,72}]},
{gm,check_neighbours,1,[]},
{gm,handle_info,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.203.0> terminating
** Last message in was {mnesia_tm,'rabbit@FRA-VSP-32545',
{vote_yes,{tid,10316,<0.203.0>}}}
** When Server state == 1
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit@FRA-VSP-32545',
{vote_yes,{tid,10316,<0.203.0>}}}}
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.275.0> terminating
** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
noconnection}
** When Server state == {state,
{70,<0.275.0>},
{{76,<5066.274.0>},#Ref<0.0.1.42305>},
{{73,<5383.274.0>},#Ref<0.0.1.38240>},
{resource,<<"/IEC">>,queue,
<<"activity.queue.dead">>},
rabbit_mirror_queue_coordinator,
{77,
[{{70,<0.275.0>},
{view_member,
{70,<0.275.0>},
[],
{76,<5066.274.0>},
{73,<5383.274.0>}}},
{{73,<5383.274.0>},
{view_member,
{73,<5383.274.0>},
[],
{70,<0.275.0>},
{76,<5066.274.0>}}},
{{76,<5066.274.0>},
{view_member,
{76,<5066.274.0>},
[],
{73,<5383.274.0>},
{70,<0.275.0>}}}]},
6,
[{{70,<0.275.0>},{member,{[],[]},6,6}},
{{73,<5383.274.0>},{member,{[],[]},1,1}},
{{76,<5066.274.0>},{member,{[],[]},0,0}}],
[<0.1273.0>],
{[],[]},
[],undefined,
#Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {noproc,{gen_server2,call,
[<0.203.0>,
{submit,#Fun<rabbit_misc.6.116010224>},
infinity]}}
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.204.0> terminating
** Last message in was {mnesia_tm,'rabbit@FRA-VSP-32545',
{vote_yes,{tid,10315,<0.204.0>}}}
** When Server state == 2
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit@FRA-VSP-32545',
{vote_yes,{tid,10315,<0.204.0>}}}}
=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.1268.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
** When Server state == {state,
{amqqueue,
{resource,<<"/IEC">>,queue,
<<"gps.queue.dead">>},
true,false,none,[],<0.266.0>,
[<5066.265.0>],
[<5066.265.0>],
[{vhost,<<"/IEC">>},
{name,<<"Queue HA">>},
{pattern,<<".queue">>},
{'apply-to',<<"queues">>},
{definition,
[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,0}],
[{<5066.266.0>,<5066.265.0>},
{<5383.266.0>,<5383.265.0>}],
[]},
<0.267.0>,
{state,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[]}}},
erlang},
#Fun<rabbit_mirror_queue_master.5.69128381>,
#Fun<rabbit_mirror_queue_master.6.50493311>}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
[{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
=ERROR REPORT==== 19-Feb-2014::18:34:31 ===
** Generic server <0.266.0> terminating
** Last message in was {'EXIT',<0.1268.0>,
{{case_clause,{ok,<5066.265.0>,[]}},
[{rabbit_mirror_queue_coordinator,handle_cast,2,
[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,
[{file,"proc_lib.erl"},{line,249}]}]}}
** When Server state == {q,
{amqqueue,
{resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
true,false,none,[],<0.266.0>,
[<5383.265.0>,<5066.265.0>],
[<5066.265.0>,<5383.265.0>],
[{vhost,<<"/IEC">>},
{name,<<"Queue HA">>},
{pattern,<<".queue">>},
{'apply-to',<<"queues">>},
{definition,
[{<<"ha-mode">>,<<"all">>},
{<<"ha-sync-mode">>,<<"automatic">>}]},
{priority,0}],
[{<5066.266.0>,<5066.265.0>},
{<5383.266.0>,<5383.265.0>},
{<0.267.0>,<0.266.0>}],
[]},
none,false,rabbit_mirror_queue_master,
{state,
{resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
<0.267.0>,<0.1268.0>,rabbit_variable_queue,
{vqstate,
{0,{[],[]}},
{0,{[],[]}},
{delta,undefined,0,undefined},
{0,{[],[]}},
{0,{[],[]}},
0,
{0,nil},
{0,nil},
{qistate,
"d:/tools/RabbitMQ Server/data/db/rabbit@FRA-VSP-32596-mnesia/queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
{{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
[]},
undefined,0,65536,
#Fun<rabbit_variable_queue.2.81334491>,
{0,nil}},
{{client_msstate,msg_store_persistent,
<<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
154,75>>,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
{state,340043,
"d:/tools/RabbitMQ Server/data/db/rabbit@FRA-VSP-32596-mnesia/msg_store_persistent"},
rabbit_msg_store_ets_index,
"d:/tools/RabbitMQ Server/data/db/rabbit@FRA-VSP-32596-mnesia/msg_store_persistent",
<0.255.0>,344140,335946,348237,352334},
{client_msstate,msg_store_transient,
<<148,176,200,245,252,25,203,27,190,186,25,104,
217,230,131,35>>,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
{state,319558,
"d:/tools/RabbitMQ Server/data/db/rabbit@FRA-VSP-32596-mnesia/msg_store_transient"},
rabbit_msg_store_ets_index,
"d:/tools/RabbitMQ Server/data/db/rabbit@FRA-VSP-32596-mnesia/msg_store_transient",
<0.250.0>,323655,315461,327752,331849}},
true,0,0,0,infinity,0,0,0,0,0,
{rates,
{{1392,831016,530070},0},
{{1392,831016,530070},0},
0.0,0.0,
{1392,831128,748070}},
{0,nil},
{0,nil},
{0,nil},
{0,nil},
0,0,
{rates,
{{1392,831016,530070},0},
{{1392,831016,530070},0},
0.0,0.0,
{1392,831128,748070}}},
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
[],
{set,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}}},
{queue,[],[],0},
undefined,undefined,undefined,undefined,
{state,fine,5000,undefined},
{0,nil},
undefined,undefined,undefined,
{state,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
delegate},
undefined,undefined,undefined,4,running}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
[{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
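To line reports like these up against the snapshot schedule, the node-down events can be pulled out of each node's log with a short script. This is a minimal sketch that assumes the report layout shown in the excerpts above (a `=INFO REPORT==== date::time ===` header followed by a `rabbit on node '...' down` line). The function name is hypothetical and `%b` month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Header such as: =INFO REPORT==== 19-Feb-2014::18:34:47 ===
REPORT_HEADER = re.compile(
    r"^=\w+ REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)
# Body line such as: rabbit on node 'rabbit@HOST' down
NODE_DOWN = re.compile(r"^rabbit on node '([^']+)' down$")


def node_down_events(log_text):
    """Return a list of (timestamp, node_name) for every node-down report."""
    events = []
    timestamp = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = REPORT_HEADER.match(line)
        if header:
            # Remember the time of the report that the next lines belong to.
            timestamp = datetime.strptime(header.group(1), "%d-%b-%Y::%H:%M:%S")
            continue
        down = NODE_DOWN.match(line)
        if down and timestamp is not None:
            events.append((timestamp, down.group(1)))
    return events
```

Running this over the logs from all three nodes gives one list of timestamps per node, which can be compared with the end time of each nightly snapshot.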
Any ideas?
Best regards,
Michaël OULLION
Java Architect
ND Informatique
Address: 1208 route des Pierrelles, B.P. 98, 26240 Beausemblant, FRANCE
Tel. +33 (0)4 75 23 68 07
Visit our web site at www.norbert-dentressangle.com
Bill Chmura Director, IT Development Services
direct 860 676 3618 | toll-free 800 293 0056 x61009
Nurtur | 20 Batterson Park Road | Farmington, CT 06032
bchmura at nurturhealth.com | www.nurturhealth.com