[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare

Bill Chmura bchmura at nurturhealth.com
Fri Feb 21 00:36:16 GMT 2014


Hi Michael,

We've been trying to track down a similar problem with our RabbitMQ cluster running on VMware machines.  We started asking here a few months back, and Simon gave us some things to try.  We still haven't figured it out, but we strongly suspect the cause is somewhere in the VMware layer.

Not sure if your problem is the same, but we tried increasing the net tick time and so on, all to no avail.
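
For reference, the net tick time is the Erlang kernel parameter net_ticktime (default 60 seconds); a node that has not been heard from within roughly that time, plus or minus 25%, is treated as down. A minimal sketch of raising it, assuming the classic Erlang-term rabbitmq.config file used by RabbitMQ 3.x. The value 120 is purely illustrative and is not a setting reported in this thread:

```erlang
%% rabbitmq.config (Erlang terms; the file must end with a period)
[
  %% net_ticktime is a kernel application parameter, in seconds.
  %% 120 is an illustrative value, not a recommendation.
  {kernel, [{net_ticktime, 120}]}
].
```

The value should be the same on every node in the cluster, and a node only picks up the file when it is restarted.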

When we have our problem, the performance charts in the VMware console just go blank for a bit.  Nothing is recorded; there is just a gap in the timeline.

I was wondering if you see the same thing?  We've been tweaking our VMware settings, and the problem seems to be tied to our web app deploys.






------------------------------------------------------------------------------
Hi all,

We are seeing occasional net splits on our cluster and we don't know why.
Before changing the net tick parameter or the net split behavior, I
want to understand why it is happening.
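
The net split behavior mentioned above is controlled, in RabbitMQ 3.1 and later, by the cluster_partition_handling setting of the rabbit application. A minimal sketch, assuming the Erlang-term rabbitmq.config file; pause_minority is shown only as an example of a non-default mode, not as a fix for this case:

```erlang
%% rabbitmq.config (Erlang terms; the file must end with a period)
[
  %% Accepted values in the 3.2 series: ignore (default),
  %% pause_minority, autoheal.
  {rabbit, [{cluster_partition_handling, pause_minority}]}
].
```

After a split, `rabbitmqctl cluster_status` reports the affected nodes under `partitions`, which helps confirm that a partition (rather than a plain node crash) actually occurred.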
Our environment is:
RabbitMQ 3.2.1, Erlang R16B
3 RabbitMQ nodes in the same subnet
RabbitMQ is installed on Windows 2008 R2 (VMware ESXi 5.1)
We have 4 mirrored queues on this cluster.
In production, the normal load is about 20 messages/second.

We observe that the split always occurs at the end of the VM snapshot
(NetBackup).
However, we take a snapshot every night, and the network split occurs only
once every 15 to 20 days.

*Log server rabbit at FRA-VSP-32545 :*
=INFO REPORT==== 19-Feb-2014::18:34:47 ===
rabbit on node 'rabbit at FRA-VSP-32596' down

=INFO REPORT==== 19-Feb-2014::18:34:49 ===
Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave
<'rabbit at FRA-VSP-32545'.2.269.0> saw deaths of mirrors
<'rabbit at FRA-VSP-32596'.1.270.0>


*Log server rabbit at FRA-VSP-32596 :*

=INFO REPORT==== 19-Feb-2014::18:34:28 ===
rabbit on node 'rabbit at FRA-VSP-32545' down

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.279.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
                               noconnection}
** When Server state == {state,
                            {76,<0.279.0>},
                            {{79,<5383.278.0>},#Ref<0.0.0.248452>},
                            {{82,<5066.278.0>},#Ref<0.0.1.42330>},
                            {resource,<<"/IEC">>,queue,<<"conso.queue">>},
                            rabbit_mirror_queue_coordinator,
                            {83,
                             [{{76,<0.279.0>},
                               {view_member,
                                   {76,<0.279.0>},
                                   [],
                                   {79,<5383.278.0>},
                                   {82,<5066.278.0>}}},
                              {{79,<5383.278.0>},
                              {view_member,
                                   {79,<5383.278.0>},
                                   [],
                                   {82,<5066.278.0>},
                                   {76,<0.279.0>}}},
                              {{82,<5066.278.0>},
                               {view_member,
                                   {82,<5066.278.0>},
                                   [],
                                   {76,<0.279.0>},
                                   {79,<5383.278.0>}}}]},
                            1457518,
                            [{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
                             {{79,<5383.278.0>},{member,{[],[]},1,1}},
                             {{82,<5066.278.0>},{member,{[],[]},0,0}}],
                            [<0.1272.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
       [{orddict,fetch,
            [{76,<0.279.0>},
             [{{82,<5066.278.0>},
               {view_member,
                   {82,<5066.278.0>},
                   [{79,<5383.278.0>}],
                   {82,<5066.278.0>},
                   {82,<5066.278.0>}}}]],
            [{file,"orddict.erl"},{line,72}]},
        {gm,check_neighbours,1,[]},
        {gm,handle_info,2,[]},
        {gen_server2,handle_msg,2,[]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.283.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
                               noconnection}
** When Server state == {state,
                            {67,<0.283.0>},
                            {{70,<5383.282.0>},#Ref<0.0.0.248454>},
                            {{73,<5066.282.0>},#Ref<0.0.1.42352>},
                            {resource,<<"/IEC">>,queue,<<"event.queue">>},
                            rabbit_mirror_queue_coordinator,
                            {74,
                             [{{67,<0.283.0>},
                               {view_member,
                                   {67,<0.283.0>},
                                   [],
                                   {70,<5383.282.0>},
                                   {73,<5066.282.0>}}},
                              {{70,<5383.282.0>},
                               {view_member,
                                   {70,<5383.282.0>},
                                   [],
                                   {73,<5066.282.0>},
                                   {67,<0.283.0>}}},
                              {{73,<5066.282.0>},
                               {view_member,
                                   {73,<5066.282.0>},
                                   [],
                                   {67,<0.283.0>},
                                   {70,<5383.282.0>}}}]},
                            212075,
                            [{{67,<0.283.0>},{member,{[],[]},212075,212075}},
                             {{70,<5383.282.0>},{member,{[],[]},1,1}},
                             {{73,<5066.282.0>},{member,{[],[]},0,0}}],
                            [<0.1271.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
       [{orddict,fetch,
            [{67,<0.283.0>},
             [{{73,<5066.282.0>},
               {view_member,
                   {73,<5066.282.0>},
                   [{70,<5383.282.0>}],
                   {73,<5066.282.0>},
                   {73,<5066.282.0>}}}]],
            [{file,"orddict.erl"},{line,72}]},
        {gm,check_neighbours,1,[]},
        {gm,handle_info,2,[]},
        {gen_server2,handle_msg,2,[]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.203.0> terminating
** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
                                  {vote_yes,{tid,10316,<0.203.0>}}}
** When Server state == 1
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
                               {vote_yes,{tid,10316,<0.203.0>}}}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.275.0> terminating
** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
                               noconnection}
** When Server state == {state,
                            {70,<0.275.0>},
                            {{76,<5066.274.0>},#Ref<0.0.1.42305>},
                            {{73,<5383.274.0>},#Ref<0.0.1.38240>},
                            {resource,<<"/IEC">>,queue,
                                <<"activity.queue.dead">>},
                            rabbit_mirror_queue_coordinator,
                            {77,
                             [{{70,<0.275.0>},
                               {view_member,
                                   {70,<0.275.0>},
                                   [],
                                   {76,<5066.274.0>},
                                   {73,<5383.274.0>}}},
                              {{73,<5383.274.0>},
                               {view_member,
                                   {73,<5383.274.0>},
                                   [],
                                   {70,<0.275.0>},
                                   {76,<5066.274.0>}}},
                              {{76,<5066.274.0>},
                               {view_member,
                                   {76,<5066.274.0>},
                                   [],
                                   {73,<5383.274.0>},
                                   {70,<0.275.0>}}}]},
                            6,
                            [{{70,<0.275.0>},{member,{[],[]},6,6}},
                             {{73,<5383.274.0>},{member,{[],[]},1,1}},
                             {{76,<5066.274.0>},{member,{[],[]},0,0}}],
                            [<0.1273.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {noproc,{gen_server2,call,
                        [<0.203.0>,
                         {submit,#Fun<rabbit_misc.6.116010224>},
                         infinity]}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.204.0> terminating
** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
                                  {vote_yes,{tid,10315,<0.204.0>}}}
** When Server state == 2
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
                               {vote_yes,{tid,10315,<0.204.0>}}}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.1268.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/IEC">>,queue,
                                    <<"gps.queue.dead">>},
                                true,false,none,[],<0.266.0>,
                                [<5066.265.0>],
                                [<5066.265.0>],
                                [{vhost,<<"/IEC">>},
                                 {name,<<"Queue HA">>},
                                 {pattern,<<".queue">>},
                                 {'apply-to',<<"queues">>},
                                 {definition,
                                     [{<<"ha-mode">>,<<"all">>},
                                      {<<"ha-sync-mode">>,<<"automatic">>}]},
                                 {priority,0}],
                                [{<5066.266.0>,<5066.265.0>},
                                 {<5383.266.0>,<5383.265.0>}],
                                []},
                            <0.267.0>,
                            {state,
                                {dict,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                                erlang},
                            #Fun<rabbit_mirror_queue_master.5.69128381>,
                            #Fun<rabbit_mirror_queue_master.6.50493311>}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:31 ===
** Generic server <0.266.0> terminating
** Last message in was {'EXIT',<0.1268.0>,
                           {{case_clause,{ok,<5066.265.0>,[]}},
                            [{rabbit_mirror_queue_coordinator,handle_cast,2,
                                 []},
                             {gen_server2,handle_msg,2,[]},
                             {proc_lib,wake_up,3,
                                 [{file,"proc_lib.erl"},{line,249}]}]}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
                          true,false,none,[],<0.266.0>,
                          [<5383.265.0>,<5066.265.0>],
                          [<5066.265.0>,<5383.265.0>],
                          [{vhost,<<"/IEC">>},
                           {name,<<"Queue HA">>},
                           {pattern,<<".queue">>},
                           {'apply-to',<<"queues">>},
                           {definition,
                            [{<<"ha-mode">>,<<"all">>},
                             {<<"ha-sync-mode">>,<<"automatic">>}]},
                           {priority,0}],
                          [{<5066.266.0>,<5066.265.0>},
                           {<5383.266.0>,<5383.265.0>},
                           {<0.267.0>,<0.266.0>}],
                          []},
                         none,false,rabbit_mirror_queue_master,
                         {state,
                          {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
                          <0.267.0>,<0.1268.0>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {0,{[],[]}},
                           0,
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,65536,
                            #Fun<rabbit_variable_queue.2.81334491>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
                               154,75>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,340043,
                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent"},
                             rabbit_msg_store_ets_index,
                             "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent",
                             <0.255.0>,344140,335946,348237,352334},
                            {client_msstate,msg_store_transient,
                             <<148,176,200,245,252,25,203,27,190,186,25,104,
                               217,230,131,35>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,319558,
                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient",
                             <0.250.0>,323655,315461,327752,331849}},
                           true,0,0,0,infinity,0,0,0,0,0,
                           {rates,
                            {{1392,831016,530070},0},
                            {{1392,831016,530070},0},
                            0.0,0.0,
                            {1392,831128,748070}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1392,831016,530070},0},
                            {{1392,831016,530070},0},
                            0.0,0.0,
                            {1392,831128,748070}}},
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {set,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {queue,[],[],0},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         undefined,undefined,undefined,
                         {state,
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          delegate},
                         undefined,undefined,undefined,4,running}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
    {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}


Any ideas?

Best regards,

* ________________________________________________________________*
*Michaël OULLION*
*Architecte JAVA*
*ND Informatique*
Adresse (1208 route des Pierrelles B.P. 98 BEAUSEMBLANT - 26240
Beausemblant - FRANCE)
Tel. +33 (0)4 75 23 68 07
Visit our web site at www.norbert-dentressangle.com




Bill Chmura    Director, IT Development Services

direct 860 676 3618  |  toll-free 800 293 0056  x61009

Nurtur  |  20 Batterson Park Road  |  Farmington, CT  06032
bchmura at nurturhealth.com  |  www.nurturhealth.com
