[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare snapshot

Michael Oullion michael.oullion at norbert-dentressangle.com
Thu Feb 20 11:12:03 GMT 2014


Hi all,

We are seeing occasional network partitions (net splits) on our cluster and
we don't know why. Before changing the net tick time or the partition
handling behaviour, I want to understand why they are happening.
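
For reference, the two settings I am referring to would sit in
rabbitmq.config roughly as shown below. This is only a sketch: the values
(a 120 second tick time, pause_minority) are illustrative assumptions, not
what we run today, since we have not changed anything from the defaults yet.

```erlang
%% rabbitmq.config (sketch; values are illustrative, not our current settings)
[
  %% Erlang distribution tick time in seconds (the default is 60).
  {kernel, [{net_ticktime, 120}]},
  %% Reaction to a partition: ignore (default), pause_minority or autoheal.
  {rabbit, [{cluster_partition_handling, pause_minority}]}
].
```

If we do change net_ticktime, it would have to be set to the same value on
all three nodes.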
Our environment is:
RabbitMQ 3.2.1, Erlang R16B
3 RabbitMQ nodes in the same subnet
RabbitMQ is installed on Windows 2008 R2 (VMware ESXi 5.1)
We have 4 mirrored queues on this cluster.
In production, the normal throughput is about 20 messages/second.

The split always occurs at the end of the VM snapshot (taken by NetBackup).
However, we take a snapshot every night, and the network split only occurs
about once every 15 to 20 days.

*Log from node rabbit at FRA-VSP-32545:*
=INFO REPORT==== 19-Feb-2014::18:34:47 ===
rabbit on node 'rabbit at FRA-VSP-32596' down

=INFO REPORT==== 19-Feb-2014::18:34:49 ===
Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave
<'rabbit at FRA-VSP-32545'.2.269.0> saw deaths of mirrors
<'rabbit at FRA-VSP-32596'.1.270.0>


*Log from node rabbit at FRA-VSP-32596:*

=INFO REPORT==== 19-Feb-2014::18:34:28 ===
rabbit on node 'rabbit at FRA-VSP-32545' down

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.279.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
                               noconnection}
** When Server state == {state,
                            {76,<0.279.0>},
                            {{79,<5383.278.0>},#Ref<0.0.0.248452>},
                            {{82,<5066.278.0>},#Ref<0.0.1.42330>},
                            {resource,<<"/IEC">>,queue,<<"conso.queue">>},
                            rabbit_mirror_queue_coordinator,
                            {83,
                             [{{76,<0.279.0>},
                               {view_member,
                                   {76,<0.279.0>},
                                   [],
                                   {79,<5383.278.0>},
                                   {82,<5066.278.0>}}},
                              {{79,<5383.278.0>},
                               {view_member,
                                   {79,<5383.278.0>},
                                   [],
                                   {82,<5066.278.0>},
                                   {76,<0.279.0>}}},
                              {{82,<5066.278.0>},
                               {view_member,
                                   {82,<5066.278.0>},
                                   [],
                                   {76,<0.279.0>},
                                   {79,<5383.278.0>}}}]},
                            1457518,
                            [{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
                             {{79,<5383.278.0>},{member,{[],[]},1,1}},
                             {{82,<5066.278.0>},{member,{[],[]},0,0}}],
                            [<0.1272.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
       [{orddict,fetch,
            [{76,<0.279.0>},
             [{{82,<5066.278.0>},
               {view_member,
                   {82,<5066.278.0>},
                   [{79,<5383.278.0>}],
                   {82,<5066.278.0>},
                   {82,<5066.278.0>}}}]],
            [{file,"orddict.erl"},{line,72}]},
        {gm,check_neighbours,1,[]},
        {gm,handle_info,2,[]},
        {gen_server2,handle_msg,2,[]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.283.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
                               noconnection}
** When Server state == {state,
                            {67,<0.283.0>},
                            {{70,<5383.282.0>},#Ref<0.0.0.248454>},
                            {{73,<5066.282.0>},#Ref<0.0.1.42352>},
                            {resource,<<"/IEC">>,queue,<<"event.queue">>},
                            rabbit_mirror_queue_coordinator,
                            {74,
                             [{{67,<0.283.0>},
                               {view_member,
                                   {67,<0.283.0>},
                                   [],
                                   {70,<5383.282.0>},
                                   {73,<5066.282.0>}}},
                              {{70,<5383.282.0>},
                               {view_member,
                                   {70,<5383.282.0>},
                                   [],
                                   {73,<5066.282.0>},
                                   {67,<0.283.0>}}},
                              {{73,<5066.282.0>},
                               {view_member,
                                   {73,<5066.282.0>},
                                   [],
                                   {67,<0.283.0>},
                                   {70,<5383.282.0>}}}]},
                            212075,
                            [{{67,<0.283.0>},{member,{[],[]},212075,212075}},
                             {{70,<5383.282.0>},{member,{[],[]},1,1}},
                             {{73,<5066.282.0>},{member,{[],[]},0,0}}],
                            [<0.1271.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {function_clause,
       [{orddict,fetch,
            [{67,<0.283.0>},
             [{{73,<5066.282.0>},
               {view_member,
                   {73,<5066.282.0>},
                   [{70,<5383.282.0>}],
                   {73,<5066.282.0>},
                   {73,<5066.282.0>}}}]],
            [{file,"orddict.erl"},{line,72}]},
        {gm,check_neighbours,1,[]},
        {gm,handle_info,2,[]},
        {gen_server2,handle_msg,2,[]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.203.0> terminating
** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
                                  {vote_yes,{tid,10316,<0.203.0>}}}
** When Server state == 1
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
                               {vote_yes,{tid,10316,<0.203.0>}}}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.275.0> terminating
** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
                               noconnection}
** When Server state == {state,
                            {70,<0.275.0>},
                            {{76,<5066.274.0>},#Ref<0.0.1.42305>},
                            {{73,<5383.274.0>},#Ref<0.0.1.38240>},
                            {resource,<<"/IEC">>,queue,
                                <<"activity.queue.dead">>},
                            rabbit_mirror_queue_coordinator,
                            {77,
                             [{{70,<0.275.0>},
                               {view_member,
                                   {70,<0.275.0>},
                                   [],
                                   {76,<5066.274.0>},
                                   {73,<5383.274.0>}}},
                              {{73,<5383.274.0>},
                               {view_member,
                                   {73,<5383.274.0>},
                                   [],
                                   {70,<0.275.0>},
                                   {76,<5066.274.0>}}},
                              {{76,<5066.274.0>},
                               {view_member,
                                   {76,<5066.274.0>},
                                   [],
                                   {73,<5383.274.0>},
                                   {70,<0.275.0>}}}]},
                            6,
                            [{{70,<0.275.0>},{member,{[],[]},6,6}},
                             {{73,<5383.274.0>},{member,{[],[]},1,1}},
                             {{76,<5066.274.0>},{member,{[],[]},0,0}}],
                            [<0.1273.0>],
                            {[],[]},
                            [],undefined,
                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {noproc,{gen_server2,call,
                        [<0.203.0>,
                         {submit,#Fun<rabbit_misc.6.116010224>},
                         infinity]}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.204.0> terminating
** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
                                  {vote_yes,{tid,10315,<0.204.0>}}}
** When Server state == 2
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
                               {vote_yes,{tid,10315,<0.204.0>}}}}

=ERROR REPORT==== 19-Feb-2014::18:34:30 ===
** Generic server <0.1268.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/IEC">>,queue,
                                    <<"gps.queue.dead">>},
                                true,false,none,[],<0.266.0>,
                                [<5066.265.0>],
                                [<5066.265.0>],
                                [{vhost,<<"/IEC">>},
                                 {name,<<"Queue HA">>},
                                 {pattern,<<".queue">>},
                                 {'apply-to',<<"queues">>},
                                 {definition,
                                     [{<<"ha-mode">>,<<"all">>},
                                      {<<"ha-sync-mode">>,<<"automatic">>}]},
                                 {priority,0}],
                                [{<5066.266.0>,<5066.265.0>},
                                 {<5383.266.0>,<5383.265.0>}],
                                []},
                            <0.267.0>,
                            {state,
                                {dict,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                                erlang},
                            #Fun<rabbit_mirror_queue_master.5.69128381>,
                            #Fun<rabbit_mirror_queue_master.6.50493311>}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}

=ERROR REPORT==== 19-Feb-2014::18:34:31 ===
** Generic server <0.266.0> terminating
** Last message in was {'EXIT',<0.1268.0>,
                           {{case_clause,{ok,<5066.265.0>,[]}},
                            [{rabbit_mirror_queue_coordinator,handle_cast,2,
                                 []},
                             {gen_server2,handle_msg,2,[]},
                             {proc_lib,wake_up,3,
                                 [{file,"proc_lib.erl"},{line,249}]}]}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
                          true,false,none,[],<0.266.0>,
                          [<5383.265.0>,<5066.265.0>],
                          [<5066.265.0>,<5383.265.0>],
                          [{vhost,<<"/IEC">>},
                           {name,<<"Queue HA">>},
                           {pattern,<<".queue">>},
                           {'apply-to',<<"queues">>},
                           {definition,
                            [{<<"ha-mode">>,<<"all">>},
                             {<<"ha-sync-mode">>,<<"automatic">>}]},
                           {priority,0}],
                          [{<5066.266.0>,<5066.265.0>},
                           {<5383.266.0>,<5383.265.0>},
                           {<0.267.0>,<0.266.0>}],
                          []},
                         none,false,rabbit_mirror_queue_master,
                         {state,
                          {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
                          <0.267.0>,<0.1268.0>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {0,{[],[]}},
                           0,
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,65536,
                            #Fun<rabbit_variable_queue.2.81334491>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
                               154,75>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,340043,
                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent"},
                             rabbit_msg_store_ets_index,
                             "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent",
                             <0.255.0>,344140,335946,348237,352334},
                            {client_msstate,msg_store_transient,
                             <<148,176,200,245,252,25,203,27,190,186,25,104,
                               217,230,131,35>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,319558,
                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient",
                             <0.250.0>,323655,315461,327752,331849}},
                           true,0,0,0,infinity,0,0,0,0,0,
                           {rates,
                            {{1392,831016,530070},0},
                            {{1392,831016,530070},0},
                            0.0,0.0,
                            {1392,831128,748070}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1392,831016,530070},0},
                            {{1392,831016,530070},0},
                            0.0,0.0,
                            {1392,831128,748070}}},
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {set,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {queue,[],[],0},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         undefined,undefined,undefined,
                         {state,
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          delegate},
                         undefined,undefined,undefined,4,running}
** Reason for termination ==
** {{case_clause,{ok,<5066.265.0>,[]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}


Any ideas?

Best regards,

* ________________________________________________________________*
*Michaël OULLION*
*Architecte JAVA*
 *ND Informatique*
Adresse (1208 route des Pierrelles B.P. 98 BEAUSEMBLANT - 26240
Beausemblant - FRANCE)
Tel. +33 (0)4 75 23 68 07
Visit our web site at www.norbert-dentressangle.com