[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare snapshot

Michael Oullion michael.oullion at norbert-dentressangle.com
Thu Feb 20 20:11:43 GMT 2014


Thanks, Jerry, for your quick answer.
What can we do in this situation?
Maybe we can raise the net tick time, or configure a specific behaviour to
handle network splits.
Or should we simply stop taking snapshots of the VM, since they are not necessary?
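
For reference, here is a minimal sketch of the first two options, assuming the
Erlang-term rabbitmq.config format used by RabbitMQ 3.2.x. The tick time of
120 seconds and the choice of pause_minority are illustrative assumptions
only, not values recommended anywhere in this thread:

```erlang
%% rabbitmq.config (sketch; values are assumptions, not tested on this cluster)
[
  %% Erlang distribution tick time in seconds (default 60).
  %% A silent peer is declared down after roughly 0.75x to 1.25x this value.
  {kernel, [{net_ticktime, 120}]},
  %% How RabbitMQ reacts once a partition is detected:
  %% ignore (the default), pause_minority, or autoheal.
  {rabbit, [{cluster_partition_handling, pause_minority}]}
].
```

The file is read when a node starts, so each node would need a restart, and
net_ticktime should be the same on every node. A longer tick time tolerates a
longer VM pause, at the price of slower detection of real failures.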

Regards.
On 20 Feb 2014 17:57, "Jerry Kuch" <jkuch at gopivotal.com> wrote:

> Hi, Michael:
>
> This isn't terribly surprising.  Snapshotting a VM is likely to render it
> less responsive than it normally would be for some amount of time.  Whether
> that period ends before some other node in your cluster misses heartbeats
> and gives you grief about it, is a coin flip.
>
> Best regards,
> Jerry
>
>
>
> On Thu, Feb 20, 2014 at 3:12 AM, Michael Oullion <
> michael.oullion at norbert-dentressangle.com> wrote:
>
>> Hi all,
>>
>> We are seeing occasional network partitions on our cluster and we don't
>> know why. Before changing the net tick time parameter or the partition
>> handling behaviour, I want to understand why it is happening.
>> Our environment is:
>> RabbitMQ 3.2.1, Erlang R16B
>> 3 RabbitMQ nodes in the same subnet
>> RabbitMQ is installed on Windows 2008 R2 (VMware ESXi 5.1)
>> We have 4 mirrored queues on this cluster.
>> In production, the normal load is about 20 messages/second.
>>
>> We observe that the split always occurs at the end of the snapshot
>> (NetBackup) of the VM.
>> However, we take a snapshot every night, and the network split occurs only
>> once every 15 to 20 days.
>>
>> *Log from server rabbit at FRA-VSP-32545:*
>> =INFO REPORT==== 19-Feb-2014::18:34:47 ===
>> rabbit on node 'rabbit at FRA-VSP-32596' down
>>
>> =INFO REPORT==== 19-Feb-2014::18:34:49 ===
>> Mirrored-queue (queue 'conso.queue.dead' in vhost '/IEC'): Slave
>> <'rabbit at FRA-VSP-32545'.2.269.0> saw deaths of mirrors
>> <'rabbit at FRA-VSP-32596'.1.270.0>
>>
>>
>> *Log from server rabbit at FRA-VSP-32596:*
>>
>> =INFO REPORT==== 19-Feb-2014::18:34:28 ===
>> rabbit on node 'rabbit at FRA-VSP-32545' down
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.279.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.0.248452>,process,<5383.278.0>,
>>                                noconnection}
>> ** When Server state == {state,
>>                             {76,<0.279.0>},
>>                             {{79,<5383.278.0>},#Ref<0.0.0.248452>},
>>                             {{82,<5066.278.0>},#Ref<0.0.1.42330>},
>>                             {resource,<<"/IEC">>,queue,<<"conso.queue">>},
>>                             rabbit_mirror_queue_coordinator,
>>                             {83,
>>                              [{{76,<0.279.0>},
>>                                {view_member,
>>                                    {76,<0.279.0>},
>>                                    [],
>>                                    {79,<5383.278.0>},
>>                                    {82,<5066.278.0>}}},
>>                               {{79,<5383.278.0>},
>>                                {view_member,
>>                                    {79,<5383.278.0>},
>>                                    [],
>>                                    {82,<5066.278.0>},
>>                                    {76,<0.279.0>}}},
>>                               {{82,<5066.278.0>},
>>                                {view_member,
>>                                    {82,<5066.278.0>},
>>                                    [],
>>                                    {76,<0.279.0>},
>>                                    {79,<5383.278.0>}}}]},
>>                             1457518,
>>                             [{{76,<0.279.0>},{member,{[],[]},1457518,1457518}},
>>                              {{79,<5383.278.0>},{member,{[],[]},1,1}},
>>                              {{82,<5066.278.0>},{member,{[],[]},0,0}}],
>>                             [<0.1272.0>],
>>                             {[],[]},
>>                             [],undefined,
>>                             #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {function_clause,
>>        [{orddict,fetch,
>>             [{76,<0.279.0>},
>>              [{{82,<5066.278.0>},
>>                {view_member,
>>                    {82,<5066.278.0>},
>>                    [{79,<5383.278.0>}],
>>                    {82,<5066.278.0>},
>>                    {82,<5066.278.0>}}}]],
>>             [{file,"orddict.erl"},{line,72}]},
>>         {gm,check_neighbours,1,[]},
>>         {gm,handle_info,2,[]},
>>         {gen_server2,handle_msg,2,[]},
>>         {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.283.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.0.248454>,process,<5383.282.0>,
>>                                noconnection}
>> ** When Server state == {state,
>>                             {67,<0.283.0>},
>>                             {{70,<5383.282.0>},#Ref<0.0.0.248454>},
>>                             {{73,<5066.282.0>},#Ref<0.0.1.42352>},
>>                             {resource,<<"/IEC">>,queue,<<"event.queue">>},
>>                             rabbit_mirror_queue_coordinator,
>>                             {74,
>>                              [{{67,<0.283.0>},
>>                                {view_member,
>>                                    {67,<0.283.0>},
>>                                    [],
>>                                    {70,<5383.282.0>},
>>                                    {73,<5066.282.0>}}},
>>                               {{70,<5383.282.0>},
>>                                {view_member,
>>                                    {70,<5383.282.0>},
>>                                    [],
>>                                    {73,<5066.282.0>},
>>                                    {67,<0.283.0>}}},
>>                               {{73,<5066.282.0>},
>>                                {view_member,
>>                                    {73,<5066.282.0>},
>>                                    [],
>>                                    {67,<0.283.0>},
>>                                    {70,<5383.282.0>}}}]},
>>                             212075,
>>                             [{{67,<0.283.0>},{member,{[],[]},212075,212075}},
>>                              {{70,<5383.282.0>},{member,{[],[]},1,1}},
>>                              {{73,<5066.282.0>},{member,{[],[]},0,0}}],
>>                             [<0.1271.0>],
>>                             {[],[]},
>>                             [],undefined,
>>                             #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {function_clause,
>>        [{orddict,fetch,
>>             [{67,<0.283.0>},
>>              [{{73,<5066.282.0>},
>>                {view_member,
>>                    {73,<5066.282.0>},
>>                    [{70,<5383.282.0>}],
>>                    {73,<5066.282.0>},
>>                    {73,<5066.282.0>}}}]],
>>             [{file,"orddict.erl"},{line,72}]},
>>         {gm,check_neighbours,1,[]},
>>         {gm,handle_info,2,[]},
>>         {gen_server2,handle_msg,2,[]},
>>         {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.203.0> terminating
>> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
>>                                   {vote_yes,{tid,10316,<0.203.0>}}}
>> ** When Server state == 1
>> ** Reason for termination ==
>> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
>>                                {vote_yes,{tid,10316,<0.203.0>}}}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.275.0> terminating
>> ** Last message in was {'DOWN',#Ref<0.0.1.38240>,process,<5383.274.0>,
>>                                noconnection}
>> ** When Server state == {state,
>>                             {70,<0.275.0>},
>>                             {{76,<5066.274.0>},#Ref<0.0.1.42305>},
>>                             {{73,<5383.274.0>},#Ref<0.0.1.38240>},
>>                             {resource,<<"/IEC">>,queue,
>>                                 <<"activity.queue.dead">>},
>>                             rabbit_mirror_queue_coordinator,
>>                             {77,
>>                              [{{70,<0.275.0>},
>>                                {view_member,
>>                                    {70,<0.275.0>},
>>                                    [],
>>                                    {76,<5066.274.0>},
>>                                    {73,<5383.274.0>}}},
>>                               {{73,<5383.274.0>},
>>                                {view_member,
>>                                    {73,<5383.274.0>},
>>                                    [],
>>                                    {70,<0.275.0>},
>>                                    {76,<5066.274.0>}}},
>>                               {{76,<5066.274.0>},
>>                                {view_member,
>>                                    {76,<5066.274.0>},
>>                                    [],
>>                                    {73,<5383.274.0>},
>>                                    {70,<0.275.0>}}}]},
>>                             6,
>>                             [{{70,<0.275.0>},{member,{[],[]},6,6}},
>>                              {{73,<5383.274.0>},{member,{[],[]},1,1}},
>>                              {{76,<5066.274.0>},{member,{[],[]},0,0}}],
>>                             [<0.1273.0>],
>>                             {[],[]},
>>                             [],undefined,
>>                             #Fun<rabbit_misc.execute_mnesia_transaction.1>}
>> ** Reason for termination ==
>> ** {noproc,{gen_server2,call,
>>                         [<0.203.0>,
>>                          {submit,#Fun<rabbit_misc.6.116010224>},
>>                          infinity]}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.204.0> terminating
>> ** Last message in was {mnesia_tm,'rabbit at FRA-VSP-32545',
>>                                   {vote_yes,{tid,10315,<0.204.0>}}}
>> ** When Server state == 2
>> ** Reason for termination ==
>> ** {unexpected_info,{mnesia_tm,'rabbit at FRA-VSP-32545',
>>                                {vote_yes,{tid,10315,<0.204.0>}}}}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:30 ===
>> ** Generic server <0.1268.0> terminating
>> ** Last message in was {'$gen_cast',{gm_deaths,[<5066.266.0>,<0.267.0>]}}
>> ** When Server state == {state,
>>                             {amqqueue,
>>                                 {resource,<<"/IEC">>,queue,
>>                                     <<"gps.queue.dead">>},
>>                                 true,false,none,[],<0.266.0>,
>>                                 [<5066.265.0>],
>>                                 [<5066.265.0>],
>>                                 [{vhost,<<"/IEC">>},
>>                                  {name,<<"Queue HA">>},
>>                                  {pattern,<<".queue">>},
>>                                  {'apply-to',<<"queues">>},
>>                                  {definition,
>>                                      [{<<"ha-mode">>,<<"all">>},
>>                                       {<<"ha-sync-mode">>,<<"automatic">>}]},
>>                                  {priority,0}],
>>                                 [{<5066.266.0>,<5066.265.0>},
>>                                  {<5383.266.0>,<5383.265.0>}],
>>                                 []},
>>                             <0.267.0>,
>>                             {state,
>>                                 {dict,0,16,16,8,80,48,
>>                                     {[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                      [],[],[]},
>>                                     {{[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                       [],[],[]}}},
>>                                 erlang},
>>                             #Fun<rabbit_mirror_queue_master.5.69128381>,
>>                             #Fun<rabbit_mirror_queue_master.6.50493311>}
>> ** Reason for termination ==
>> ** {{case_clause,{ok,<5066.265.0>,[]}},
>>     [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
>>      {gen_server2,handle_msg,2,[]},
>>      {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>> =ERROR REPORT==== 19-Feb-2014::18:34:31 ===
>> ** Generic server <0.266.0> terminating
>> ** Last message in was {'EXIT',<0.1268.0>,
>>                            {{case_clause,{ok,<5066.265.0>,[]}},
>>                             [{rabbit_mirror_queue_coordinator,handle_cast,2,
>>                                  []},
>>                              {gen_server2,handle_msg,2,[]},
>>                              {proc_lib,wake_up,3,
>>                                  [{file,"proc_lib.erl"},{line,249}]}]}}
>> ** When Server state == {q,
>>                          {amqqueue,
>>                           {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
>>                           true,false,none,[],<0.266.0>,
>>                           [<5383.265.0>,<5066.265.0>],
>>                           [<5066.265.0>,<5383.265.0>],
>>                           [{vhost,<<"/IEC">>},
>>                            {name,<<"Queue HA">>},
>>                            {pattern,<<".queue">>},
>>                            {'apply-to',<<"queues">>},
>>                            {definition,
>>                             [{<<"ha-mode">>,<<"all">>},
>>                              {<<"ha-sync-mode">>,<<"automatic">>}]},
>>                            {priority,0}],
>>                           [{<5066.266.0>,<5066.265.0>},
>>                            {<5383.266.0>,<5383.265.0>},
>>                            {<0.267.0>,<0.266.0>}],
>>                           []},
>>                          none,false,rabbit_mirror_queue_master,
>>                          {state,
>>                           {resource,<<"/IEC">>,queue,<<"gps.queue.dead">>},
>>                           <0.267.0>,<0.1268.0>,rabbit_variable_queue,
>>                           {vqstate,
>>                            {0,{[],[]}},
>>                            {0,{[],[]}},
>>                            {delta,undefined,0,undefined},
>>                            {0,{[],[]}},
>>                            {0,{[],[]}},
>>                            0,
>>                            {0,nil},
>>                            {0,nil},
>>                            {qistate,
>>                             "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/queues/6IXYXKMC8M51EEAXH5MKLR0Q4",
>>                             {{dict,0,16,16,8,80,48,
>>                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                []},
>>                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                 []}}},
>>                              []},
>>                             undefined,0,65536,
>>                             #Fun<rabbit_variable_queue.2.81334491>,
>>                             {0,nil}},
>>                            {{client_msstate,msg_store_persistent,
>>                              <<55,209,140,132,77,86,75,214,37,255,72,56,103,92,
>>                                154,75>>,
>>                              {dict,0,16,16,8,80,48,
>>                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                []},
>>                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                 []}}},
>>                              {state,340043,
>>                               "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent"},
>>                              rabbit_msg_store_ets_index,
>>                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_persistent",
>>                              <0.255.0>,344140,335946,348237,352334},
>>                             {client_msstate,msg_store_transient,
>>                              <<148,176,200,245,252,25,203,27,190,186,25,104,
>>                                217,230,131,35>>,
>>                              {dict,0,16,16,8,80,48,
>>                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                []},
>>                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                                 []}}},
>>                              {state,319558,
>>                               "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient"},
>>                              rabbit_msg_store_ets_index,
>>                              "d:/tools/RabbitMQ Server/data/db/rabbit at FRA-VSP-32596-mnesia/msg_store_transient",
>>                              <0.250.0>,323655,315461,327752,331849}},
>>                            true,0,0,0,infinity,0,0,0,0,0,
>>                            {rates,
>>                             {{1392,831016,530070},0},
>>                             {{1392,831016,530070},0},
>>                             0.0,0.0,
>>                             {1392,831128,748070}},
>>                            {0,nil},
>>                            {0,nil},
>>                            {0,nil},
>>                            {0,nil},
>>                            0,0,
>>                            {rates,
>>                             {{1392,831016,530070},0},
>>                             {{1392,831016,530070},0},
>>                             0.0,0.0,
>>                             {1392,831128,748070}}},
>>                           {dict,0,16,16,8,80,48,
>>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                            {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                              []}}},
>>                           [],
>>                           {set,0,16,16,8,80,48,
>>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                            {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                              []}}}},
>>                           {queue,[],[],0},
>>                          undefined,undefined,undefined,undefined,
>>                          {state,fine,5000,undefined},
>>                          {0,nil},
>>                          undefined,undefined,undefined,
>>                          {state,
>>                           {dict,0,16,16,8,80,48,
>>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                            {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                              []}}},
>>                           delegate},
>>                          undefined,undefined,undefined,4,running}
>> ** Reason for termination ==
>> ** {{case_clause,{ok,<5066.265.0>,[]}},
>>     [{rabbit_mirror_queue_coordinator,handle_cast,2,[]},
>>      {gen_server2,handle_msg,2,[]},
>>      {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>>
>>
>> Any ideas?
>>
>> Best regards,
>>
>> * ________________________________________________________________*
>> *Michaël OULLION*
>> *Architecte JAVA*
>>  *ND Informatique*
>> Adresse (1208 route des Pierrelles B.P. 98 BEAUSEMBLANT - 26240
>> Beausemblant - FRANCE)
>> Tel. +33 (0)4 75 23 68 07
>> Visit our web site at www.norbert-dentressangle.com
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>
>