<div dir="ltr">I received a message regarding my cluster state saying "Network partition detected". I went to check my RabbitMQ logs and I can see a bunch of error reports like this:<div><br></div><div><div>=ERROR REPORT==== 7-Dec-2013::08:46:18 ===</div><div>** Generic server <0.507.0> terminating</div><div>** Last message in was {'DOWN',#Ref<0.0.0.74464>,process,<7022.1390.0>,</div><div>                               noconnection}</div><div>** When Server state == {state,</div><div>                            {0,<0.507.0>},</div><div>                            {{7,<7022.1390.0>},#Ref<0.0.0.74464>},</div><div>                            {{0,<7021.456.0>},#Ref<0.0.0.69498>},</div><div>                            {resource,<<"UAT_ENT">>,queue,</div><div>                                <<"queue.1">>},</div><div>                            rabbit_mirror_queue_coordinator,</div><div>                            {8,</div><div>                             [{{0,<7021.456.0>},</div><div>                               {view_member,</div><div>                                   {0,<7021.456.0>},</div><div>                                   [],</div><div>                                   {0,<0.507.0>},</div><div>                                   {7,<7022.1390.0>}}},</div><div>                              {{0,<0.507.0>},</div><div>                               {view_member,</div><div>                                   {0,<0.507.0>},</div><div>                                   [],</div><div>                                   {7,<7022.1390.0>},</div><div>                                   {0,<7021.456.0>}}},</div><div>                              {{7,<7022.1390.0>},</div><div>                               {view_member,</div><div>                                   {7,<7022.1390.0>},</div><div>                                   [],</div><div>                                   {0,<7021.456.0>},</div><div>                                   {0,<0.507.0>}}}]},</div><div>         
                   43,</div><div>                            [{{0,<7021.456.0>},{member,{[],[]},0,0}},</div><div>                             {{0,<0.507.0>},{member,{[],[]},43,43}},</div><div>                             {{7,<7022.1390.0>},{member,{[],[]},0,0}}],</div><div>                            [<0.506.0>],</div><div>                            {[],[]},</div><div>                            [],undefined,</div><div>                            #Fun<rabbit_misc.execute_mnesia_transaction.1>}</div><div>** Reason for termination == </div><div>** {function_clause,[{orddict,fetch,</div><div>                              [{0,<0.507.0>},[]],</div><div>                              [{file,"orddict.erl"},{line,72}]},</div><div>                     {gm,check_neighbours,1,[]},</div><div>                     {gm,handle_info,2,[]},</div><div>                     {gen_server2,handle_msg,2,[]},</div><div>                     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}</div></div><div><br></div><div>After restarting the troubled node, which cleared the network partition message, I now see the following message in my logs many times:<br></div><div><br></div><div>Discarding message {'$gen_call',{<0.26793.8>,#Ref<0.0.1.31326>},stat} from <0.26793.8> to <0.433.0> in an old incarnation (3) of this node (2)<br></div><div><br></div><div>I'm not sure why it failed, but other systems reported network failures around the same time, so I assume that was the cause. My issue is that the cluster never tried to heal itself afterwards, even though I have cluster_partition_handling set to autoheal in my rabbitmq.conf. My understanding is that setting it to autoheal causes the nodes to resolve a network partition on their own. Is this assumption incorrect?</div></div>