[rabbitmq-discuss] 2.8.7 - Unexpected termination of HA node

Rensen, Nathanael nathanael.rensen at zettaserve.com
Mon Oct 8 01:56:30 BST 2012


While testing RabbitMQ version 2.8.7 I've encountered an issue where the slave node in a HA pair terminates when the master is forcefully rebooted.

The environment is Centos 6.3 with RabbitMQ 2.8.7, Erlang R14B, with a durable queue mirrored across a pair of nodes in a HA cluster. The master is rebooted using reboot -nf to simulate an ungraceful failure. When the master comes back up the slave node terminates with error log shown below.

The behaviour when testing with version 2.8.6 was that the queue would vanish from the cluster. With version 2.8.7 the queue remains intact, but the slave node terminates.

I've found that if the rabbitmq-server is not configured to start at boot the slave node eventually promotes itself to master and survives the eventual restart of the original master.

Nathanael

=INFO REPORT==== 5-Oct-2012::17:24:35 ===
rabbit on node 'rabbit at zg-dev-nr-002' down

=INFO REPORT==== 5-Oct-2012::17:24:35 ===
Statistics database started.

=INFO REPORT==== 5-Oct-2012::17:24:35 ===
Mirrored-queue (queue 'zg-ha-test' in vhost '/'): Slave
<'rabbit at zg-dev-nr-001'.3.252.0> saw deaths of mirrors
<'rabbit at zg-dev-nr-002'.2.252.0>

=INFO REPORT==== 5-Oct-2012::17:24:35 ===
Mirrored-queue (queue 'zg-ha-test' in vhost '/'): Promoting slave
<'rabbit at zg-dev-nr-001'.3.252.0> to master

=ERROR REPORT==== 5-Oct-2012::17:24:35 ===
** Generic server <0.253.0> terminating
** Last message in was {mnesia_tm,'rabbit at zg-dev-nr-002',
                                  {vote_yes,{tid,2235,<0.253.0>}}}
** When Server state == {state,
                            {3,<0.253.0>},
                            {{3,<0.253.0>},undefined},
                            {{3,<0.253.0>},undefined},
                            {resource,<<"/">>,queue,<<"zg-ha-test">>},
                            rabbit_mirror_queue_coordinator,
                            {6,
                             [{{3,<0.253.0>},
                               {view_member,
                                   {3,<0.253.0>},
                                   [],
                                   {3,<0.253.0>},
                                   {3,<0.253.0>}}}]},
                            0,[],
                            [<0.336.0>],
                            {[],[]},
                            [],undefined}
** Reason for termination ==
** {function_clause,
       [{gm,handle_info,
            [{mnesia_tm,'rabbit at zg-dev-nr-002',
                 {vote_yes,{tid,2235,<0.253.0>}}},
             {state,
                 {3,<0.253.0>},
                 {{3,<0.253.0>},undefined},
                 {{3,<0.253.0>},undefined},
                 {resource,<<"/">>,queue,<<"zg-ha-test">>},
                 rabbit_mirror_queue_coordinator,
                 {6,
                  [{{3,<0.253.0>},
                    {view_member,
                        {3,<0.253.0>},
                        [],
                        {3,<0.253.0>},
                        {3,<0.253.0>}}}]},
                 0,[],
                 [<0.336.0>],
                 {[],[]},
                 [],undefined}]},
        {gen_server2,handle_msg,2},
        {proc_lib,wake_up,3}]}

=ERROR REPORT==== 5-Oct-2012::17:24:36 ===
** Generic server <0.252.0> terminating
** Last message in was {'EXIT',<0.336.0>,
                        {function_clause,
                         [{gm,handle_info,
                           [{mnesia_tm,'rabbit at zg-dev-nr-002',
                             {vote_yes,{tid,2235,<0.253.0>}}},
                            {state,
                             {3,<0.253.0>},
                             {{3,<0.253.0>},undefined},
                             {{3,<0.253.0>},undefined},
                             {resource,<<"/">>,queue,<<"zg-ha-test">>},
                             rabbit_mirror_queue_coordinator,
                             {6,
                              [{{3,<0.253.0>},
                                {view_member,
                                 {3,<0.253.0>},
                                 [],
                                 {3,<0.253.0>},
                                 {3,<0.253.0>}}}]},
                             0,[],
                             [<0.336.0>],
                             {[],[]},
                             [],undefined}]},
                          {gen_server2,handle_msg,2},
                          {proc_lib,wake_up,3}]}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"zg-ha-test">>},
                          true,false,none,
                          [{<<"x-ha-policy">>,longstr,<<"all">>}],
                          <0.252.0>,[],all},
                         none,false,rabbit_mirror_queue_master,
                         {state,<0.253.0>,<0.336.0>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
[... lots of stuff omitted...]
                           {291,
                           29148,
                           {0,nil},
                           undefined,
                           {0,nil},
                           {qistate,

"/var/lib/rabbitmq/mnesia/rabbit at zg-dev-nr-001/queues/EVSP28XGBN529KMXDI7YW0GM2",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,262144,
                            #Fun<rabbit_variable_queue.2.121033067>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<64,124,193,190,179,5,132,252,0,55,122,227,216,
                               123,112,237>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,303177,

"/var/lib/rabbitmq/mnesia/rabbit at zg-dev-nr-001/msg_store_persistent"},
                             rabbit_msg_store_ets_index,

"/var/lib/rabbitmq/mnesia/rabbit at zg-dev-nr-001/msg_store_persistent",
                             <0.246.0>,307274,299080,311371,315468},
                            {client_msstate,msg_store_transient,
                             <<38,48,43,9,149,2,213,22,77,78,223,24,149,53,175,
                               245>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,282692,

"/var/lib/rabbitmq/mnesia/rabbit at zg-dev-nr-001/msg_store_transient"},
                             rabbit_msg_store_ets_index,

"/var/lib/rabbitmq/mnesia/rabbit at zg-dev-nr-001/msg_store_transient",
                             <0.241.0>,286789,278595,290886,294983}},
                           true,0,#Fun<rabbit_mirror_queue_slave.1.40624233>,
                           291,0,infinity,291,0,152,0,291,
                           {rates,
                            {{1349,429050,591110},0},
                            {{1349,429050,591110},0},
                            1076.4632093998348,1076.4632093998348,
                            {1349,429051,592073}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1349,429050,591110},0},
                            {{1349,429050,591110},0},
                            1071.7981467372135,1076.4632093998348,
                            {1349,429051,592073}}},
                          0,
                          {dict,6,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],
                             [[<<218,200,61,58,97,50,72,197,144,158,220,215,
                                 159,163,253,105>>|
                               published]],
                             [[<<80,182,167,56,95,139,134,134,192,40,123,239,
                                 62,193,49,206>>|
                               published],
                              [<<159,219,2,228,110,119,150,246,97,74,183,72,
                                 254,233,74,33>>|
                               published],
                              [<<223,108,153,205,69,107,192,83,254,145,181,
                                 172,15,61,33,190>>|
                               published]],
                             [],[],[],[],
                             [[<<8,84,74,65,208,81,184,115,74,86,225,237,241,
                                 172,148,18>>|
                               published]],
                             [],[],[],[],[],
                             [[<<"õjq$ùøÞS!ý,a»úuÿ">>|published]],
                             []}}},
                          [],
                          {dict,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          {set,0,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {[],[]},
                         undefined,undefined,#Ref<0.0.0.1342>,undefined,
                         {state,fine,5000,#Ref<0.0.0.1406>},
                         {0,nil},
                         undefined,undefined,undefined,
                         {dict,1,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],
                            [[<7527.317.0>|#Ref<0.0.0.732>]],
                            [],[],[],[],[],[],[],[],[]}}},
                         1,
                         {{0,nil},{0,nil}},
                         undefined,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         undefined,undefined}
** Reason for termination ==
** {noproc,{gen_server2,call,[<0.253.0>,info,infinity]}}
** In 'terminate' callback with reason ==
** {function_clause,
       [{gm,handle_info,
            [{mnesia_tm,'rabbit at zg-dev-nr-002',
                 {vote_yes,{tid,2235,<0.253.0>}}},
             {state,
                 {3,<0.253.0>},
                 {{3,<0.253.0>},undefined},
                 {{3,<0.253.0>},undefined},
                 {resource,<<"/">>,queue,<<"zg-ha-test">>},
                 rabbit_mirror_queue_coordinator,
                 {6,
                  [{{3,<0.253.0>},
                    {view_member,
                        {3,<0.253.0>},
                        [],
                        {3,<0.253.0>},
                        {3,<0.253.0>}}}]},
                 0,[],
                 [<0.336.0>],
                 {[],[]},
                 [],undefined}]},
        {gen_server2,handle_msg,2},
        {proc_lib,wake_up,3}]}

=INFO REPORT==== 5-Oct-2012::17:24:36 ===
rabbit on node 'rabbit at zg-dev-nr-002' up

=ERROR REPORT==== 5-Oct-2012::17:24:36 ===
** Generic server <0.292.0> terminating
** Last message in was {mnesia_tm,'rabbit at zg-dev-nr-002',
                                  {vote_yes,{tid,2236,<0.292.0>}}}
** When Server state == {state,<0.290.0>,<0.291.0>,rabbit_mgmt_sup,
                            [{rabbit_mgmt_db,
                                 {rabbit_mgmt_db,start_link,[]},
                                 permanent,4294967295,worker,
                                 [rabbit_mgmt_db]}]}
** Reason for termination ==
** {unexpected_info,{mnesia_tm,'rabbit at zg-dev-nr-002',
                               {vote_yes,{tid,2236,<0.292.0>}}}}

=INFO REPORT==== 5-Oct-2012::17:24:36 ===
    application: rabbitmq_management
    exited: shutdown
    type: permanent

=INFO REPORT==== 5-Oct-2012::17:24:36 ===
stopped TCP Listener on [::]:5672

________________________________

ZettaServe Disclaimer: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately if you have received this email by mistake and delete this email from your system. Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. ZettaServe Pty Ltd accepts no liability for any damage caused by any virus transmitted by this email.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121008/416518b3/attachment.htm>


More information about the rabbitmq-discuss mailing list