[rabbitmq-discuss] Loss of queues after network partition?

Matt Pietrek mpietrek at skytap.com
Fri Oct 12 21:01:21 BST 2012


I'll be the first to admit that this is a somewhat unusual/complicated
circumstance, but I'm posting this on the chance there's something unusual.
This is RabbitMQ is *2.8.7 *on Erlang R15B01.

We have two brokers (mq1, mq2) running clustered with all HA queues. A VIP
is in front of them.

At some point there was a network event, and afterwards, a queue vanished.
(Still trying to figure out exactly what the network hiccup was, but it's
fair to say there was a communication disruption between the nodes.)  My
guess from looking at the logs is that each node had a moment where it
couldn't see the other, so took actions that were ultimately incompatible
when both nodes were visible to each other.

The actual error as seen by our code is a missing queue named "charon".
You'll see it near the end of the rabbit at mq2.log (below):

*{amqp_error,not_found,"no queue 'charon' in vhost '/'",'queue.declare'}*

Prior to the network event, our client was successfully reading from the
charon queue.


*Here's the relevant parts of the rabbit at mq1.log file
====================================================*
=INFO REPORT==== 11-Oct-2012::14:01:58 ===
rabbit on node rabbit at mq2 down

=ERROR REPORT==== 11-Oct-2012::14:02:03 ===
Mnesia(rabbit at mq1): ** ERROR ** mnesia_event got {inconsistent_database,
running_partitioned_network, rabbit at mq2}

=ERROR REPORT==== 11-Oct-2012::14:02:03 ===
** Generic server <0.547.0> terminating
** Last message in was {mnesia_locker,rabbit at mq2,granted}
** When Server state == {state,
                            {0,<0.547.0>},
                            {{0,<0.547.0>},undefined},
                            {{0,<0.547.0>},undefined},
                            {resource,<<"/">>,queue,<<"charon">>},
                            rabbit_mirror_queue_coordinator,
                            {3,
                             [{{0,<0.547.0>},
                               {view_member,
                                   {0,<0.547.0>},
                                   [],
                                   {0,<0.547.0>},
                                   {0,<0.547.0>}}}]},
                            1,[],
                            [<0.546.0>],
                            {[],[]},
                            [],undefined}
** Reason for termination ==
** {function_clause,
       [{gm,handle_info,
            [{mnesia_locker,rabbit at mq2,granted},
             {state,
                 {0,<0.547.0>},
                 {{0,<0.547.0>},undefined},
                 {{0,<0.547.0>},undefined},
                 {resource,<<"/">>,queue,<<"charon">>},
                 rabbit_mirror_queue_coordinator,
                 {3,
                  [{{0,<0.547.0>},
                    {view_member,
                        {0,<0.547.0>},
                        [],
                        {0,<0.547.0>},
                        {0,<0.547.0>}}}]},
                 1,[],
                 [<0.546.0>],
                 {[],[]},
                 [],undefined}],
            [{file,"src/gm.erl"},{line,665}]},

{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,912}]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}

=ERROR REPORT==== 11-Oct-2012::14:02:03 ===
** Generic server <0.545.0> terminating
** Last message in was {'EXIT',<0.546.0>,
                        {function_clause,
                         [{gm,handle_info,
                           [{mnesia_locker,rabbit at mq2,granted},
                            {state,
                             {0,<0.547.0>},
                             {{0,<0.547.0>},undefined},
                             {{0,<0.547.0>},undefined},
                             {resource,<<"/">>,queue,<<"charon">>},
                             rabbit_mirror_queue_coordinator,
                             {3,
                              [{{0,<0.547.0>},
                                {view_member,
                                 {0,<0.547.0>},
                                 [],
                                 {0,<0.547.0>},
                                 {0,<0.547.0>}}}]},
                             1,[],
                             [<0.546.0>],
                             {[],[]},
                             [],undefined}],
                           [{file,"src/gm.erl"},{line,665}]},
                          {gen_server2,handle_msg,2,
                           [{file,"src/gen_server2.erl"},{line,912}]},
                          {proc_lib,wake_up,3,
                           [{file,"proc_lib.erl"},{line,237}]}]}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"charon">>},
                          true,false,none,
                          [{<<"x-ha-policy">>,longstr,<<"all">>}],
                          <0.545.0>,[],all},
                         none,false,rabbit_mirror_queue_master,
                         {state,<0.547.0>,<0.546.0>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {0,{[],[]}},
                           0,
                           {0,nil},
                           undefined,
                           {0,nil},
                           {qistate,
                            "/highland/var/lib/rabbit at mq1
/queues/4L2SY4TVRPM8HP85JDVO85BPL",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,262144,
                            #Fun<rabbit_variable_queue.2.121500735>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,

<<112,32,84,92,54,73,67,43,231,35,164,79,106,8,27,
                               132>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,237637,
                              "/highland/var/lib/rabbit at mq1
/msg_store_persistent"},
                             rabbit_msg_store_ets_index,
                             "/highland/var/lib/rabbit at mq1
/msg_store_persistent",
                             <0.289.0>,241734,233540,245831,249928},
                            {client_msstate,msg_store_transient,

<<204,18,155,40,29,120,124,76,144,83,195,135,250,
                               91,216,3>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,217152,
                              "/highland/var/lib/rabbit at mq1
/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/highland/var/lib/rabbit at mq1
/msg_store_transient",
                             <0.284.0>,221249,213055,225346,229443}},

true,0,#Fun<rabbit_amqqueue_process.5.24780734>,0,
                           0,infinity,0,0,0,0,0,
                           {rates,
                            {{1349,988602,967125},0},
                            {{1349,988602,967125},0},
                            0.0,0.0,
                            {1349,988606,141019}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1349,988602,967125},0},
                            {{1349,988602,967125},0},
                            0.0,0.0,
                            {1349,988606,141019}}},
                          0,
                          {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          {set,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {[],[]},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         undefined,undefined,undefined,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         1,
                         {{0,nil},{0,nil}},
                         undefined,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         undefined,undefined}
** Reason for termination ==
** {noproc,{gen_server2,call,[<0.547.0>,info,infinity]}}
** In 'terminate' callback with reason ==
** {function_clause,
       [{gm,handle_info,
            [{mnesia_locker,rabbit at mq2,granted},
             {state,
                 {0,<0.547.0>},
                 {{0,<0.547.0>},undefined},
                 {{0,<0.547.0>},undefined},
                 {resource,<<"/">>,queue,<<"charon">>},
                 rabbit_mirror_queue_coordinator,
                 {3,
                  [{{0,<0.547.0>},
                    {view_member,
                        {0,<0.547.0>},
                        [],
                        {0,<0.547.0>},
                        {0,<0.547.0>}}}]},
                 1,[],
                 [<0.546.0>],
                 {[],[]},
                 [],undefined}],
            [{file,"src/gm.erl"},{line,665}]},

{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,912}]},
        {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}

=INFO REPORT==== 11-Oct-2012::14:02:03 ===
Mirrored-queue (queue 'cmcmd' in vhost '/'): Master <rabbit at mq1.2.665.0>
saw deaths of mirrors <rabbit at mq2.1.383.0>

=INFO REPORT==== 11-Oct-2012::14:02:03 ===
Mirrored-queue (queue 'ConfigurationManager' in vhost '/'): Master
<rabbit at mq1.2.605.0> saw deaths of mirrors <rabbit at mq2.1.377.0>

=INFO REPORT==== 11-Oct-2012::14:02:03 ===
global: Name conflict terminating {rabbit_mgmt_db,<3198.32532.0>}

=INFO REPORT==== 11-Oct-2012::14:26:40 ===
accepting AMQP connection <0.6507.1> (10.72.0.50:46742 -> 10.72.0.47:5672)

=INFO REPORT==== 11-Oct-2012::14:26:40 ===
closing AMQP connection <0.6507.1> (10.72.0.50:46742 -> 10.72.0.47:5672)




*And here's the relevant parts of the rabbit at m2.log file
====================================================*
=ERROR REPORT==== 11-Oct-2012::14:01:37 ===
** Node rabbit at mq1 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 11-Oct-2012::14:01:37 ===
rabbit on node rabbit at mq1 down

=INFO REPORT==== 11-Oct-2012::14:01:40 ===
Statistics database started.

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'cmcmd' in vhost '/'): Slave <rabbit at mq2.1.383.0> saw
deaths of mirrors <rabbit at mq1.2.665.0>

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'cmcmd' in vhost '/'): Promoting slave
<rabbit at mq2.1.383.0> to master

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'charon' in vhost '/'): Slave <rabbit at mq2.1.371.0>
saw deaths of mirrors <rabbit at mq1.2.545.0>

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'charon' in vhost '/'): Promoting slave
<rabbit at mq2.1.371.0> to master

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'ConfigurationManager' in vhost '/'): Slave
<rabbit at mq2.1.377.0> saw deaths of mirrors <rabbit at mq1.2.605.0>

=INFO REPORT==== 11-Oct-2012::14:01:43 ===
Mirrored-queue (queue 'ConfigurationManager' in vhost '/'): Promoting slave
<rabbit at mq2.1.377.0> to master

=ERROR REPORT==== 11-Oct-2012::14:05:03 ===
Mnesia(rabbit at mq2): ** ERROR ** mnesia_event got {inconsistent_database,
running_partitioned_network, rabbit at mq1}

=INFO REPORT==== 11-Oct-2012::14:05:03 ===
Statistics database started.

=ERROR REPORT==== 11-Oct-2012::14:29:39 ===
** Generic server <0.32583.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.5504.1>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/">>,queue,<<"charon">>},
                                true,false,none,
                                [{<<"x-ha-policy">>,longstr,<<"all">>}],
                                <0.371.0>,[],all},
                            <0.372.0>,
                            {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            #Fun<rabbit_mirror_queue_master.1.32244930>,
                            #Fun<rabbit_mirror_queue_master.2.32244930>}
** Reason for termination ==
** {bad_return_value,
       {error,
           {{badmatch,[]},
            [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2,
                 [{file,"src/rabbit_mirror_queue_misc.erl"},{line,63}]},
             {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,829}]},
             {mnesia_tm,execute_transaction,5,
                 [{file,"mnesia_tm.erl"},{line,809}]},
             {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
                 [{file,"src/rabbit_misc.erl"},{line,440}]},
             {worker_pool_worker,handle_call,3,
                 [{file,"src/worker_pool_worker.erl"},{line,80}]},
             {gen_server2,handle_msg,2,
                 [{file,"src/gen_server2.erl"},{line,879}]},
             {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}}}

=ERROR REPORT==== 11-Oct-2012::14:29:39 ===
** Generic server <0.371.0> terminating
** Last message in was {'EXIT',<0.32583.0>,
                        {bad_return_value,
                         {error,
                          {{badmatch,[]},
                           [{rabbit_mirror_queue_misc,
                             '-remove_from_queue/2-fun-0-',2,
                             [{file,"src/rabbit_mirror_queue_misc.erl"},
                              {line,63}]},
                            {mnesia_tm,apply_fun,3,
                             [{file,"mnesia_tm.erl"},{line,829}]},
                            {mnesia_tm,execute_transaction,5,
                             [{file,"mnesia_tm.erl"},{line,809}]},
                            {rabbit_misc,
                             '-execute_mnesia_transaction/1-fun-0-',1,
                             [{file,"src/rabbit_misc.erl"},{line,440}]},
                            {worker_pool_worker,handle_call,3,

[{file,"src/worker_pool_worker.erl"},{line,80}]},
                            {gen_server2,handle_msg,2,
                             [{file,"src/gen_server2.erl"},{line,879}]},
                            {proc_lib,wake_up,3,
                             [{file,"proc_lib.erl"},{line,237}]}]}}}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"charon">>},
                          true,false,none,
                          [{<<"x-ha-policy">>,longstr,<<"all">>}],
                          <0.371.0>,[],all},
                         none,false,rabbit_mirror_queue_master,
                         {state,<0.372.0>,<0.32583.0>,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {0,{[],[]}},
                           0,
                           {0,nil},
                           undefined,
                           {0,nil},
                           {qistate,
                            "/highland/var/lib/rabbit at mq2
/queues/4L2SY4TVRPM8HP85JDVO85BPL",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,262144,
                            #Fun<rabbit_variable_queue.2.121500735>,
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,

<<23,245,132,154,3,96,225,184,108,75,232,224,184,
                               114,108,141>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,282693,
                              "/highland/var/lib/rabbit at mq2
/msg_store_persistent"},
                             rabbit_msg_store_ets_index,
                             "/highland/var/lib/rabbit at mq2
/msg_store_persistent",
                             <0.285.0>,286790,278596,290887,294984},
                            {client_msstate,msg_store_transient,

<<17,62,75,125,126,21,66,37,170,44,198,72,18,94,
                               170,197>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,262208,
                              "/highland/var/lib/rabbit at mq2
/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/highland/var/lib/rabbit at mq2
/msg_store_transient",
                             <0.280.0>,266305,258111,270402,274499}},

true,0,#Fun<rabbit_mirror_queue_slave.1.69615989>,
                           0,0,infinity,0,0,0,0,0,
                           {rates,
                            {{1349,989325,677031},0},
                            {{1349,989325,677031},0},
                            0.0,0.0,
                            {1349,989505,348090}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,
                           {rates,
                            {{1349,989325,677031},0},
                            {{1349,989325,677031},0},
                            0.0,0.0,
                            {1349,989505,348090}}},
                          0,
                          {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          [],
                          {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}},
                          {set,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                             []}}}},
                         {[],[]},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         undefined,undefined,undefined,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         1,
                         {{0,nil},{0,nil}},
                         undefined,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},

{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         undefined,undefined}
** Reason for termination ==
** {noproc,{gen_server2,call,[<0.372.0>,info,infinity]}}
** In 'terminate' callback with reason ==
** {bad_return_value,
       {error,
           {{badmatch,[]},
            [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2,
                 [{file,"src/rabbit_mirror_queue_misc.erl"},{line,63}]},
             {mnesia_tm,apply_fun,3,[{file,"mnesia_tm.erl"},{line,829}]},
             {mnesia_tm,execute_transaction,5,
                 [{file,"mnesia_tm.erl"},{line,809}]},
             {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
                 [{file,"src/rabbit_misc.erl"},{line,440}]},
             {worker_pool_worker,handle_call,3,
                 [{file,"src/worker_pool_worker.erl"},{line,80}]},
             {gen_server2,handle_msg,2,
                 [{file,"src/gen_server2.erl"},{line,879}]},
             {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}}}

=INFO REPORT==== 11-Oct-2012::14:29:43 ===
accepting AMQP connection <0.5524.1> (10.72.0.23:36056 -> 10.72.0.56:5672)

=INFO REPORT==== 11-Oct-2012::14:29:44 ===
accepting AMQP connection <0.5574.1> (10.72.0.23:36064 -> 10.72.0.56:5672)

=INFO REPORT==== 11-Oct-2012::17:56:16 ===
accepting AMQP connection <0.23868.1> (10.72.0.53:57122 -> 10.72.0.56:5672)

=ERROR REPORT==== 11-Oct-2012::17:56:16 ===
connection <0.23868.1>, channel 1 - error:
{amqp_error,not_found,"no queue 'charon' in vhost '/'",'queue.declare'}

=WARNING REPORT==== 11-Oct-2012::17:56:16 ===
closing AMQP connection <0.23868.1> (10.72.0.53:57122 -> 10.72.0.56:5672):
connection_closed_abruptly

=INFO REPORT==== 11-Oct-2012::17:56:16 ===
accepting AMQP connection <0.23878.1> (10.72.0.52:56025 -> 10.72.0.56:5672)

=ERROR REPORT==== 11-Oct-2012::17:56:16 ===
connection <0.23878.1>, channel 1 - error:
{amqp_error,not_found,"no queue 'charon' in vhost '/'",'queue.declare'}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121012/891ba884/attachment.htm>


More information about the rabbitmq-discuss mailing list