[rabbitmq-discuss] Hang while waiting for shutdown of last node in an HA cluster

Matt Pietrek mpietrek at skytap.com
Mon Jul 16 21:23:31 BST 2012


We have a 3 node cluster running 2.8.4 supporting a small number of HA
queues. During a controlled shutdown, we bring down each broker (play,
play2, util) in sequence using something along the lines of

rabbitmqctl stop <pidfile>

The first two brokers shut down without incident. The last shutdown gets
stuck and never finishes.

I see the beam.smp processes running for both the broker and rabbitmqctl
instance.

The log file has the following:

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
rabbit on node rabbit at play2 down

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'unit_test' in vhost '/'): Master <rabbit at play.2.271.0>
saw deaths of mirrors <rabbit at play2.3.296.0>

=ERROR REPORT==== 16-Jul-2012::13:06:19 ===
** Generic server rabbit_node_monitor terminating
** Last message in was {'DOWN',#Ref<0.0.0.1578>,process,
                               {rabbit,rabbit at play2},
                               normal}
** When Server state == [rabbit at play2]
** Reason for termination ==
** {bad_return_value,
       {error,
           {badarg,
               [{erlang,is_process_alive,[<3173.371.0>]},
                {rabbit_amqqueue,'-on_node_down/1-fun-1-',8},
                {qlc,collect,1},
                {qlc,eval,2},
                {rabbit_amqqueue,'-on_node_down/1-fun-16-',1},
                {mnesia_tm,apply_fun,3},
                {mnesia_tm,execute_transaction,5},
                {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1}]}}}

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'fake_munger_queue' in vhost '/'): Master
<rabbit at play.2.270.0> saw deaths of mirrors <rabbit at play2.3.287.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'random' in vhost '/'): Master <rabbit at play.2.269.0>
saw deaths of mirrors <rabbit at play2.3.278.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'charon' in vhost '/'): Master <rabbit at play.2.276.0>
saw deaths of mirrors <rabbit at play2.3.291.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'ConfigurationManager' in vhost '/'): Master
<rabbit at play.2.275.0> saw deaths of mirrors <rabbit at play2.3.289.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'fake_service_2' in vhost '/'): Master
<rabbit at play.2.272.0> saw deaths of mirrors <rabbit at play2.3.275.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'system_test' in vhost '/'): Master
<rabbit at play.2.277.0> saw deaths of mirrors <rabbit at play2.3.293.0>

=INFO REPORT==== 16-Jul-2012::13:06:19 ===
Mirrored-queue (queue 'fake_configmgr' in vhost '/'): Master
<rabbit at play.2.273.0> saw deaths of mirrors <rabbit at play2.3.281.0>

=INFO REPORT==== 16-Jul-2012::13:06:32 ===
    application: rabbitmq_tracing
    exited: stopped
    type: permanent

=INFO REPORT==== 16-Jul-2012::13:06:32 ===
Stopping Rabbit

--------

Since I've turned on Mnesia logging, the rabbitmq-server console spew is:

Mnesia(rabbit at play): Logging mnesia_down rabbit at play2
Mnesia(rabbit at play): Got mnesia_down from rabbit at play2, reconfiguring...
Mnesia(rabbit at play): Transaction {tid,1786,<0.176.0>} calling
#Fun<rabbit_mirror_queue_misc.0.102623438> with [] failed:
 {bad_commit,rabbit at play2}
Mnesia(rabbit at play): Restarting transaction {tid,1786,<0.176.0>}: in 2ms
{bad_commit,rabbit at play2}
Mnesia(rabbit at play): Transaction {tid,1785,<0.175.0>} calling
#Fun<rabbit_mirror_queue_misc.0.102623438> with [] failed:
 {bad_commit,rabbit at play2}
Mnesia(rabbit at play): Restarting transaction {tid,1785,<0.175.0>}: in 2ms
{bad_commit,rabbit at play2}
Mnesia(rabbit at play): write performed by {tid,1787,<0.176.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"fake_service_2">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.272.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1787,<0.176.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_service_2">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.272.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1788,<0.176.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"fake_queue">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.270.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1788,<0.176.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_queue">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.270.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1789,<0.175.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"CM">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.275.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1789,<0.175.0>} on record:
        {rabbit_durable_queue,
            {resource,<<"/">>,queue,<<"CM">>},
            true,false,none,
            [{<<"x-ha-policy">>,longstr,<<"all">>}],
            <0.275.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1790,<0.175.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"cha">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.276.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1790,<0.175.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"cha">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.276.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1791,<0.175.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"random">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.269.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1791,<0.175.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"random">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.269.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1793,<0.175.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"system_test">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.277.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1793,<0.175.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"system_test">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.277.0>,[],all}
Mnesia(rabbit at play): Transaction {tid,1794,<0.175.0>} calling
#Fun<rabbit_amqqueue.22.63484291> with [] failed:
 {badarg,[{erlang,is_process_alive,[<3173.371.0>]},
          {rabbit_amqqueue,'-on_node_down/1-fun-1-',8},
          {qlc,collect,1},
          {qlc,eval,2},
          {rabbit_amqqueue,'-on_node_down/1-fun-16-',1},
          {mnesia_tm,apply_fun,3},
          {mnesia_tm,execute_transaction,5},
          {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1}]}
Mnesia(rabbit at play): write performed by {tid,1792,<0.176.0>} on record:
        {rabbit_queue,{resource,<<"/">>,queue,<<"fake_CM">>},
                      true,false,none,
                      [{<<"x-ha-policy">>,longstr,<<"all">>}],
                      <0.273.0>,[],all}
Mnesia(rabbit at play): write performed by {tid,1792,<0.176.0>} on record:
        {rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_CM">>},
                              true,false,none,
                              [{<<"x-ha-policy">>,longstr,<<"all">>}],
                              <0.273.0>,[],all}
Mnesia(rabbit at play): Transaction log dump initiated by time_threshold:
{needs_dump,45}
Mnesia(rabbit at play): Transaction log dump initiated by time_threshold:
already_dumped
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20120716/7aed47de/attachment.htm>


More information about the rabbitmq-discuss mailing list