We have a 3-node cluster running RabbitMQ 2.8.4 that supports a small number of HA queues. During a controlled shutdown, we bring down each broker (play, play2, util) in sequence with a command along the lines of<br><br>rabbitmqctl stop <pidfile><br>
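For anyone trying to reproduce this, the sequence above can be sketched roughly as follows. This is only an illustration, not our actual script: the function name stop_brokers and the STOP_CMD and STOP_TIMEOUT variables are made up for the example, it addresses each node with `rabbitmqctl -n` from a single host rather than passing a pidfile, and it assumes GNU coreutils `timeout` is available so a hung stop is reported instead of blocking forever.

```shell
#!/bin/sh
# Sketch: stop each broker in order and report a stop that fails or hangs.
# STOP_CMD and STOP_TIMEOUT are hypothetical knobs for this example only;
# they are not rabbitmqctl settings.
STOP_CMD="${STOP_CMD:-rabbitmqctl}"
STOP_TIMEOUT="${STOP_TIMEOUT:-60}"

stop_brokers() {
    for node in "$@"; do
        echo "stopping rabbit@$node"
        # timeout (GNU coreutils) ends the call if it does not return in time.
        if ! timeout "$STOP_TIMEOUT" "$STOP_CMD" -n "rabbit@$node" stop; then
            echo "stop of rabbit@$node failed or timed out" >&2
            return 1
        fi
    done
}
```

Calling `stop_brokers play play2 util` would then walk the three nodes in the same order as described above and stop at the first one that does not shut down cleanly.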
<br>The first two brokers shut down without incident. The last shutdown hangs and never finishes.<br><br>I can still see beam.smp processes running for both the broker and the rabbitmqctl instance.<br><br>The broker log file contains the following:<br>
<br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>rabbit on node rabbit@play2 down<br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'unit_test' in vhost '/'): Master <rabbit@play.2.271.0> saw deaths of mirrors <rabbit@play2.3.296.0> <br>
<br>=ERROR REPORT==== 16-Jul-2012::13:06:19 ===<br>** Generic server rabbit_node_monitor terminating <br>** Last message in was {'DOWN',#Ref<0.0.0.1578>,process,<br> {rabbit,rabbit@play2},<br>
normal}<br>** When Server state == [rabbit@play2]<br>** Reason for termination == <br>** {bad_return_value,<br> {error,<br> {badarg,<br> [{erlang,is_process_alive,[<3173.371.0>]},<br>
{rabbit_amqqueue,'-on_node_down/1-fun-1-',8},<br> {qlc,collect,1},<br> {qlc,eval,2},<br> {rabbit_amqqueue,'-on_node_down/1-fun-16-',1},<br> {mnesia_tm,apply_fun,3},<br>
{mnesia_tm,execute_transaction,5},<br> {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1}]}}}<br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'fake_munger_queue' in vhost '/'): Master <rabbit@play.2.270.0> saw deaths of mirrors <rabbit@play2.3.287.0> <br>
<br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'random' in vhost '/'): Master <rabbit@play.2.269.0> saw deaths of mirrors <rabbit@play2.3.278.0> <br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>
Mirrored-queue (queue 'charon' in vhost '/'): Master <rabbit@play.2.276.0> saw deaths of mirrors <rabbit@play2.3.291.0> <br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'ConfigurationManager' in vhost '/'): Master <rabbit@play.2.275.0> saw deaths of mirrors <rabbit@play2.3.289.0> <br>
<br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'fake_service_2' in vhost '/'): Master <rabbit@play.2.272.0> saw deaths of mirrors <rabbit@play2.3.275.0> <br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>
Mirrored-queue (queue 'system_test' in vhost '/'): Master <rabbit@play.2.277.0> saw deaths of mirrors <rabbit@play2.3.293.0> <br><br>=INFO REPORT==== 16-Jul-2012::13:06:19 ===<br>Mirrored-queue (queue 'fake_configmgr' in vhost '/'): Master <rabbit@play.2.273.0> saw deaths of mirrors <rabbit@play2.3.281.0> <br>
<br>=INFO REPORT==== 16-Jul-2012::13:06:32 ===<br> application: rabbitmq_tracing<br> exited: stopped<br> type: permanent<br><br>=INFO REPORT==== 16-Jul-2012::13:06:32 ===<br>Stopping Rabbit<br><br>--------<br><br>
With Mnesia logging turned on, the rabbitmq-server console output is:<br><br>Mnesia(rabbit@play): Logging mnesia_down rabbit@play2<br>Mnesia(rabbit@play): Got mnesia_down from rabbit@play2, reconfiguring...<br>Mnesia(rabbit@play): Transaction {tid,1786,<0.176.0>} calling #Fun<rabbit_mirror_queue_misc.0.102623438> with [] failed: <br>
{bad_commit,rabbit@play2}<br>Mnesia(rabbit@play): Restarting transaction {tid,1786,<0.176.0>}: in 2ms {bad_commit,rabbit@play2}<br>Mnesia(rabbit@play): Transaction {tid,1785,<0.175.0>} calling #Fun<rabbit_mirror_queue_misc.0.102623438> with [] failed: <br>
{bad_commit,rabbit@play2}<br>Mnesia(rabbit@play): Restarting transaction {tid,1785,<0.175.0>}: in 2ms {bad_commit,rabbit@play2}<br>Mnesia(rabbit@play): write performed by {tid,1787,<0.176.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"fake_service_2">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.272.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1787,<0.176.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_service_2">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.272.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1788,<0.176.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"fake_queue">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.270.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1788,<0.176.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_queue">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.270.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1789,<0.175.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"CM">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.275.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1789,<0.175.0>} on record:<br>
{rabbit_durable_queue,<br> {resource,<<"/">>,queue,<<"CM">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.275.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1790,<0.175.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"cha">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.276.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1790,<0.175.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"cha">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.276.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1791,<0.175.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"random">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.269.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1791,<0.175.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"random">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.269.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1793,<0.175.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"system_test">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.277.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1793,<0.175.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"system_test">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.277.0>,[],all}<br>Mnesia(rabbit@play): Transaction {tid,1794,<0.175.0>} calling #Fun<rabbit_amqqueue.22.63484291> with [] failed: <br> {badarg,[{erlang,is_process_alive,[<3173.371.0>]},<br>
{rabbit_amqqueue,'-on_node_down/1-fun-1-',8},<br> {qlc,collect,1},<br> {qlc,eval,2},<br> {rabbit_amqqueue,'-on_node_down/1-fun-16-',1},<br> {mnesia_tm,apply_fun,3},<br>
{mnesia_tm,execute_transaction,5},<br> {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1}]}<br>Mnesia(rabbit@play): write performed by {tid,1792,<0.176.0>} on record:<br> {rabbit_queue,{resource,<<"/">>,queue,<<"fake_CM">>},<br>
true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br> <0.273.0>,[],all}<br>Mnesia(rabbit@play): write performed by {tid,1792,<0.176.0>} on record:<br>
{rabbit_durable_queue,{resource,<<"/">>,queue,<<"fake_CM">>},<br> true,false,none,<br> [{<<"x-ha-policy">>,longstr,<<"all">>}],<br>
<0.273.0>,[],all}<br>Mnesia(rabbit@play): Transaction log dump initiated by time_threshold: {needs_dump,45}<br>Mnesia(rabbit@play): Transaction log dump initiated by time_threshold: already_dumped<br>
<br>