We have a 3-node RabbitMQ cluster (mq1, mq2, mq3) running 2.8.4
supporting a small number of HA queues. When we bring the cluster up, we start all nodes in parallel, and usually everything works fine. Recently, however, one of the nodes (mq3) failed to start, i.e., rabbitmqctl wait <pid> never completed.<br>
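When every node is started in parallel, one hung wait like mq3's can stall the whole startup with no signal. The following is a minimal Python sketch, not our actual startup script, that waits on all nodes in parallel but gives each wait a timeout so a stuck node is reported instead of blocking forever. It assumes each node's wait step can be expressed as an argv list (for example, your existing rabbitmqctl wait call); `wait_for_node` and `wait_for_cluster` are hypothetical helper names, and the demo commands are stand-ins that simulate one node that comes up and one that hangs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def wait_for_node(name, cmd, timeout_s):
    """Run one blocking wait command; return (name, status) instead of hanging forever."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the wait command on timeout; this is the mq3 symptom.
        return name, "timeout"
    return name, "ok" if result.returncode == 0 else f"exit {result.returncode}"


def wait_for_cluster(wait_cmds, timeout_s):
    """Wait on every node in parallel; wait_cmds maps node name -> argv list (hypothetical)."""
    with ThreadPoolExecutor(max_workers=max(1, len(wait_cmds))) as pool:
        futures = [pool.submit(wait_for_node, n, c, timeout_s) for n, c in wait_cmds.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # Stand-in commands; replace each argv with the real per-node wait call.
    demo = {
        "mq1": [sys.executable, "-c", "pass"],
        "mq3": [sys.executable, "-c", "import time; time.sleep(5)"],
    }
    print(wait_for_cluster(demo, timeout_s=1))  # {'mq1': 'ok', 'mq3': 'timeout'}
```

On timeout only the waiting command is killed; the RabbitMQ node itself is a separate process and keeps running, so its state can still be inspected afterwards.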
<br>I can log in to the management UI on mq1 and mq2, so those two are at least minimally running.<br><br>Luckily, we had turned on verbose Mnesia logging. Here's the console spew from the failing node (mq3):<br>
<br>Activating RabbitMQ plugins ...<br>6 plugins activated:<br>* amqp_client-0.0.0<br>* mochiweb-1.3-rmq0.0.0-git<br>* rabbitmq_management-0.0.0<br>* rabbitmq_management_agent-0.0.0<br>* rabbitmq_mochiweb-0.0.0<br>* webmachine-1.7.0-rmq0.0.0-hg<br>
Mnesia(rabbit@mq3): mnesia_monitor starting: <0.54.0><br>Mnesia(rabbit@mq3): Version: "4.4.12"<br>Mnesia(rabbit@mq3): Env access_module: mnesia<br>Mnesia(rabbit@mq3): Env dir: "/highland/var/lib/rabbit@mq3"<br>
Mnesia(rabbit@mq3): Env dump_log_load_regulation: false<br>Mnesia(rabbit@mq3): Env dump_log_time_threshold: 180000<br>Mnesia(rabbit@mq3): Env dump_log_update_in_place: true<br>Mnesia(rabbit@mq3): Env dump_log_write_threshold: 1000<br>
Mnesia(rabbit@mq3): Env event_module: mnesia_event<br>Mnesia(rabbit@mq3): Env core_dir: false<br>Mnesia(rabbit@mq3): Env no_table_loaders: 2<br>Mnesia(rabbit@mq3): Env dc_dump_limit: 4<br>Mnesia(rabbit@mq3): Mnesia debug level set to trace<br>
Mnesia(rabbit@mq3): mnesia_subscr starting: <0.55.0><br>Mnesia(rabbit@mq3): mnesia_locker starting: <0.56.0><br>Mnesia(rabbit@mq3): mnesia_late_loader starting: <0.86.0><br>Mnesia(rabbit@mq3): Cannot get cstructs, Node rabbit@mq2 {node_not_running,<br>
Mnesia(rabbit@mq3): Transaction log dump skipped (optional): schema_prepare<br>Mnesia(rabbit@mq3): Transaction log dump skipped (optional): schema_prepare<br>Mnesia(rabbit@mq3): mnesia_downs = []<br> {rabbit_exchange,ram_only},<br>
{rabbit_semi_durable_route,<br> ram_only},<br> {rabbit_listener,ram_only},<br> {gm_group,ram_only}]<br>
Mnesia(rabbit@mq3): Table rabbit_route is loaded on rabbit@mq1. s=ram_copies, r=nowhere, lc=false, f=false, m=true<br>Mnesia(rabbit@mq3): Getting table rabbit_user_permission (disc_copies) from node rabbit@mq1: {active_remote,<br>
rabbit@mq1}<br>Mnesia(rabbit@mq3): Table rabbit_semi_durable_route is loaded on rabbit@mq1. s=ram_copies, r=nowhere, lc=false, f=false, m=true<br>
Mnesia(rabbit@mq3): Table rabbit_queue is loaded on rabbit@mq2. s=ram_copies, r=nowhere, lc=false, f=false, m=true<br>Mnesia(rabbit@mq3): Table rabbit_route is loaded on rabbit@mq2. s=ram_copies, r=nowhere, lc=false, f=false, m=true<br>
last message repeated 2 times<br>starting file handle cache server ...done<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3657,<0.181.0>}: in 128ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3657,<0.181.0>}: in 236ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Sync serial {tid,3657,<0.181.0>}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3657,<0.181.0>}: in 488ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3657,<0.181.0>}: in 519ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
{aborted,nomore}<br>Mnesia(rabbit@mq3): Getting table rabbit_durable_exchange (disc_copies) from node rabbit@mq1: {active_remote,<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 115ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 111ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 168ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Sync serial {tid,3732,<0.181.0>}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 361ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 481ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 552ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 538ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 226ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 327ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 313ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 326ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3732,<0.181.0>}: in 763ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Transaction {tid,3732,<0.181.0>} calling #Fun<mnesia_loader.0.79080158> with [] failed: <br>{aborted,nomore}<br>Mnesia(rabbit@mq3): Getting table rabbit_durable_exchange (disc_copies) from node rabbit@mq2: {active_remote,<br>
rabbit@mq1}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 8ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 34ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 54ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 80ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 180ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 85ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 201ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 167ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 385ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 315ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Sync serial {tid,3733,<0.181.0>}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 237ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 165ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 197ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 496ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 348ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 325ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 412ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 585ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3733,<0.181.0>}: in 365ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Transaction {tid,3733,<0.181.0>} calling #Fun<mnesia_loader.0.79080158> with [] failed: <br>
{aborted,nomore}<br>Mnesia(rabbit@mq3): Getting table rabbit_durable_exchange (disc_copies) from node rabbit@mq1: {active_remote,<br> rabbit@mq1}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 7ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 21ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 34ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 74ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 89ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 125ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 241ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 249ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 195ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 317ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Sync serial {tid,3734,<0.181.0>}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 421ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 210ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 447ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 213ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 425ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 261ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 440ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 620ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>Mnesia(rabbit@mq3): Restarting transaction {tid,3734,<0.181.0>}: in 367ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}<br>
Mnesia(rabbit@mq3): Transaction {tid,3734,<0.181.0>} calling #Fun<mnesia_loader.0.79080158> with [] failed: <br>{aborted,nomore}<br><br>The "Getting table rabbit_durable_exchange (disc_copies) from node ..." step keeps cycling between mq1 and mq2, each attempt ending in {aborted,nomore}, until I kill mq3.<br>
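The restart lines above have a regular shape, so the pattern can be checked mechanically rather than by eye. Below is a minimal Python sketch, assuming the console output has been saved to a text file; the regular expressions are written against the exact lines quoted here (Mnesia 4.4.12) and may need adjusting for other versions, and `summarize` is a hypothetical helper. It counts restarts per transaction, tallies which lock item and which other transaction each restart names, and records the order in which remote nodes were asked for a table. On the excerpt above, every restart names the same lock item, {schema,rabbit_durable_exchange}, and the same other transaction, {tid,3654,<0.175.0>}.

```python
import re
import sys
from collections import Counter

# Written against lines such as:
# Restarting transaction {tid,3732,<0.181.0>}: in 115ms {cyclic,rabbit@mq3,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
RESTART_RE = re.compile(
    r"Restarting transaction \{tid,(\d+),<[\d.]+>\}: in \d+ms "
    r"\{cyclic,([^,]+),(\{[^}]*\}),\w+,\w+,\{tid,(\d+),<[\d.]+>\}\}"
)
# Written against lines such as:
# Getting table rabbit_durable_exchange (disc_copies) from node rabbit@mq1: {active_remote,
GETTING_RE = re.compile(r"Getting table (\w+) \(\w+\) from node ([\w@.-]+):")


def summarize(log_text):
    """Return (restarts per tid, counts per (lock item, other tid), table-fetch order)."""
    restarts = Counter()
    blockers = Counter()
    sources = []
    for m in RESTART_RE.finditer(log_text):
        tid, _node, item, other_tid = m.groups()
        restarts[int(tid)] += 1
        blockers[(item, int(other_tid))] += 1
    for m in GETTING_RE.finditer(log_text):
        sources.append((m.group(1), m.group(2)))
    return restarts, blockers, sources


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        restarts, blockers, sources = summarize(f.read())
    for tid, n in sorted(restarts.items()):
        print(f"tid {tid}: restarted {n} times")
    for (item, other), n in blockers.most_common():
        print(f"lock {item} vs tid {other}: {n} restarts")
    print("table fetched from:", " -> ".join(node for _table, node in sources))
```

Passing the saved log path as the only argument prints one summary line per restarted transaction, one per (lock item, other transaction) pair, and the alternating sequence of source nodes, which makes it easy to compare a future occurrence against this one.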
<br>What other information can I provide, or what should I look for, if this happens again?<br><br>Thanks,<br><br>Matt<br>