[rabbitmq-discuss] One node in a cluster never fully starts up

Matt Pietrek mpietrek at skytap.com
Wed Jul 25 19:48:56 BST 2012


We have a 3 node cluster (mq1, mq2, mq3) running 2.8.4 supporting a small
number of HA queues. During startup of the cluster, we start all nodes in
parallel. Usually everything works fine. However, we've just recently seen
one of the nodes (mq3) won't start, i.e., the rabbitmqctl wait <pid>
doesn't complete.

I can log in to the management UI on mq1 and mq2, so they're at least
minimally running.

Luckily, we've turned on verbose Mnesia logging. here's what the failing
node (mq3) shows in the console spew:

Activating RabbitMQ plugins ...
6 plugins activated:
* amqp_client-0.0.0
* mochiweb-1.3-rmq0.0.0-git
* rabbitmq_management-0.0.0
* rabbitmq_management_agent-0.0.0
* rabbitmq_mochiweb-0.0.0
* webmachine-1.7.0-rmq0.0.0-hg
Mnesia(rabbit at mq3): mnesia_monitor starting: <0.54.0>
Mnesia(rabbit at mq3): Version: "4.4.12"
Mnesia(rabbit at mq3): Env access_module: mnesia
Mnesia(rabbit at mq3): Env dir: "/highland/var/lib/rabbit at mq3"
Mnesia(rabbit at mq3): Env dump_log_load_regulation: false
Mnesia(rabbit at mq3): Env dump_log_time_threshold: 180000
Mnesia(rabbit at mq3): Env dump_log_update_in_place: true
Mnesia(rabbit at mq3): Env dump_log_write_threshold: 1000
Mnesia(rabbit at mq3): Env event_module: mnesia_event
Mnesia(rabbit at mq3): Env core_dir: false
Mnesia(rabbit at mq3): Env no_table_loaders: 2
Mnesia(rabbit at mq3): Env dc_dump_limit: 4
Mnesia(rabbit at mq3): Mnesia debug level set to trace
Mnesia(rabbit at mq3): mnesia_subscr starting: <0.55.0>
Mnesia(rabbit at mq3): mnesia_locker starting: <0.56.0>
Mnesia(rabbit at mq3): mnesia_late_loader starting: <0.86.0>
Mnesia(rabbit at mq3): Cannot get cstructs, Node rabbit at mq2 {node_not_running,
Mnesia(rabbit at mq3): Transaction log dump skipped (optional): schema_prepare
Mnesia(rabbit at mq3): Transaction log dump skipped (optional): schema_prepare
Mnesia(rabbit at mq3): mnesia_downs = []
                                           {rabbit_exchange,ram_only},
                                           {rabbit_semi_durable_route,
                                            ram_only},
                                           {rabbit_listener,ram_only},
                                           {gm_group,ram_only}]
Mnesia(rabbit at mq3): Table rabbit_route is loaded on
rabbit at mq1.s=ram_copies, r=nowhere, lc=false, f=false, m=true
Mnesia(rabbit at mq3): Getting table rabbit_user_permission (disc_copies) from
node rabbit at mq1: {active_remote,

rabbit at mq1}
Mnesia(rabbit at mq3): Table rabbit_semi_durable_route is loaded on
rabbit at mq1.s=ram_copies, r=nowhere, lc=false, f=false, m=true
Mnesia(rabbit at mq3): Table rabbit_queue is loaded on
rabbit at mq2.s=ram_copies, r=nowhere, lc=false, f=false, m=true
Mnesia(rabbit at mq3): Table rabbit_route is loaded on
rabbit at mq2.s=ram_copies, r=nowhere, lc=false, f=false, m=true
last message repeated 2 times
|   +---+   +-------+
|                   |
starting file handle cache server
...done
Mnesia(rabbit at mq3): Restarting transaction {tid,3657,<0.181.0>}: in 128ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3657,<0.181.0>}: in 236ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Sync serial {tid,3657,<0.181.0>}
Mnesia(rabbit at mq3): Restarting transaction {tid,3657,<0.181.0>}: in 488ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3657,<0.181.0>}: in 519ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
{aborted,nomore}
Mnesia(rabbit at mq3): Getting table rabbit_durable_exchange (disc_copies)
from node rabbit at mq1: {active_remote,
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 115ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 111ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 168ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Sync serial {tid,3732,<0.181.0>}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 361ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 481ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 552ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 538ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 226ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 327ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 313ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 326ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3732,<0.181.0>}: in 763ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Transaction {tid,3732,<0.181.0>} calling
#Fun<mnesia_loader.0.79080158> with [] failed:
{aborted,nomore}
Mnesia(rabbit at mq3): Getting table rabbit_durable_exchange (disc_copies)
from node rabbit at mq2: {active_remote,

rabbit at mq1}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 8ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 34ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 54ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 80ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 180ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 85ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 201ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 167ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 385ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 315ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Sync serial {tid,3733,<0.181.0>}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 237ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 165ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 197ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 496ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 348ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 325ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 412ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 585ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3733,<0.181.0>}: in 365ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Transaction {tid,3733,<0.181.0>} calling
#Fun<mnesia_loader.0.79080158> with [] failed:
{aborted,nomore}
Mnesia(rabbit at mq3): Getting table rabbit_durable_exchange (disc_copies)
from node rabbit at mq1: {active_remote,

rabbit at mq1}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 7ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 21ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 34ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 74ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 89ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 125ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 241ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 249ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 195ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 317ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Sync serial {tid,3734,<0.181.0>}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 421ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 210ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 447ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 213ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 425ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 261ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 440ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 620ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Restarting transaction {tid,3734,<0.181.0>}: in 367ms
{cyclic,rabbit at mq3
,{schema,rabbit_durable_exchange},read,read,{tid,3654,<0.175.0>}}
Mnesia(rabbit at mq3): Transaction {tid,3734,<0.181.0>} calling
#Fun<mnesia_loader.0.79080158> with [] failed:
{aborted,nomore}

The pattern of "Getting table rabbit_durable_exchange (disc_copies) from
node rabbit at mq1:" cycles between mq1 and mq2 repeatedly until I kill mq3.

What other sort of information can I provide or look for when this
situation repeats?

Thanks,

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20120725/7dfc9e3e/attachment.htm>


More information about the rabbitmq-discuss mailing list