Last night we had to reboot a RabbitMQ node in a 3-node cluster on EC2. The node failed to restart with the dreaded timeout_waiting_for_tables error. <div><br></div><div>Looking at past discussions of this error, it is clear that the most common cause is a node name change: the node name contains the IP address, the hostname changed, or a new node was provisioned from an image carrying an old mnesia DB created under a different node name.</div>
<div><br></div><div>None of those appears to apply in our situation. The node name does not include the IP address and did not change, as can be seen in the start-up log. Just to be sure, we set the node name explicitly in the /etc/rabbitmq/rabbitmq-env.conf file and attempted to restart, again without success.</div>
<div><br></div><div>I enabled mnesia debugging at the trace level, but it does not provide any useful information about what is causing the timeout. The cluster has developed a backlog of persistent messages in two of the queues (about 70K in total), but judging by the tables listed in the error, those are not the tables it is trying to sync. All the other metadata (users, exchanges, bindings, queues) is very small, so 30 seconds should be sufficient time.</div>
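<div><br></div><div>For reference, a minimal sketch of how trace-level mnesia debugging can be switched on. It assumes the setting goes into the config file named in the start-up log (/etc/rabbitmq/rabbitmq.config). The debug parameter is a standard mnesia application setting; the fragment is illustrative and not a copy of our actual file:</div>
<pre>
%% Illustrative fragment only; merge with any entries already in the file.
[
  %% mnesia's debug level can be none, verbose, debug or trace.
  {mnesia, [{debug, trace}]}
].
</pre>
<div>With this in place the start-up log should report "Env debug: trace", as it does in the log below.</div>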
<div><br></div><div>While we could wipe the mnesia state from the node, we'd like to find out why this happens and whether it can be repaired, for future reference.</div><div><br></div><div>The start-up log is attached below. </div>
<div><br></div><div>Any ideas?</div><div><br></div><div>-----------------</div><div><div>Activating RabbitMQ plugins ...</div><div>7 plugins activated:</div><div>* amqp_client-2.5.1</div><div>* mochiweb-1.3-rmq2.5.1-git9a53dbd</div>
<div>* rabbitmq_management-2.5.1</div><div>* rabbitmq_management_agent-2.5.1</div><div>* rabbitmq_mochiweb-2.5.1</div><div>* rabbitmq_stomp-2.5.1</div><div>* webmachine-1.7.0-rmq2.5.1-hg0c4b60a</div><div><br></div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_monitor starting: <0.68.0></div>
<div>Mnesia('rabbit@queue-beta1-int'): Version: "4.4.17"</div><div>Mnesia('rabbit@queue-beta1-int'): Env access_module: mnesia</div><div>Mnesia('rabbit@queue-beta1-int'): Env auto_repair: true</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env backup_module: mnesia_backup</div><div>Mnesia('rabbit@queue-beta1-int'): Env debug: trace</div><div>Mnesia('rabbit@queue-beta1-int'): Env dir: "/var/lib/rabbitmq/mnesia/rabbit@queue-beta1-int"</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env dump_log_load_regulation: false</div><div>Mnesia('rabbit@queue-beta1-int'): Env dump_log_time_threshold: 180000</div><div>Mnesia('rabbit@queue-beta1-int'): Env dump_log_update_in_place: true</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env dump_log_write_threshold: 1000</div><div>Mnesia('rabbit@queue-beta1-int'): Env embedded_mnemosyne: false</div><div>Mnesia('rabbit@queue-beta1-int'): Env event_module: mnesia_event</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env extra_db_nodes: []</div><div>Mnesia('rabbit@queue-beta1-int'): Env ignore_fallback_at_startup: false</div><div>Mnesia('rabbit@queue-beta1-int'): Env fallback_error_function: {mnesia,lkill}</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env max_wait_for_decision: infinity</div><div>Mnesia('rabbit@queue-beta1-int'): Env schema_location: opt_disc</div><div>Mnesia('rabbit@queue-beta1-int'): Env core_dir: false</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env pid_sort_order: false</div><div>Mnesia('rabbit@queue-beta1-int'): Env no_table_loaders: 2</div><div>Mnesia('rabbit@queue-beta1-int'): Env dc_dump_limit: 4</div>
<div>Mnesia('rabbit@queue-beta1-int'): Env send_compressed: 0</div><div>Mnesia('rabbit@queue-beta1-int'): Mnesia debug level set to trace</div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_subscr starting: <0.69.0></div>
<div>Mnesia('rabbit@queue-beta1-int'): mnesia_locker starting: <0.70.0></div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_recover starting: <0.71.0></div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_tm starting: <0.72.0></div>
<div>Mnesia('rabbit@queue-beta1-int'): Schema initiated from: disc</div><div>Mnesia('rabbit@queue-beta1-int'): Transaction log dump initiated by scan_decisions</div><div>Mnesia('rabbit@queue-beta1-int'): Transaction log dump initiated by startup: {needs_dump,0}</div>
<div>Mnesia('rabbit@queue-beta1-int'): Transaction log dump initiated by startup: already_dumped</div><div>Mnesia('rabbit@queue-beta1-int'): Initial dump of log during startup: [dumped,</div><div> dumped]</div>
<div>Mnesia('rabbit@queue-beta1-int'): mnesia_controller starting: <0.98.0></div></div><div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_downs = []</div><div>Mnesia('rabbit@queue-beta1-int'): Intend to load tables: []</div>
<div><br></div><div>+---+ +---+</div><div>| | | |</div><div>| | | |</div><div>| | | |</div><div>| +---+ +-------+</div><div>| |</div><div>| RabbitMQ +---+ |</div><div>| | | |</div>
<div>| v2.5.1 +---+ |</div><div>| |</div><div>+-------------------+</div><div>AMQP 0-9-1 / 0-9 / 0-8</div><div>Copyright (C) 2007-2011 VMware, Inc.</div><div>Licensed under the MPL. See <a href="http://www.rabbitmq.com/">http://www.rabbitmq.com/</a></div>
<div><br></div><div>node : rabbit@queue-beta1-int</div><div>app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.5.1/sbin/../ebin/rabbit.app</div><div>home dir : /var/lib/rabbitmq</div><div>config file(s) : /etc/rabbitmq/rabbitmq.config</div>
<div>cookie hash : a+Jg2nl357GwYTLG/0y3Lg==</div><div>log : /var/log/rabbitmq/rabbit@queue-beta1-int.log</div><div>sasl log : /var/log/rabbitmq/rabbit@queue-beta1-int-sasl.log</div><div>database dir : /var/lib/rabbitmq/mnesia/rabbit@queue-beta1-int</div>
<div>erlang version : 5.8.3</div><div><br></div><div>-- rabbit boot start</div><div>starting file handle cache server ...done</div><div>starting worker pool ...done</div>
<div>starting database ...BOOT ERROR: FAILED</div><div>Reason: {error,</div><div> {timeout_waiting_for_tables,</div><div> [rabbit_user,rabbit_user_permission,rabbit_vhost,</div>
<div> rabbit_listener,rabbit_durable_route,</div><div> rabbit_semi_durable_route,rabbit_route,rabbit_reverse_route,</div><div> rabbit_topic_trie_edge,rabbit_topic_trie_binding,</div>
<div> rabbit_durable_exchange,rabbit_exchange,</div><div> rabbit_exchange_serial,rabbit_durable_queue,rabbit_queue]}}</div><div>Stacktrace: [{rabbit_mnesia,wait_for_tables,1},</div><div> {rabbit_mnesia,check_schema_integrity,0},</div>
<div> {rabbit_mnesia,ensure_schema_integrity,0},</div><div> {rabbit_mnesia,init_db,3},</div></div><div><div> {rabbit_mnesia,init,0},</div><div> {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},</div>
<div> {rabbit,run_boot_step,1},</div><div> {rabbit,'-start/2-lc$^0/1-0-',1}]</div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_controller terminated: shutdown</div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_tm terminated: shutdown</div>
<div>Mnesia('rabbit@queue-beta1-int'): mnesia_recover terminated: shutdown</div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_locker terminated: shutdown</div><div>Mnesia('rabbit@queue-beta1-int'): mnesia_subscr terminated: shutdown</div>
<div>Mnesia('rabbit@queue-beta1-int'): mnesia_monitor terminated: shutdown</div><div>{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}</div>
</div><div><br></div>
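<div>For readers unfamiliar with the error: the failing frame in the stack trace, rabbit_mnesia:wait_for_tables, presumably wraps mnesia:wait_for_tables/2, which returns {timeout, Tables} for tables that could not be loaded in time. Below is a rough sketch of that pattern. It is not RabbitMQ's actual source; the module and function names are made up for illustration, and the 30000 ms value is taken from the 30 seconds mentioned above:</div>
<pre>
%% Hypothetical module; a sketch of the wait-then-report pattern only.
-module(wait_sketch).
-export([check/1]).

%% Wait up to 30 seconds for the given mnesia tables to become accessible.
check(Tables) -&gt;
    case mnesia:wait_for_tables(Tables, 30000) of
        ok -&gt;
            ok;
        {timeout, Missing} -&gt;
            %% Missing holds the tables that were still not loaded.
            {error, {timeout_waiting_for_tables, Missing}};
        {error, Reason} -&gt;
            {error, Reason}
    end.
</pre>
<div>Under that assumption, the list in the BOOT ERROR above is simply the set of tables mnesia had not made available when the timeout expired.</div>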