[rabbitmq-discuss] Issue with rabbit starting up every time

Chris Madden chris.madden at gmail.com
Thu Jun 30 20:47:21 BST 2011

I have 2 nodes in a cluster, both of them disc nodes. Occasionally, following a reboot, rabbit will not start. The error I get is the dreaded timeout_waiting_for_tables. Googling indicates that this can happen when host names change, but that isn't happening here. Further, when I get this problem, one node is always up and stable, and network connectivity appears fine.

Interestingly, it seems to correct itself if I keep restarting rabbit. Sometimes it can take 15-20 attempts to get it to start correctly. It *feels* like a race internal to rabbit to me, as I see it more on single-processor systems than on multi-processor systems. I don't have any durable or persistent queues for it to load, just 1 user in the database and a couple of permissions.
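
For what it's worth, the restart-until-it-works workaround can be scripted. Below is a minimal Python sketch, not anything that ships with rabbit: `start_with_retries` and `start_cmd` are hypothetical names, and `start_cmd` is assumed to be whatever command starts rabbit on your system and exits 0 on success.

```python
import subprocess
import time


def start_with_retries(start_cmd, max_attempts=20, delay_s=5.0, run=subprocess.call):
    """Run start_cmd until it exits 0.

    Returns the attempt number that succeeded, or None if every attempt
    failed. start_cmd is a hypothetical argv list; nothing in this helper
    is RabbitMQ-specific.
    """
    for attempt in range(1, max_attempts + 1):
        if run(start_cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Give the machine a moment to settle before trying again.
            time.sleep(delay_s)
    return None
```

This only papers over the problem, of course; it does not explain why the first attempts fail.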

I'm suspicious of http://hg.rabbitmq.com/rabbitmq-server/file/5f84b55205fd/src/rabbit_mnesia.erl#l610. With a hard-coded timeout, a heavily loaded system (which this definitely is at boot time) may take more than 30 seconds to get its tables ready and so hit the timeout (assuming I'm reading that correctly).
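
To illustrate the failure mode I have in mind, here is a small Python sketch of a wait with a fixed deadline. It is not the rabbit_mnesia code, only a stand-in: the function name, the `tables_ready` callback and the polling approach are all assumptions made for illustration. The 30-second default mirrors my reading of the linked line.

```python
import time


def wait_for_tables(tables_ready, timeout_s=30.0, poll_s=0.1,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll tables_ready() until it returns True or timeout_s elapses.

    Hypothetical stand-in for a hard-coded wait: if the system is slow
    enough that the tables need longer than timeout_s, the caller gets a
    timeout even though nothing is actually broken.
    """
    deadline = clock() + timeout_s
    while True:
        if tables_ready():
            return "ok"
        if clock() >= deadline:
            return "timeout_waiting_for_tables"
        sleep(poll_s)
```

If loading takes 5 seconds this returns "ok"; if a busy boot stretches it to 45 seconds, the same code returns the timeout, which would match what I see.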

I'm running rabbit 2.5.0 on Linux with Erlang R14B03.

Looking at the log files I see:

+---+   +---+
|   |   |   |
|   |   |   |
|   |   |   |
|   +---+   +-------+
|                   |
| RabbitMQ  +---+   |
|           |   |   |
|   v2.5.0  +---+   |
|                   |
AMQP 0-9-1 / 0-9 / 0-8
Copyright (C) 2007-2011 VMware, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/

node : rabbit at halfoat
app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.5.0/sbin/../ebin/rabbit.app
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : nwwpVc1h/mTYt75nAmVI0A==
log : /var/log/rabbitmq/rabbit at halfoat.log
sasl log : /var/log/rabbitmq/rabbit at halfoat-sasl.log
database dir : /var/db/rabbitmq/rabbit at halfoat
erlang version : 5.8.4

-- rabbit boot start
starting file handle cache server ...
=INFO REPORT==== 30-Jun-2011::19:09:41 ===
Limiting to approx 924 file handles (829 sockets)
starting worker pool ...done
starting database ...

At this point it pauses, then times out, dumping:

Reason: {error,
Stacktrace: [{rabbit_mnesia,wait_for_tables,1},
Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}
